CN109388676B - Data synchronization generation method, device, computer equipment and storage medium - Google Patents


Info

Publication number
CN109388676B
CN109388676B CN201810954798.1A CN201810954798A CN109388676B CN 109388676 B CN109388676 B CN 109388676B CN 201810954798 A CN201810954798 A CN 201810954798A CN 109388676 B CN109388676 B CN 109388676B
Authority
CN
China
Prior art keywords
data
relational database
script
task
configuration
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810954798.1A
Other languages
Chinese (zh)
Other versions
CN109388676A (en)
Inventor
席旭亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Life Insurance Company of China Ltd
Original Assignee
Ping An Life Insurance Company of China Ltd
Application filed by Ping An Life Insurance Company of China Ltd
Priority to CN201810954798.1A
Publication of CN109388676A
Application granted
Publication of CN109388676B
Legal status: Active

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application relates to data resources and discloses a data synchronization generation method, a device, computer equipment and a storage medium. The method comprises: reading, on the big data platform, the table structure of a pre-configured configuration table used for synchronizing data of the big data platform to a relational database, so as to obtain each item of configuration information in the configuration table; creating a view file corresponding to the relational database according to the table name carried in the table structure; authorizing the view file to generate an authorization file; generating a scheduling task, a script and a table creation statement corresponding to the relational database according to the view file; and transmitting the scheduling task, the script and the table creation statement respectively to designated locations of the relational database, so as to schedule the specified data of the big data platform to the relational database. By developing an automatic generation tool for Sqoop data synchronization, the script file and table creation statement for Sqoop data synchronization can be generated automatically, and the execution of the script file is controlled so that data synchronization is generated automatically.

Description

Data synchronization generation method, device, computer equipment and storage medium
Technical Field
The present application relates to the field of big data, and in particular to a data synchronization generation method, a device, computer equipment, and a storage medium.
Background
At present, data in an existing relational database can be used in different application fields only after it has been synchronized to a big data platform, whose data volume can be expanded continuously, so that data in a specific relational database can be shared with other relational databases. Existing relational databases have strict permissions, so other relational databases cannot obtain information directly from a given relational database; and because the data types of different relational databases are not mutually compatible, other relational databases cannot directly recognize the data in that database. The data therefore has to be synchronized to a big data platform before it can be shared. However, the project currently has no tool for automatically generating Sqoop data synchronization (Sqoop being a tool for transferring data between Hadoop (Hive) and traditional databases). The work is done manually, which is prone to coding errors, involves a heavy workload, lengthens the time needed to bring a product online, and lowers working efficiency.
Disclosure of Invention
The main purpose of the application is to provide a data synchronization generation method for synchronizing data of a big data platform to a relational database, so as to solve the technical problem that synchronizing data of a big data platform to a relational database requires manual work and is inefficient.
The application provides a data synchronization generation method, which synchronizes data of a big data platform to a relational database and comprises the following steps:
reading, on the big data platform, the table structure of a pre-configured configuration table used for synchronizing data of the big data platform to the relational database, so as to obtain each item of configuration information in the configuration table;
creating a view file corresponding to the relational database according to the table name carried in the table structure;
authorizing the view file to generate an authorization file;
generating a scheduling task, a script and a table creation statement corresponding to the relational database according to the view file;
and transmitting the scheduling task, the script and the table creation statement respectively to designated locations of the relational database, so as to schedule the specified data of the big data platform to the relational database.
Preferably, before the step of reading the table structure of the pre-configured configuration table used for synchronizing data of the big data platform to the relational database, the method includes:
receiving the fields corresponding to each item of configuration information, wherein the configuration information is entered into the configuration table according to a task protocol;
detecting whether each field conforms to a preset configuration rule;
if yes, generating an instruction to read the table structure of the configuration table.
Preferably, the step of detecting whether each field conforms to a preset configuration rule includes:
acquiring the field content corresponding to each field;
judging whether the written form of each field content matches, one by one, the corresponding writing rule in the preset configuration rules;
if yes, determining that each field of the configuration table conforms to the preset configuration rule; otherwise, determining that the fields of the configuration table do not conform to the preset configuration rule.
Preferably, the step of generating the scheduling task, the script and the table creation statement corresponding to the relational database from the view file according to the authorization file includes:
generating, from the view file and according to the authorization file, a data table of the relational database on the big data platform;
constructing the corresponding script and table creation statement according to the extraction mode carried by the data table;
and automatically generating a corresponding scheduling task according to the script and the table creation statement, wherein the scheduling task is used for scheduling the specified data of the big data platform to the relational database.
Preferably, the step of constructing the corresponding script and table creation statement according to the extraction mode carried by the data table includes:
judging whether the extraction mode is full extraction;
if yes, extracting all table data corresponding to the table name carried in the table structure into the data table;
and forming, from the data table, a first script and a first table creation statement corresponding to the full data.
Preferably, the step of constructing the corresponding script and table creation statement according to the extraction mode carried by the data table includes:
judging whether the extraction mode is incremental extraction;
if yes, separately extracting an initialization table and an increment table corresponding to the table name carried in the table structure;
merging the initialization table and the increment table into the data table;
and forming, from the data table, a second script and a second table creation statement corresponding to the incremental data.
Preferably, the step of automatically generating, according to the script and the table creation statement, the corresponding scheduling task for scheduling the specified data of the big data platform to the relational database includes:
judging whether the dependency relationships among the task of acquiring the initialization table, the task of acquiring the increment table, and the task of merging the initialization table and the increment table are correct;
if yes, automatically generating the scheduling task according to the second script and the second table creation statement.
The application also provides a data synchronization generation device, which comprises:
a reading module, used for reading, on the big data platform, the table structure of a pre-configured configuration table used for synchronizing data of the big data platform to the relational database, so as to obtain each item of configuration information in the configuration table;
a creation module, used for creating a view file corresponding to the relational database according to the table name carried in the table structure;
an authorization module, used for authorizing the view file to generate an authorization file;
a generating module, used for generating a scheduling task, a script and a table creation statement corresponding to the relational database according to the view file;
and a transmission module, used for transmitting the scheduling task, the script and the table creation statement respectively to designated locations of the relational database, so as to schedule the specified data of the big data platform to the relational database.
The application also provides a computer device comprising a memory storing a computer program and a processor implementing the steps of the above method when executing the computer program.
The application also provides a computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of the method described above.
By developing an automatic generation tool for Sqoop data synchronization, the application can automatically generate the script file and table creation statement for Sqoop data synchronization, which can be deployed directly (mounted on another platform). That platform controls the script file so that synchronization runs by day, week, month, quarter, year and so on; the time of day at which it runs and whether it needs to run at all can also be set. The content of the script files is standardized, script logs are easy to inspect and trace, and the platform on which scripts run is unified and centralized, so that a single operator can manage the running status of all scripts, which is easier to manage and control than transmitting the data directly to a designated platform. The method streamlines Sqoop data synchronization and improves its accuracy, greatly shortens the time needed to bring a product online, and markedly improves working efficiency. An automatic checking process improves the accuracy of the automatic generation tool and keeps the flow smooth. Because the corresponding scripts and table creation statements are generated automatically for the different extraction modes of Sqoop data synchronization, different task requirements are met and the range of application of the tool is widened.
Drawings
FIG. 1 is a flow chart of a method for generating data synchronization according to an embodiment of the application;
FIG. 2 is a schematic diagram of a data synchronization generating device according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a data synchronization generating device according to another embodiment of the present application;
FIG. 4 is a schematic structural diagram of a monitoring module according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a first generating module according to an embodiment of the application;
FIG. 6 is a schematic diagram of a second building block according to an embodiment of the application;
FIG. 7 is a schematic diagram of the structure of a second building block according to another embodiment of the present application;
FIG. 8 is a schematic diagram of a generating unit according to another embodiment of the present application;
FIG. 9 is a schematic diagram showing an internal structure of a computer device according to an embodiment of the present application.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
Referring to fig. 1, a data synchronization generating method for synchronizing data of a big data platform to a relational database according to an embodiment of the present application includes:
S1: reading, on the big data platform, the table structure of a pre-configured configuration table used for synchronizing data of the big data platform to the relational database, so as to obtain each item of configuration information in the configuration table.
The relational database of the present embodiment is a database built on the relational model, in which data is processed by means of mathematical concepts and methods such as set algebra. The data is organized into a set of formally described tables, each of which is essentially a special collection of data items; the data in these tables can be accessed or recombined in many different ways without reorganizing the database tables, and each table contains one or more data categories represented by columns. The relational database of the present embodiment includes Oracle, DB2, SQL Server, Sybase, MySQL, PostgreSQL (pg), and the like. The table structure of the present embodiment includes the table name, the library name, the fields contained in the table, the description of each field, and the like.
For example, the table structure of the present embodiment has the following parameters:
Whether to process: indicates whether the record is to be processed
User: the Hadoop user, e.g. hduer 0101
Library name: the library, e.g. sx_360_safe
Source table: the table name in the relational database
Primary key: the primary key of the corresponding table
Update field: the update field of the corresponding table
Database type: one of oracle, postgresql, mysql
Whether to keep the table: Y/N
Source library instance name: must be one of the names in the provided list
Extraction mode: full or incremental
Partition field: optional; if filled in, the result table will be partitioned
Partition field value: optional; if filled in, the result table will be partitioned
Dynamic partition field value: optional; if filled in, the result table will be partitioned
Table description: description of the Hive table
Deployment directory: fill in according to the actual situation, e.g. hive_app can be written for "monetary value home"
Version: the version number
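The parameters listed above can be pictured as one record per table to be synchronized. The following is a minimal sketch only: the class name `SyncConfig`, its attribute names, and the sample values `t_policy` and `orcl_01` are hypothetical and do not come from the application; only the set of parameters and the allowed database types and extraction modes follow the list above.

```python
from dataclasses import dataclass, replace
from typing import Optional

# Allowed values taken from the parameter list; everything else is assumed.
ALLOWED_DB_TYPES = {"oracle", "postgresql", "mysql"}
ALLOWED_MODES = {"full", "incremental"}


@dataclass
class SyncConfig:
    """One row of the configuration table (hypothetical field names)."""
    process: bool
    hadoop_user: str
    library: str
    source_table: str
    primary_key: str
    update_field: str
    db_type: str
    keep_table: bool
    source_instance: str
    extraction_mode: str
    partition_field: Optional[str] = None
    partition_value: Optional[str] = None
    dynamic_partition_value: Optional[str] = None
    description: str = ""
    deploy_dir: str = ""
    version: str = ""

    def __post_init__(self) -> None:
        # Reject values outside the sets named in the parameter list.
        if self.db_type not in ALLOWED_DB_TYPES:
            raise ValueError(f"unsupported database type: {self.db_type}")
        if self.extraction_mode not in ALLOWED_MODES:
            raise ValueError(f"unsupported extraction mode: {self.extraction_mode}")

    @property
    def partitioned(self) -> bool:
        # The result table is partitioned only when a partition field is given.
        return self.partition_field is not None


cfg = SyncConfig(
    process=True, hadoop_user="hduer 0101", library="sx_360_safe",
    source_table="t_policy", primary_key="id", update_field="updated_at",
    db_type="oracle", keep_table=True, source_instance="orcl_01",
    extraction_mode="incremental", deploy_dir="hive_app", version="1",
)
```

Validating the enumerated values when the record is built keeps later steps from having to re-check them.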
S2: creating a view file corresponding to the relational database according to the table name carried in the table structure.
In this embodiment, to control permission to use the data of the relational database, a view file of the corresponding relational database is generated for the synchronized data table on the big data platform.
S3: authorizing the view file to generate an authorization file.
Through the authorization file, the user obtains permission to browse the view file corresponding to the relational database, which allows further management of the data security of the relational database.
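Steps S2 and S3 can be illustrated by generating a view statement and a grant statement from the table name. This is a sketch under assumptions: the application does not give the contents of the view file or the authorization file, so the Hive-style `CREATE VIEW` and `GRANT SELECT` statements, the `v_` view prefix, the function name `view_and_grant`, and the sample names `t_policy` and `hduser` are all hypothetical.

```python
import re
from typing import Tuple

# Plain identifiers only, so generated SQL cannot be altered by odd input.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def view_and_grant(library: str, table: str, user: str) -> Tuple[str, str]:
    """Return (view statement, grant statement) for one synchronized table."""
    for name in (library, table, user):
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"not a plain identifier: {name!r}")
    view = f"{library}.v_{table}"  # "v_" prefix is an assumed naming convention
    create_view = (
        f"CREATE VIEW IF NOT EXISTS {view} AS SELECT * FROM {library}.{table}"
    )
    grant = f"GRANT SELECT ON TABLE {view} TO USER {user}"
    return create_view, grant


create_view_sql, grant_sql = view_and_grant("sx_360_safe", "t_policy", "hduser")
```

Granting access to the view rather than to the underlying table is what lets the view act as the permission boundary described in the text.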
S4: generating a scheduling task, a script and a table creation statement corresponding to the relational database according to the view file.
This embodiment takes the automatic generation of Sqoop data synchronization from the big data platform to the relational database as an example. The content of the view file is obtained according to the authorization file, which yields the configuration information contained in the view file; a program is then called to automatically generate the standardized Sqoop synchronization script and the table creation statement corresponding to the relational database.
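What "automatically generating a script and a table creation statement" might look like can be sketched with two small generator functions. The application does not disclose the template of its script, so this is only an illustration: the function names, the JDBC URL, the password file path, the export directory and the column types are hypothetical, while the `sqoop export` options used (`--connect`, `--username`, `--password-file`, `--table`, `--export-dir`) are standard Sqoop 1 arguments.

```python
import shlex
from typing import Dict


def build_create_table(table: str, columns: Dict[str, str]) -> str:
    """Build a table creation statement for the target relational database."""
    cols = ",\n".join(f"  {name} {sql_type}" for name, sql_type in columns.items())
    return f"CREATE TABLE {table} (\n{cols}\n);"


def build_sqoop_export(jdbc_url: str, username: str, password_file: str,
                       table: str, export_dir: str) -> str:
    """Build a one-line Sqoop export command (platform to relational database)."""
    args = [
        "sqoop", "export",
        "--connect", jdbc_url,
        "--username", username,
        "--password-file", password_file,
        "--table", table,
        "--export-dir", export_dir,
    ]
    # Quote every argument so the line is safe to paste into a shell script.
    return " ".join(shlex.quote(arg) for arg in args)


ddl = build_create_table("t_policy", {"id": "NUMERIC", "name": "VARCHAR(64)"})
script = build_sqoop_export(
    "jdbc:mysql://db.example.com:3306/target",      # assumed target database
    "sync_user",
    "/user/sync_user/.password",                     # assumed HDFS password file
    "t_policy",
    "/user/hive/warehouse/sx_360_safe.db/t_policy",  # assumed warehouse path
)
```

Generating both artifacts from the same configuration keeps the column list in the table creation statement and the table named in the script from drifting apart.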
S5: transmitting the scheduling task, the script and the table creation statement respectively to designated locations of the relational database, so as to schedule the specified data of the big data platform to the relational database.
In this embodiment, the scheduling task, the script and the table creation statement are transmitted to the designated locations of the relational database by means of a deployment document. The deployment document is used for production deployment, and it is generated automatically by a program that processes the user-defined configuration information.
Further, before step S1 of the present embodiment, the method includes:
S10: receiving the fields corresponding to each item of configuration information, wherein the configuration information is entered into the configuration table according to a task protocol.
The task protocol in this embodiment is a protocol agreed between the two parties to a task; it includes content such as the configuration rules, so that the two parties can cooperate to complete the corresponding task.
S11: detecting whether each field conforms to a preset configuration rule.
The configuration rules of the present embodiment are matching rules specific to each column of fields in the configuration table, for example that a field must begin with a letter and end with a digit.
S12: if yes, generating an instruction to read the table structure of the configuration table.
Only after the written form of each field has been found to conform to the configuration rules of the task protocol is the table structure of the configuration table for the data to be synchronized read on the big data platform. This reduces the probability of errors when reading the table structure, improves the accuracy of the information read, and improves the efficiency of automatically generating Sqoop data synchronization.
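Steps S10 to S12 amount to a gate in front of the read: the instruction to read the table structure is produced only when every field passes its rule. The sketch below uses the one rule the text gives as an example (begin with a letter, end with a digit); the function name `read_instruction_if_valid` and the shape of the returned instruction are hypothetical.

```python
import re
from typing import Dict, List, Optional

# Example rule from the text: a field begins with a letter and ends with a digit.
FIELD_RULE = re.compile(r"[A-Za-z].*[0-9]")


def read_instruction_if_valid(fields: List[str]) -> Optional[Dict[str, str]]:
    """Return a read instruction only when every field matches the rule."""
    invalid = [field for field in fields if not FIELD_RULE.fullmatch(field)]
    if invalid:
        return None  # S12 is not reached; the table structure is not read
    return {"action": "read_table_structure"}  # assumed instruction format


accepted = read_instruction_if_valid(["hduer0101", "a1"])
rejected = read_instruction_if_valid(["hduer0101", "1abc"])
```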
Further, step S11 of the present embodiment includes:
S111: acquiring the field content corresponding to each field.
In the configuration table of the present embodiment, the fields of each column represent different content, such as table names, library names, tasks, and task dependencies.
S112: judging whether the written form of each field content matches, one by one, the corresponding writing rule in the preset configuration rules.
Each field in this embodiment has its own matching rule. For example, a task name should preferably not be in Chinese, and the items in a task dependency are separated by English commas: if the task name column contains 'A' and the task dependency column contains 'E, F', then task A depends on tasks E and F, and task A can only be run once tasks E and F are valid, completed tasks.
S113: if yes, determining that each field of the configuration table conforms to the preset configuration rule; otherwise, determining that the fields of the configuration table do not conform to the preset configuration rule.
A field in this embodiment is recognized correctly only if it conforms to its matching rule; otherwise an error is reported, the subsequent flow cannot continue, and the automatic generation of the corresponding script is affected.
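The one-by-one matching of S111 to S113 can be sketched as a rule per column, with an error raised on the first mismatch. The column names `task_name` and `dependencies`, the exception class, and the exact regular expressions are assumptions; the text only states that task names should preferably avoid Chinese (treated here as a hard rule for illustration) and that dependencies are separated by English commas.

```python
import re
from typing import Dict, List

# One writing rule per column (hypothetical column names and patterns).
RULES = {
    # ASCII letters, digits and underscores only, so no Chinese characters.
    "task_name": re.compile(r"[A-Za-z0-9_]+"),
    # Empty, or names separated by English (ASCII) commas.
    "dependencies": re.compile(r"([A-Za-z0-9_]+(\s*,\s*[A-Za-z0-9_]+)*)?"),
}


class ConfigRuleError(ValueError):
    """Raised when a field does not conform to its configuration rule."""


def check_row(row: Dict[str, str]) -> None:
    """Match each field against its own rule; report an error on mismatch."""
    for column, rule in RULES.items():
        value = row.get(column, "")
        if not rule.fullmatch(value):
            raise ConfigRuleError(f"{column}={value!r} violates its rule")


def parse_dependencies(text: str) -> List[str]:
    """Split 'E, F' into ['E', 'F']."""
    return [part.strip() for part in text.split(",") if part.strip()]


row = {"task_name": "A", "dependencies": "E, F"}
check_row(row)                                   # passes silently
depends_on = parse_dependencies(row["dependencies"])
```

Raising an exception mirrors the behaviour described in the text, where a rule violation is reported as an error and the rest of the flow does not continue.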
Further, step S4 of the present embodiment includes:
S40: generating, from the view file and according to the authorization file, a data table of the relational database on the big data platform.
In this embodiment, a data table that the relational database can recognize is built on the big data platform, so that the Sqoop data is first synchronized into this data table and then sent to the relational database. Data on the big data platform is stored as character type in order to accommodate, and remain compatible with, data from many fields. The data types of a relational database, however, are more specific, such as a string type or a numeric type, and character-type data sent directly to the relational database would not be recognized. The data table of the relational database is therefore constructed on the big data platform, and the characters of the synchronized data are converted into the corresponding strings or numbers according to the specific form of each field in the relational database.
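The conversion described for S40 can be sketched as a per-column cast driven by the target column type. The type labels `"string"` and `"number"`, the function names and the sample columns are hypothetical; the application only says that character data is turned into strings or numbers according to each field of the relational database.

```python
from decimal import Decimal, InvalidOperation
from typing import Dict, Union

Value = Union[str, Decimal]


def convert_value(value: str, target_type: str) -> Value:
    """Convert one character value to the type of the target column."""
    if target_type == "string":
        return value
    if target_type == "number":
        try:
            return Decimal(value)
        except InvalidOperation as exc:
            raise ValueError(f"not a number: {value!r}") from exc
    raise ValueError(f"unknown target type: {target_type}")


def convert_row(row: Dict[str, str], schema: Dict[str, str]) -> Dict[str, Value]:
    """Convert every column of a row according to the target schema."""
    return {column: convert_value(row[column], schema[column]) for column in schema}


converted = convert_row(
    {"id": "42", "premium": "199.50", "name": "x"},
    {"id": "number", "premium": "number", "name": "string"},
)
```

`Decimal` is used instead of `float` in this sketch so that values such as amounts keep their exact written precision.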
S41: constructing the corresponding script and table creation statement according to the extraction mode carried by the data table.
The extraction mode of this embodiment specifies how the data table of the relational database is constructed, and is either full extraction or incremental extraction. Incremental extraction extracts only the data that has been added or modified in the table corresponding to the table name of the synchronized data on the big data platform; it has a wider range of application and extracts data in a more timely manner. The full and incremental extraction modes use different automatic script generation mechanisms and different table creation statements.
S42: automatically generating a corresponding scheduling task according to the script and the table creation statement, wherein the scheduling task is used for scheduling the specified data of the big data platform to the relational database.
Because the full and incremental extraction modes use different script generation mechanisms and table creation statements, the scheduling tasks corresponding to the two modes also differ.
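How the extraction mode changes the generated scheduling tasks can be sketched as a small dispatch: full extraction yields a single task, while incremental extraction yields an initialization task, an increment task and a merge task that depends on the other two. The function name `plan_tasks`, the task naming scheme and the dictionary layout are hypothetical.

```python
from typing import Dict, List

Task = Dict[str, object]


def plan_tasks(table: str, mode: str) -> List[Task]:
    """Return the scheduling tasks implied by the extraction mode."""
    if mode == "full":
        # Full extraction: one extraction task, no dependencies.
        return [{"name": f"{table}_full", "depends_on": []}]
    if mode == "incremental":
        init, inc = f"{table}_init", f"{table}_inc"
        return [
            {"name": init, "depends_on": []},
            {"name": inc, "depends_on": []},
            # The merge task runs only after both tables have been extracted.
            {"name": f"{table}_merge", "depends_on": [init, inc]},
        ]
    raise ValueError(f"unknown extraction mode: {mode}")


full_plan = plan_tasks("t_policy", "full")
incremental_plan = plan_tasks("t_policy", "incremental")
```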
Further, step S41 of the present embodiment includes:
S410: judging whether the extraction mode is full extraction.
In this embodiment, if the data to be extracted covers all of the table data under a given table name, full extraction is performed, and there is only one corresponding extraction task.
S411: if yes, extracting all table data corresponding to the table name carried in the table structure into the data table.
In the full extraction mode of this embodiment, executing the extraction task once extracts all of the table data corresponding to the table name in one pass, and the Sqoop data script and table creation statement corresponding to full extraction are generated.
S412: forming, from the data table, a first script and a first table creation statement corresponding to the full data.
The Sqoop data script and table creation statement of the full extraction mode are called the first script and the first table creation statement to distinguish them from those of the incremental extraction mode. "First" serves only to distinguish and is not limiting; "first" and "second" are used in the same way elsewhere in the application, which is not repeated.
Further, step S41 of another embodiment of the present application includes:
S413: judging whether the extraction mode is incremental extraction.
In this embodiment, the data to be extracted covers only part of the table data under a given table name. A timestamp attached to the data, or the operation log, is used to determine that the data to be extracted is newly added or newly modified content, and incremental extraction is then performed.
S414: if yes, separately extracting an initialization table and an increment table corresponding to the table name carried in the table structure.
Incremental extraction in this embodiment comprises two tasks: one extracts the original base data to form the initialization table, and the other extracts the newly added or newly modified content to form the increment table. Distinguishing the initialization table from the increment table widens the range of application of the data.
S415: merging the initialization table and the increment table into the data table.
In the incremental extraction mode of this embodiment, after the two tasks that produce the initialization table and the increment table, there is a third task that merges the two tables into a data table of strings or numbers that the corresponding relational database can recognize.
S416: forming, from the data table, a second script and a second table creation statement corresponding to the incremental data.
The task flow of the incremental extraction mode differs in its details from that of the full extraction mode, so the second script and second table creation statement for the incremental data differ substantially from the first script and first table creation statement of the full extraction mode.
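The incremental flow of S413 to S415 can be sketched in memory: rows whose update field is later than the last synchronization form the increment table, and the merge keeps one row per primary key with the increment taking precedence. This is an assumption-laden illustration: the application does not state the merge rule, and the function names, column names and ISO-formatted timestamps (which compare correctly as strings) are hypothetical.

```python
from typing import Dict, List

Row = Dict[str, str]


def select_increment(rows: List[Row], update_field: str, last_sync: str) -> List[Row]:
    """Keep rows added or modified after the last synchronization."""
    return [row for row in rows if row[update_field] > last_sync]


def merge_tables(init_rows: List[Row], inc_rows: List[Row], primary_key: str) -> List[Row]:
    """Merge the initialization and increment tables; increment rows win."""
    merged = {row[primary_key]: row for row in init_rows}
    for row in inc_rows:
        merged[row[primary_key]] = row  # overwrite modified rows, add new ones
    return list(merged.values())


initialization_table = [
    {"id": "1", "name": "old-1", "updated_at": "2018-08-01T00:00:00"},
    {"id": "2", "name": "old-2", "updated_at": "2018-08-01T00:00:00"},
]
source_rows = initialization_table + [
    {"id": "2", "name": "new-2", "updated_at": "2018-08-20T09:30:00"},
    {"id": "3", "name": "new-3", "updated_at": "2018-08-21T10:00:00"},
]
increment_table = select_increment(source_rows, "updated_at", "2018-08-15T00:00:00")
data_table = merge_tables(initialization_table, increment_table, "id")
```

Keying the merge on the primary key is why the configuration table asks for both a primary key and an update field.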
Further, step S42 of another embodiment of the present application includes:
S421: judging whether the dependency relationships among the task of acquiring the initialization table, the task of acquiring the increment table, and the task of merging the initialization table and the increment table are correct.
Incremental extraction in this embodiment involves three tasks with a time-sequence relationship, and the scheduling platform must automatically determine whether the dependency relationships for scheduling the three tasks are correct, whether any of the three tasks duplicates a task already in the task list, whether the fields of each task are legal, and so on. This embodiment mainly determines whether the dependency relationships, or time sequence, of the three tasks meet the requirements.
S422: if yes, automatically generating the scheduling task according to the second script and the second table creation statement.
In this embodiment, once the task of acquiring the initialization table and the task of acquiring the increment table have been judged legal, and the dependency relationships among those two tasks and the merge task have been judged correct, the standardized Sqoop data synchronization script is generated automatically.
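The checks named for S421 (duplicate tasks, unknown dependencies, and a consistent time sequence) can be sketched with a topological sort. The application does not say how its scheduling platform performs the check; the use of Python's standard `graphlib` module (Python 3.9+), the function name `validate_tasks` and the task names are assumptions.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List


def validate_tasks(tasks: List[Dict[str, object]]) -> List[str]:
    """Return a valid execution order, or raise ValueError if none exists."""
    names = [str(task["name"]) for task in tasks]
    if len(set(names)) != len(names):
        raise ValueError("duplicate task name")
    graph = {str(task["name"]): set(task["depends_on"]) for task in tasks}
    unknown = {dep for deps in graph.values() for dep in deps} - set(names)
    if unknown:
        raise ValueError(f"dependency on unknown task: {sorted(unknown)}")
    try:
        # Each task maps to its predecessors, so predecessors come first.
        return list(TopologicalSorter(graph).static_order())
    except CycleError as exc:
        raise ValueError("cyclic dependency between tasks") from exc


tasks = [
    {"name": "init", "depends_on": []},
    {"name": "inc", "depends_on": []},
    {"name": "merge", "depends_on": ["init", "inc"]},
]
order = validate_tasks(tasks)
```

Only when `validate_tasks` returns without error would the scheduling task be generated in this sketch, which corresponds to the "if yes" branch of S422.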
In this embodiment, by developing an automatic generation tool for Sqoop data synchronization, the script file and table creation statement for Sqoop data synchronization can be generated automatically and deployed directly (mounted on another platform). That platform controls the script file so that synchronization runs by day, week, month, quarter, year and so on; the time of day at which it runs and whether it needs to run at all can also be set. The content of the script files is standardized, script logs are easy to inspect and trace, and the platform on which scripts run is unified and centralized, so that a single operator can manage the running status of all scripts, which is easier to manage and control than transmitting the data directly to a designated platform. The method streamlines Sqoop data synchronization and improves its accuracy, greatly shortens the time needed to bring a product online, and markedly improves working efficiency. An automatic checking process improves the accuracy of the automatic generation tool and keeps the flow smooth. Because the corresponding scripts and table creation statements are generated automatically for the different extraction modes of Sqoop data synchronization, different task requirements are met and the range of application of the tool is widened.
Referring to fig. 2, a data synchronization generation device for synchronizing data of a big data platform to a relational database according to an embodiment of the present application includes:
the reading module 1, used for reading, on the big data platform, the table structure of a pre-configured configuration table used for synchronizing data of the big data platform to the relational database, so as to obtain each item of configuration information in the configuration table.
The relational database of the present embodiment is a database built on the relational model, in which data is processed by means of mathematical concepts and methods such as set algebra. The data is organized into a set of formally described tables, each of which is essentially a special collection of data items; the data in these tables can be accessed or recombined in many different ways without reorganizing the database tables, and each table contains one or more data categories represented by columns. The relational database of the present embodiment includes Oracle, DB2, SQL Server, Sybase, MySQL, PostgreSQL (pg), and the like. The table structure of the present embodiment includes the table name, the library name, the fields contained in the table, the description of each field, and the like.
For example, the table structure of this embodiment has the following parameters (parameter: description):
Whether to process: indicates whether the record is processed
User: the hadoop user, e.g. hduer 0101
Library name: the library, e.g. sx_360_safe
Source table: the table name in the relational database
Primary key: the primary key of the corresponding table
Update field: the update field of the corresponding table
Database type: oracle, postgresql or mysql (choose one of the three)
Whether to maintain the table: Y/N
Source library instance name: only values in the provided list
Extraction mode: full or incremental
Partition field: optional; if filled in, the result table will have partitions
Partition field value: optional; if filled in, the result table will have partitions
Dynamic partition field value: optional; if filled in, the result table will have partitions
Table description: description of the hive table
Deployment directory: written under hive_app as actually filled in, e.g. for monetary value home
Version: the version number
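As an illustrative sketch only, one row of such a configuration table could be represented in memory as follows. The class name, attribute names and the dictionary keys are assumptions made for this example and do not appear in the embodiment; only a subset of the parameters is modelled.

```python
from dataclasses import dataclass
from typing import Optional

# Values taken from the parameter list; the lowercase spelling is an assumption.
ALLOWED_DB_TYPES = {"oracle", "postgresql", "mysql"}
ALLOWED_MODES = {"full", "incremental"}


@dataclass
class SyncConfigRow:
    """One row of the configuration table (hypothetical attribute names)."""
    process: bool
    hadoop_user: str
    library: str
    source_table: str
    primary_key: str
    update_field: str
    db_type: str
    maintain_table: bool
    extraction_mode: str
    partition_field: Optional[str] = None

    @property
    def partitioned(self) -> bool:
        # The result table has partitions only if a partition field is filled in.
        return bool(self.partition_field)


def parse_row(raw: dict) -> SyncConfigRow:
    """Build a SyncConfigRow from raw string cells, rejecting unknown values."""
    db_type = raw["db_type"].strip().lower()
    mode = raw["extraction_mode"].strip().lower()
    if db_type not in ALLOWED_DB_TYPES:
        raise ValueError(f"unsupported database type: {db_type}")
    if mode not in ALLOWED_MODES:
        raise ValueError(f"unsupported extraction mode: {mode}")
    return SyncConfigRow(
        process=raw["process"].strip().upper() == "Y",
        hadoop_user=raw["hadoop_user"].strip(),
        library=raw["library"].strip(),
        source_table=raw["source_table"].strip(),
        primary_key=raw["primary_key"].strip(),
        update_field=raw["update_field"].strip(),
        db_type=db_type,
        maintain_table=raw["maintain_table"].strip().upper() == "Y",
        extraction_mode=mode,
        partition_field=(raw.get("partition_field") or "").strip() or None,
    )
```

Rejecting unknown database types and extraction modes at parse time keeps later generation steps from having to re-check them.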
The creation module 2 is configured to create the view file corresponding to the relational database according to the table name carried in the table structure.
To control permissions on the use of data in the relational database, this embodiment requires that a view file of the corresponding relational database be generated for each synchronized data table of the big data platform.
The authorization module 3 is configured to authorize the view file to generate an authorization file.
Through the authorization file, a user obtains permission to browse the view file corresponding to the relational database, which further protects the data security of the relational database.
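A minimal sketch of how a view definition and a read-only grant might be produced from a table name is given below. The embodiment does not disclose the form of the view file or the authorization file, so the `v_` prefix, the function name and the generic SQL wording are all assumptions; actual GRANT syntax varies between database products.

```python
import re
from typing import List, Tuple

_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def _check_identifier(name: str) -> str:
    # Refuse anything that is not a plain identifier, to avoid malformed SQL.
    if not _IDENTIFIER.fullmatch(name):
        raise ValueError(f"illegal identifier: {name!r}")
    return name


def build_view_and_grant(library: str, table: str,
                         columns: List[str], grantee: str) -> Tuple[str, str]:
    """Return (CREATE VIEW statement, GRANT statement) for one table."""
    library = _check_identifier(library)
    table = _check_identifier(table)
    grantee = _check_identifier(grantee)
    column_list = ", ".join(_check_identifier(c) for c in columns)
    view_name = f"v_{table}"  # hypothetical naming convention
    create_view = (
        f"CREATE VIEW {library}.{view_name} AS "
        f"SELECT {column_list} FROM {library}.{table};"
    )
    grant = f"GRANT SELECT ON {library}.{view_name} TO {grantee};"
    return create_view, grant
```

Granting access on the view rather than on the underlying table is what lets the exposed columns be restricted without changing the table itself.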
The first generation module 4 is configured to generate the scheduling task, the script and the table-creation statement corresponding to the relational database according to the view file.
This embodiment takes the automatic generation of Sqoop data synchronization from the big data platform to the relational database as an example. The content of the view file is obtained according to the authorization file, which yields the configuration information contained in the view file; a program is then called to automatically generate the standardized Sqoop synchronization script and the table-creation statement corresponding to the relational database.
The transmission module 5 is configured to transmit the scheduling task, the script and the table-creation statement respectively to the designated position of the big data platform, so as to schedule the specified data of the relational database to the big data platform.
In this embodiment, the scheduling task, the script and the table-creation statement are each transmitted to the designated position of the big data platform through a deployment document. The deployment document is used for production deployment and is generated automatically by a program from the user-defined configuration information.
Referring to fig. 3, a data synchronization generating apparatus according to another embodiment of the present application includes:
The receiving module 10 is configured to receive the fields corresponding to the respective items of configuration information, where the configuration information is configured into a configuration table according to a task protocol.
The task protocol in this embodiment is a protocol negotiated by both parties to the task. It includes content such as the configuration rules, so that both parties can cooperate to complete the corresponding task.
The monitoring module 11 is configured to detect whether each of the fields meets a preset configuration rule.
The configuration rules of this embodiment are the matching rules that apply to each column of fields in the configuration table; for example, a field must start with a letter and end with a number.
The second generating module 12 is configured to generate an instruction for reading the table structure of the configuration table if the fields meet the preset configuration rule.
Only after the written form of every field is found to conform to the configuration rules of the task protocol is the table structure of the configuration table read in the relational database. This reduces the probability of errors in reading the table structure, improves the accuracy of the information read, and improves the efficiency of automatically generating Sqoop data synchronization.
Referring to fig. 4, the monitoring module 11 of the present embodiment includes:
The obtaining unit 111 is configured to obtain the field content corresponding to each of the fields.
In the configuration table of this embodiment, the fields of each column represent different content, such as a table name, a library name, a task dependency relationship, and the like.
The judging unit 112 is configured to judge whether the written form of each field content matches, in one-to-one correspondence, the writing rules in the preset configuration rules.
The matching rules differ from field to field. For example, a task name cannot be in Chinese, and task dependencies are separated by English commas: if the task name column is 'A' and the task dependency column is 'E, F', then task A depends on tasks E and F, and task A cannot be executed until tasks E and F have completed as valid tasks.
The determining unit 113 is configured to determine that each field of the configuration table conforms to the preset configuration rule if the written form of each field content matches the writing rules in the preset configuration rule in one-to-one correspondence, and otherwise to determine that the fields of the configuration table do not conform to the preset configuration rule.
A field in this embodiment is recognized correctly only if it conforms to its matching rule. Otherwise an error is reported, the subsequent flow cannot continue, and the automatic generation of the corresponding script is affected.
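The two example rules mentioned above (no Chinese in the task name, dependencies separated by English commas) could be checked along the following lines. This is a sketch under stated assumptions: the function names are hypothetical, and "Chinese" is approximated here by the CJK Unified Ideographs range U+4E00 to U+9FFF.

```python
import re
from typing import List

# Approximation of "Chinese characters": the basic CJK Unified Ideographs block.
_CJK = re.compile(r"[\u4e00-\u9fff]")
_FULLWIDTH_COMMA = "\uff0c"  # the comma produced by Chinese input methods


def parse_dependencies(cell: str) -> List[str]:
    """Split a dependency cell such as 'E, F' into ['E', 'F']."""
    if _FULLWIDTH_COMMA in cell:
        raise ValueError("dependencies must be separated by English commas")
    return [name.strip() for name in cell.split(",") if name.strip()]


def validate_task_row(task_name: str, dependency_cell: str) -> List[str]:
    """Return a list of rule violations; an empty list means the row is valid."""
    errors = []
    if not task_name.strip():
        errors.append("task name is empty")
    if _CJK.search(task_name):
        errors.append("task name must not contain Chinese characters")
    try:
        parse_dependencies(dependency_cell)
    except ValueError as exc:
        errors.append(str(exc))
    return errors
```

Collecting all violations instead of stopping at the first one lets a single report list every cell that has to be corrected before the flow can continue.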
Referring to fig. 5, the first generating module 4 of the present embodiment includes:
The first construction unit 40 is configured to generate, from the view file and according to the authorization file, a data table of the relational database corresponding to the big data platform.
In this embodiment, a data table that the relational database can recognize is built on the big data platform, so that the Sqoop data is first synchronized into this data table and then sent to the relational database. The data structure of the big data platform is of character type, which improves its capacity for, and compatibility with, data from various fields. The data types of the relational database, however, are comparatively restricted, for example a character string type or a numeric type, so character-type data sent directly to the relational database would not be recognized. A data table of the relational database is therefore constructed on the big data platform, and the characters of the synchronized data of the big data platform are converted into the corresponding character strings or numbers according to the specific form of each field in the relational database.
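The conversion from character-type values to the string or numeric form expected by each relational column might look like the sketch below. The type labels "string" and "number", the treatment of empty strings as NULL, and the function name are assumptions for illustration; the embodiment does not specify them.

```python
from decimal import Decimal
from typing import Dict, Optional, Union

Value = Optional[Union[str, Decimal]]


def convert_row(row: Dict[str, Optional[str]],
                column_types: Dict[str, str]) -> Dict[str, Value]:
    """Convert character-type cells into the type declared for each column.

    Columns missing from column_types are treated as strings.
    """
    converted: Dict[str, Value] = {}
    for column, value in row.items():
        target = column_types.get(column, "string")
        if value is None or value == "":
            # Assumption: an empty cell is stored as NULL in the relational table.
            converted[column] = None
        elif target == "number":
            # Decimal avoids the rounding surprises of binary floats.
            converted[column] = Decimal(value)
        else:
            converted[column] = value
    return converted
```

A malformed numeric cell raises `decimal.InvalidOperation`, so bad data surfaces before it reaches the relational database.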
The second construction unit 41 is configured to construct the corresponding script and table-creation statement according to the extraction mode of the data table.
The extraction mode in this embodiment denotes the standard way in which the data table of the big data platform is constructed, and includes a full extraction mode and an incremental extraction mode. Incremental extraction extracts only the data newly added or modified in the table corresponding to the table name in the relational database, so it applies more widely and extracts data in a more timely manner. The automatic script generation mechanisms for the two modes differ, as do the table-creation statements.
The generating unit 42 is configured to automatically generate a corresponding scheduling task according to the script and the table-creation statement, where the scheduling task is used for scheduling the specified data of the big data platform to the relational database.
Because the script generation mechanisms and table-creation statements differ between the full extraction mode and the incremental extraction mode, the scheduling tasks corresponding to the two modes also differ.
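As a sketch of how the scheduling tasks could differ by mode, the function below returns one task for full extraction and three tasks for incremental extraction, matching the task counts this embodiment describes for the two modes. The task-name suffixes and the choice that only the merge task has dependencies are assumptions for illustration.

```python
from typing import Dict, List


def plan_tasks(table: str, mode: str) -> Dict[str, List[str]]:
    """Return a mapping {task name: names of the tasks it depends on}."""
    if mode == "full":
        # Full extraction: a single extraction task with no dependencies.
        return {f"{table}_full": []}
    if mode == "incremental":
        init = f"{table}_init"    # extracts the original base data
        incr = f"{table}_incr"    # extracts newly added or modified data
        merge = f"{table}_merge"  # merges the two tables above
        # Assumption: the merge task must wait for both extraction tasks.
        return {init: [], incr: [], merge: [init, incr]}
    raise ValueError(f"unknown extraction mode: {mode!r}")
```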
Referring to fig. 6, the second construction unit 41 of the present embodiment includes:
The first judging subunit 410 is configured to judge whether the extraction mode is full extraction.
In this embodiment, extraction is full when the data to be extracted covers all of the table data under a given table name; full extraction has only one corresponding extraction task.
The first extraction subunit 411 is configured to extract all table data corresponding to the table name carried in the table structure into the data table if the extraction mode is full extraction.
In the full extraction mode of this embodiment, the corresponding extraction task is executed once to extract all of the table data corresponding to the table name, and the Sqoop data script and table-creation statement corresponding to full extraction are generated.
The first forming subunit 412 is configured to form, from the data table, a first script and a first table-creation statement corresponding to the full data.
The Sqoop data script and the table-creation statement of the full extraction mode are called the first script and the first table-creation statement to distinguish them from the script and table-creation statement of the incremental extraction mode. The word "first" serves only to distinguish and does not limit; "first" and "second" elsewhere in this application serve the same purpose, which is not repeated.
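A hedged sketch of generating a full-extraction script and table-creation statement follows. It assumes the direction is from the big data platform to the relational database, for which Sqoop's `export` tool applies, and uses only the standard `--connect`, `--username`, `--password-file`, `--table` and `--export-dir` arguments. The function name, the shell header and the example paths are hypothetical; the embodiment does not disclose the content of the generated script.

```python
import shlex
from typing import List, Tuple


def build_full_export(jdbc_url: str, username: str, password_file: str,
                      target_table: str, export_dir: str,
                      columns: List[Tuple[str, str]]) -> Tuple[str, str]:
    """Return (shell script text, CREATE TABLE statement) for a full export."""
    # Table-creation statement for the target relational table.
    column_defs = ",\n  ".join(f"{name} {sql_type}" for name, sql_type in columns)
    create_table = f"CREATE TABLE {target_table} (\n  {column_defs}\n);"

    # One Sqoop export invocation that moves the whole directory in one run.
    command = [
        "sqoop", "export",
        "--connect", jdbc_url,
        "--username", username,
        "--password-file", password_file,
        "--table", target_table,
        "--export-dir", export_dir,
    ]
    script = "#!/bin/bash\nset -e\n" + " ".join(shlex.quote(a) for a in command) + "\n"
    return script, create_table
```

Quoting every argument with `shlex.quote` keeps the generated script safe when a path or URL contains characters the shell would otherwise interpret.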
Referring to fig. 7, a second construction unit 41 of another embodiment of the present application includes:
The second judging subunit 413 is configured to judge whether the extraction mode is incremental extraction.
In this embodiment, extraction is incremental when the data to be extracted covers only part of the table data under a given table name. Whether data is newly added or newly modified is judged from the timestamp created with the data or from the operation log, and such data is then extracted incrementally.
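Selecting the newly added or modified rows by timestamp can be sketched as follows. The field name `updated_at` and the idea of comparing against the time of the last synchronization are assumptions; the embodiment only states that a timestamp or operation log is used.

```python
from datetime import datetime
from typing import Dict, List


def select_increment(rows: List[Dict], last_sync: datetime,
                     timestamp_field: str = "updated_at") -> List[Dict]:
    """Return the rows whose timestamp is strictly later than last_sync."""
    return [row for row in rows if row[timestamp_field] > last_sync]
```

For example, with a last synchronization at 2018-08-01 00:00, a row stamped 2018-08-02 is selected and a row stamped 2018-07-31 is not.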
The second extraction subunit 414 is configured to extract the initialization table and the increment table corresponding to the table name carried in the table structure, respectively, if the extraction mode is incremental extraction.
Incremental extraction in this embodiment includes two tasks: one extracts the original base data to form the initialization table, and the other extracts the newly added or newly modified data to form the increment table. Distinguishing the initialization table from the increment table widens the range of uses of the data.
The merging subunit 415 is configured to merge the initialization table and the increment table into the data table.
In the incremental extraction mode of this embodiment, after the two tasks that produce the initialization table and the increment table, there is a third task, which merges the initialization table and the increment table to generate a data table that the corresponding big data platform can recognize.
The second forming subunit 416 is configured to form, from the data table, a second script and a second table-creation statement corresponding to the incremental data.
The incremental extraction mode of the Sqoop data differs from the full extraction mode in the details of the task flow, so the second script and second table-creation statement of the incremental data differ in substance from the first script and first table-creation statement of the full extraction mode.
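The merge of the initialization table and the increment table can be illustrated by the sketch below, which assumes that rows are keyed by a primary key and that a row in the increment table replaces the row with the same key in the initialization table. The embodiment does not state the merge rule, so this behavior is an assumption.

```python
from typing import Dict, List


def merge_tables(init_rows: List[Dict], incr_rows: List[Dict],
                 primary_key: str) -> List[Dict]:
    """Merge two row lists; increment rows override base rows with the same key."""
    merged = {row[primary_key]: row for row in init_rows}
    for row in incr_rows:
        # A modified row replaces the old version; a new row is appended.
        merged[row[primary_key]] = row
    return list(merged.values())
```

Because Python dictionaries preserve insertion order, an updated row keeps the position of the row it replaces and new rows appear at the end.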
Referring to fig. 8, a generating unit 42 of another embodiment of the present application includes:
The third judging subunit 421 is configured to judge whether the dependency relationship among the task of acquiring the initialization table, the task of acquiring the increment table, and the task of merging the initialization table and the increment table is correct.
Incremental extraction in this embodiment involves three tasks that are related in time sequence. The scheduling platform automatically identifies whether the dependency relationship for scheduling the three tasks is correct, whether any of the three tasks duplicates a task already in the task list, whether the fields of each task are legal, and so on. This embodiment mainly judges whether the dependency relationship, or time sequence, of the three tasks meets the requirements.
The generation subunit 422 is configured to automatically generate a scheduling task according to the second script and the second table-creation statement if the dependency relationship is correct.
In this embodiment, the standardized Sqoop data synchronization script is generated automatically once the task of acquiring the initialization table and the task of acquiring the increment table are judged legal, and the dependency relationship among those two tasks and the task of merging the initialization table and the increment table is judged correct.
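One possible way to check that a set of task dependencies is correct is a depth-first traversal that rejects unknown dependencies and cycles and returns a valid execution order. This is a generic technique offered as a sketch; the embodiment does not disclose how the scheduling platform performs the check, and the task names in the test are hypothetical.

```python
from typing import Dict, List


def check_dependencies(tasks: Dict[str, List[str]]) -> List[str]:
    """Return an execution order that respects dependencies, or raise ValueError."""
    for name, deps in tasks.items():
        for dep in deps:
            if dep not in tasks:
                raise ValueError(f"task {name!r} depends on unknown task {dep!r}")

    order: List[str] = []
    state: Dict[str, str] = {}  # "visiting" while on the stack, "done" afterwards

    def visit(name: str) -> None:
        if state.get(name) == "done":
            return
        if state.get(name) == "visiting":
            raise ValueError(f"circular dependency involving task {name!r}")
        state[name] = "visiting"
        for dep in tasks[name]:
            visit(dep)
        state[name] = "done"
        order.append(name)  # appended only after all of its dependencies

    for name in tasks:
        visit(name)
    return order
```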
Referring to fig. 9, an embodiment of the present application further provides a computer device, which may be a server and whose internal structure may be as shown in fig. 9. The computer device includes a processor, a memory, a network interface and a database connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device stores all the data required by the data synchronization generation process. The network interface of the computer device communicates with an external terminal through a network connection. The computer program, when executed by the processor, implements a data synchronization generation method.
The data synchronization generation method executed by the processor comprises the following steps: reading, on the big data platform, a table structure of a pre-configured configuration table for synchronizing data of the big data platform to a relational database, to obtain each item of configuration information of the configuration table; creating a view file corresponding to the relational database according to the table name carried in the table structure; authorizing the view file to generate an authorization file; generating the scheduling task, the script and the table-creation statement corresponding to the relational database according to the view file; and transmitting the scheduling task, the script and the table-creation statement respectively to the designated position of the relational database, so as to schedule the specified data of the big data platform to the relational database.
By developing an automatic generation tool for Sqoop data synchronization, the computer device can automatically generate the script file and the table-creation statement for Sqoop data synchronization and deploy them directly (attached to another platform). Through that platform, the script file can be controlled to synchronize by day, week, month, quarter, year and so on, and it is also possible to set the time of day at which it executes and whether it needs to execute at all. The content of the script file is standardized and easy to view, the script logs are traceable, and the platform on which the scripts run is unified and centralized, so an operator can conveniently manage the running status of all scripts; compared with transmitting the data directly to a designated platform, this is easier to manage and control. The method improves the flow and accuracy of Sqoop data synchronization, greatly shortens the time needed to bring a product online, and markedly improves working efficiency. An automatic checking process improves the accuracy of the automatic generation tool and keeps the flow running smoothly. Scripts and table-creation statements are generated automatically for the selected extraction mode, which meets different task requirements and broadens the field of application of the tool.
In one embodiment, before the step of reading, on the big data platform, the table structure of the pre-configured configuration table for synchronizing data of the big data platform to the relational database, the processor further performs: receiving the fields corresponding to the respective items of configuration information, where the configuration information is configured into a configuration table according to a task protocol; detecting whether each field meets a preset configuration rule; and if yes, generating an instruction for reading the table structure of the configuration table.
In one embodiment, the step of detecting, by the processor, whether each of the fields meets a preset configuration rule includes: acquiring field contents corresponding to the fields respectively; judging whether the writing mode of each field content is matched with each writing rule in the preset configuration rules one by one; if yes, judging that each field of the configuration table accords with a preset configuration rule, otherwise, judging that each field of the configuration table does not accord with the preset configuration rule.
In one embodiment, the step in which the processor generates, from the view file and according to the authorization file, the scheduling task, the script and the table-creation statement corresponding to the relational database includes: generating, from the view file and according to the authorization file, a data table of the relational database corresponding to the big data platform; constructing the corresponding script and table-creation statement according to the extraction mode of the data table; and automatically generating a corresponding scheduling task according to the script and the table-creation statement, where the scheduling task is used for scheduling the specified data of the big data platform to the relational database.
In one embodiment, the step in which the processor constructs the corresponding script and table-creation statement according to the extraction mode of the data table includes: judging whether the extraction mode is full extraction; if yes, extracting all table data corresponding to the table name carried in the table structure into the data table; and forming, from the data table, a first script and a first table-creation statement corresponding to the full data.
In one embodiment, the step in which the processor constructs the corresponding script and table-creation statement according to the extraction mode of the data table includes: judging whether the extraction mode is incremental extraction; if yes, extracting the initialization table and the increment table corresponding to the table name carried in the table structure, respectively; merging the initialization table and the increment table into the data table; and forming, from the data table, a second script and a second table-creation statement corresponding to the incremental data.
In one embodiment, the step in which the processor automatically generates, according to the script and the table-creation statement, the scheduling task for scheduling data of the big data platform to the relational database includes: judging whether the dependency relationship among the task of acquiring the initialization table, the task of acquiring the increment table, and the task of merging the initialization table and the increment table is correct; and if yes, automatically generating a scheduling task according to the second script and the second table-creation statement.
Those skilled in the art will appreciate that the structure shown in fig. 9 is merely a block diagram of the part of the structure related to the solution of the present application, and does not limit the computer devices to which the solution of the present application is applied.
An embodiment of the present application also provides a computer-readable storage medium on which a computer program is stored. When executed by a processor, the computer program implements a data synchronization generation method, including: reading, on the big data platform, a table structure of a pre-configured configuration table for synchronizing data of the big data platform to a relational database, to obtain each item of configuration information of the configuration table; creating a view file corresponding to the relational database according to the table name carried in the table structure; authorizing the view file to generate an authorization file; generating the scheduling task, the script and the table-creation statement corresponding to the relational database according to the view file; and transmitting the scheduling task, the script and the table-creation statement respectively to the designated position of the relational database, so as to schedule the specified data of the big data platform to the relational database.
By developing an automatic generation tool for Sqoop data synchronization, the computer-readable storage medium makes it possible to automatically generate the script file and the table-creation statement for Sqoop data synchronization and deploy them directly (attached to another platform). Through that platform, the script file can be controlled to synchronize by day, week, month, quarter, year and so on, and it is also possible to set the time of day at which it executes and whether it needs to execute at all. The content of the script file is standardized and easy to view, the script logs are traceable, and the platform on which the scripts run is unified and centralized, so an operator can conveniently manage the running status of all scripts. The method improves the flow and accuracy of Sqoop data synchronization, greatly shortens the time needed to bring a product online, and markedly improves working efficiency. An automatic checking process improves the accuracy of the automatic generation tool and keeps the flow running smoothly. Scripts and table-creation statements are generated automatically for the selected extraction mode, which meets different task requirements and broadens the field of application of the tool.
In one embodiment, before the step of reading, on the big data platform, the table structure of the pre-configured configuration table for synchronizing data of the big data platform to the relational database, the processor further performs: receiving the fields corresponding to the respective items of configuration information, where the configuration information is configured into a configuration table according to a task protocol; detecting whether each field meets a preset configuration rule; and if yes, generating an instruction for reading the table structure of the configuration table.
In one embodiment, the step of detecting, by the processor, whether each of the fields meets a preset configuration rule includes: acquiring field contents corresponding to the fields respectively; judging whether the writing mode of each field content is matched with each writing rule in the preset configuration rules one by one; if yes, judging that each field of the configuration table accords with a preset configuration rule, otherwise, judging that each field of the configuration table does not accord with the preset configuration rule.
In one embodiment, the step in which the processor generates, from the view file and according to the authorization file, the scheduling task, the script and the table-creation statement corresponding to the relational database includes: generating, from the view file and according to the authorization file, a data table of the relational database corresponding to the big data platform; constructing the corresponding script and table-creation statement according to the extraction mode of the data table; and automatically generating a corresponding scheduling task according to the script and the table-creation statement, where the scheduling task is used for scheduling the specified data of the big data platform to the relational database.
In one embodiment, the step in which the processor constructs the corresponding script and table-creation statement according to the extraction mode of the data table includes: judging whether the extraction mode is full extraction; if yes, extracting all table data corresponding to the table name carried in the table structure into the data table; and forming, from the data table, a first script and a first table-creation statement corresponding to the full data.
In one embodiment, the step in which the processor constructs the corresponding script and table-creation statement according to the extraction mode of the data table includes: judging whether the extraction mode is incremental extraction; if yes, extracting the initialization table and the increment table corresponding to the table name carried in the table structure, respectively; merging the initialization table and the increment table into the data table; and forming, from the data table, a second script and a second table-creation statement corresponding to the incremental data.
In one embodiment, the step in which the processor automatically generates, according to the script and the table-creation statement, the scheduling task for scheduling data of the big data platform to the relational database includes: judging whether the dependency relationship among the task of acquiring the initialization table, the task of acquiring the increment table, and the task of merging the initialization table and the increment table is correct; and if yes, automatically generating a scheduling task according to the second script and the second table-creation statement.
Those skilled in the art will appreciate that all or part of the methods described above may be implemented by a computer program stored on a non-volatile computer-readable storage medium, and that the program, when executed, may include the steps of the method embodiments described above. Any reference to memory, storage, a database or another medium provided by the present application and used in the embodiments may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
It should be noted that, in this document, the terms "comprises", "comprising" or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, apparatus, article or method that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, apparatus, article or method. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, apparatus, article or method that comprises the element.
The foregoing description is only of the preferred embodiments of the present application and is not intended to limit the scope of the application, and all equivalent structures or equivalent processes using the descriptions and drawings of the present application or directly or indirectly applied to other related technical fields are included in the scope of the application.

Claims (7)

1. A data synchronization generation method for synchronizing data of a big data platform to a relational database, comprising:
reading, on the big data platform, a table structure of a pre-configured configuration table for synchronizing data of the big data platform to the relational database, to obtain each item of configuration information of the configuration table;
creating a view file corresponding to the relational database according to the table name carried in the table structure;
authorizing the view file to generate an authorization file;
generating, from the view file and according to the authorization file, a data table of the relational database corresponding to the big data platform, wherein the data structure of the big data platform is of character type;
constructing a corresponding script and table-creation statement according to the extraction mode carried by the data table, wherein the extraction mode comprises a full extraction mode and an incremental extraction mode;
automatically generating a corresponding scheduling task according to the script and the table-creation statement, wherein the scheduling task is used for scheduling the specified data of the big data platform to the relational database;
transmitting the scheduling task, the script and the table-creation statement respectively to the designated position of the relational database, so as to schedule the specified data of the big data platform to the relational database;
wherein the step of constructing a corresponding script and table-creation statement according to the extraction mode carried by the data table comprises:
judging whether the extraction mode is the full extraction mode;
if yes, extracting all table data corresponding to the table name carried in the table structure into the data table;
forming, from the data table, a first script and a first table-creation statement corresponding to the full data;
judging whether the extraction mode is the incremental extraction mode;
if yes, extracting the initialization table and the increment table corresponding to the table name carried in the table structure, respectively;
merging the initialization table and the increment table into the data table;
and forming, from the data table, a second script and a second table-creation statement corresponding to the incremental data.
2. The data synchronization generation method according to claim 1, wherein before the step of reading the table structure of the pre-configured configuration table for synchronizing data of the big data platform to the relational database, the method comprises:
receiving the fields corresponding to the configuration information respectively, wherein the configuration information is configured into a configuration table according to a task protocol;
detecting whether each field accords with a preset configuration rule;
if yes, generating an instruction for reading the table structure of the configuration table.
3. The data synchronization generation method according to claim 2, wherein the step of detecting whether each of the fields conforms to a preset configuration rule comprises:
acquiring field contents corresponding to the fields respectively;
judging whether the writing mode of each field content is matched with each writing rule in the preset configuration rules one by one;
if yes, judging that each field of the configuration table accords with a preset configuration rule, otherwise, judging that each field of the configuration table does not accord with the preset configuration rule.
4. The data synchronization generation method according to claim 1, wherein the step of automatically generating, from the script and the table creation statement, the corresponding scheduling task for scheduling data of the big data platform to the relational database comprises:
determining whether the dependency relationships among the acquired initialization table task, the acquired increment table task, and the task of merging the initialization table and the increment table are correct;
if so, automatically generating the scheduling task according to the second script and the second table creation statement.
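One possible reading of the dependency check above, offered as an assumption rather than the claimed implementation, is that the merge task must depend on both the initialization table task and the increment table task, and that the dependencies must contain no cycle. The task names below are hypothetical; the sketch uses the standard-library `graphlib` module (Python 3.9 or later).

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, Set


def dependencies_correct(deps: Dict[str, Set[str]]) -> bool:
    """deps maps each task name to the set of tasks it depends on."""
    # The merge task must wait for both extraction tasks.
    required = {"init_table", "incr_table"}
    if not required <= set(deps.get("merge_tables", ())):
        return False
    try:
        # prepare() raises CycleError when the dependency graph contains a cycle.
        TopologicalSorter(deps).prepare()
    except CycleError:
        return False
    return True
```

Only when this check passes would the scheduling task be generated from the second script and the second table creation statement.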
5. A data synchronization generation apparatus, comprising:
a reading module, configured to read, on the big data platform, the table structure of a pre-configured configuration table for synchronizing data of the big data platform to a relational database, so as to acquire each item of configuration information of the configuration table;
a creation module, configured to create a view file corresponding to the relational database according to the table name carried in the table structure;
an authorization module, configured to authorize the view file to generate an authorization file;
a generation module, configured to generate, from the view file and according to the authorization file, a data table of the relational database corresponding to the big data platform, wherein the data structure of the big data platform is character type;
determining whether the extraction mode is full extraction;
if so, extracting all table data corresponding to the table name carried in the table structure into the data table;
forming, from the data table, a first script and a first table creation statement corresponding to the full data;
determining whether the extraction mode is incremental extraction;
if so, separately extracting an initialization table and an increment table corresponding to the table name carried in the table structure;
merging the initialization table and the increment table into the data table;
forming, from the data table, a second script and a second table creation statement corresponding to the incremental data;
automatically generating a corresponding scheduling task according to the script and the table creation statement, wherein the scheduling task is used for scheduling the specified data of the big data platform to the relational database;
and a transmission module, configured to respectively transmit the scheduling task, the script, and the table creation statement to specified locations of the relational database, so as to schedule the specified data of the big data platform to the relational database.
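For illustration only, the work of the creation module and the authorization module might be sketched as generating a view statement and a grant statement from the configured table name. The `v_` view prefix, the grantee name, and the HiveQL-style syntax are assumptions; the claims do not state the format of the view file or the authorization file.

```python
from typing import Tuple


def build_view_and_grant(table_name: str, grantee: str) -> Tuple[str, str]:
    """Return (view statement, grant statement) for one configured table."""
    # Assumption: the view is named after the source table with a "v_" prefix.
    view_name = f"v_{table_name}"
    create_view = f"CREATE VIEW IF NOT EXISTS {view_name} AS SELECT * FROM {table_name}"
    # Assumption: read access on the view is granted to a single synchronization user.
    grant = f"GRANT SELECT ON TABLE {view_name} TO USER {grantee}"
    return create_view, grant
```

For example, `build_view_and_grant("policy_info", "sync_user")` would produce one statement for the view file and one for the authorization file, which the generation module could then consume.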
6. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any one of claims 1 to 4 when executing the computer program.
7. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the steps of the method of any one of claims 1 to 4.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810954798.1A CN109388676B (en) 2018-08-21 2018-08-21 Data synchronization generation method, device, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN109388676A (en) 2019-02-26
CN109388676B (en) 2023-11-14

Family

ID=65418485

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810954798.1A Active CN109388676B (en) 2018-08-21 2018-08-21 Data synchronization generation method, device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN109388676B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109977157A (en) * 2019-02-27 2019-07-05 深圳点猫科技有限公司 A kind of method and electronic equipment importing data to target directory based on data platform
CN110059134A (en) * 2019-03-18 2019-07-26 深圳市买买提信息科技有限公司 A kind of data are synchronized to method, relevant apparatus and the equipment of cloud platform
CN110083579A (en) * 2019-03-21 2019-08-02 深圳壹账通智能科技有限公司 Incremental data synchronous method, apparatus, computer equipment and computer storage medium
CN110263032B (en) * 2019-05-13 2023-08-29 平安科技(深圳)有限公司 Method, device, computer equipment and storage medium for comparing table structures in database
CN110851511A (en) * 2019-10-09 2020-02-28 上海易点时空网络有限公司 Data synchronization method and device
CN111459943A (en) * 2020-04-03 2020-07-28 中国建设银行股份有限公司 Data processing method, device, system, equipment and storage medium
CN111582824B (en) * 2020-05-08 2023-03-24 北京青云科技股份有限公司 Cloud resource synchronization method, device, equipment and storage medium
CN111666324B (en) * 2020-05-18 2023-06-27 新浪技术(中国)有限公司 ETL scheduling method and device between relational databases

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101149752A (en) * 2007-11-10 2008-03-26 邹昌陆 Transversely combined query computer system and method based on SQL
CN101183361A (en) * 2006-11-13 2008-05-21 中兴通讯股份有限公司 Method of relation data base applications automatic upgrade
CN102402559A (en) * 2010-09-16 2012-04-04 中兴通讯股份有限公司 Database upgrade script generating method and device
CN102483762A (en) * 2009-07-01 2012-05-30 汤姆森特许公司 Method for accessing files of a file system according to metadata and device implementing the method

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101183361A (en) * 2006-11-13 2008-05-21 中兴通讯股份有限公司 Method of relation data base applications automatic upgrade
CN101149752A (en) * 2007-11-10 2008-03-26 邹昌陆 Transversely combined query computer system and method based on SQL
CN102483762A (en) * 2009-07-01 2012-05-30 汤姆森特许公司 Method for accessing files of a file system according to metadata and device implementing the method
CN102402559A (en) * 2010-09-16 2012-04-04 中兴通讯股份有限公司 Database upgrade script generating method and device

Also Published As

Publication number Publication date
CN109388676A (en) 2019-02-26

Similar Documents

Publication Publication Date Title
CN109388676B (en) Data synchronization generation method, device, computer equipment and storage medium
CN106528165B (en) Code generating method and code generating system
CN108170809B (en) Table building script generation method, device, equipment and computer readable storage medium
CN109241184B (en) Data synchronization method, device, computer equipment and storage medium
US11269694B2 (en) Automated API code generation
AU2015218520B2 (en) Service extraction and application composition
CN110119393B (en) Code version management system and method
CN109614309B (en) Method, device, computer equipment and storage medium for comparing test results
US20060015839A1 (en) Development of software systems
CN109062925B (en) Method, device, computer equipment and storage medium for automatically generating insert sentences
CN107315764B (en) Method and system for updating non-relational database associated data
CN106991100B (en) Data import method and device
CN109614371B (en) Method, device, computer equipment and storage medium for storing information
CN109359157A (en) Data synchronize generation method, device, computer equipment and storage medium
CN102110102A (en) Data processing method and device, and file identifying method and tool
CN110688378A (en) Migration method and system for database storage process
CN111367547A (en) Automatic interface code synchronization method, device and storage medium
CN110334326A (en) A kind of method and system for identifying recipe file and being converted into XML file
CN107562429B (en) Android system static partitioning method based on compiling rules
CN113791768B (en) Code generation method and device, storage medium and terminal
CN111367890A (en) Data migration method and device, computer equipment and readable storage medium
US11741000B2 (en) Method and system for verifying resulting behavior of graph query language
CN103235757B (en) Several apparatus and method that input domain tested object is tested are made based on robotization
CN108416035B (en) Disconf-based unified management method for database mapping files
CN108664505B (en) Method and device for exporting database table structure

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant