CN114020840A - Data processing method, device, server, storage medium and product - Google Patents

Info

Publication number
CN114020840A
Authority
CN
China
Prior art keywords
data
target
file
data processing
script
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111275986.XA
Other languages
Chinese (zh)
Inventor
陈伟华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CCB Finetech Co Ltd
Original Assignee
CCB Finetech Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by CCB Finetech Co Ltd
Priority to CN202111275986.XA
Publication of CN114020840A
Legal status: Pending (current)

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27: Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G06F16/275: Synchronous replication

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Stored Programmes (AREA)

Abstract

The embodiments of the invention disclose a data processing method, a data processing device, a server, a storage medium and a computer program product, relating to the field of big data. The method comprises: acquiring the data field definition and configuration information of a data file to be processed; selecting a script template corresponding to the type of the target database, generating a data processing script based on the script template and the data field definition, and executing the data processing script to load the data records in the data file to be processed into a temporary table of the target database; and synchronizing the data records in the temporary table to a target table according to the data processing requirement in the configuration information. By making the script templates and the data processing requirements configurable, the embodiments provide an extensible, configurable way of loading data records, avoid the complex and inefficient development caused by developing each processing requirement separately, and shorten development time.

Description

Data processing method, device, server, storage medium and product
Technical Field
Embodiments of the present invention relate to the field of big data, and in particular to a data processing method, an apparatus, a server, a storage medium, and a computer program product.
Background
In a distributed system, each subsystem may generate data and may use data generated by the other subsystems. For example, in the financial field, basic data such as an organization's employee and customer information, and business data such as loans and payment settlement, may need to be synchronized between the component systems.
Currently, technicians design a separate data synchronization process for each synchronization requirement and develop program code based on that process. Because real-world synchronization requirements vary widely, this approach increases the complexity of development work, lowers development efficiency, and lengthens development time.
Disclosure of Invention
Embodiments of the present invention provide a data processing method, an apparatus, a server, a storage medium, and a computer program product, which can reduce the complexity of development work, improve development efficiency, and shorten development time.
In a first aspect, an embodiment of the present invention provides a data processing method, including:
acquiring the data field definition and configuration information of a data file to be processed;
selecting a script template corresponding to the type of a target database, generating a data processing script based on the script template and the data field definition, and executing the data processing script to load the data records in the data file to be processed into a temporary table of the target database;
and synchronizing the data records in the temporary table to a target table according to the data processing requirement in the configuration information.
In a second aspect, an embodiment of the present invention further provides a data processing apparatus, where the apparatus includes:
an information acquisition module, configured to acquire the data field definition and configuration information of a data file to be processed;
a script generation module, configured to select a script template corresponding to the type of a target database, generate a data processing script based on the script template and the data field definition, and execute the data processing script to load the data records in the data file to be processed into a temporary table of the target database;
and a data synchronization module, configured to synchronize the data records in the temporary table to a target table according to the data processing requirement in the configuration information.
In a third aspect, an embodiment of the present invention further provides a server, where the server includes:
one or more processors;
a memory for storing one or more programs,
where the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the data processing method provided by any embodiment of the present invention.
In a fourth aspect, an embodiment of the present invention provides a storage medium containing computer-executable instructions which, when executed by a computer processor, perform the data processing method according to any embodiment of the present invention.
In a fifth aspect, an embodiment of the present invention further provides a computer program product including a computer program which, when executed by a processor, implements the data processing method according to any embodiment of the present invention.
With the data processing method, apparatus, server, storage medium, and computer program product provided by the embodiments of the invention, after the data field definition of a data file to be processed is obtained, a script template is selected according to the type of the target database, a data processing script is generated from the data field definition and the script template, and the data records in the data file are loaded into a temporary table of the target database by that script; the data records in the temporary table are then synchronized to the target table according to the data processing requirement in the configuration information. Configurable script templates and data processing requirements make the loading of data records extensible and configurable, avoiding the complex and inefficient development caused by developing each processing requirement separately and shortening development time.
Drawings
FIG. 1 is a flowchart of a data processing method according to an embodiment of the present invention;
FIG. 2a is a flowchart of another data processing method according to an embodiment of the present invention;
FIG. 2b is a flowchart of a method for synchronizing data between a target table and a temporary table;
FIG. 3 is a flowchart of another data processing method according to an embodiment of the present invention;
FIG. 4 is a flowchart of a data processing method according to an embodiment of the present invention;
FIG. 5 is a block diagram of a data processing apparatus according to an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of a server according to the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
Financial systems often adopt a distributed architecture. Each system generates its own data, and that data may be used by other associated systems. For example, in the financial field, basic data such as an organization's employee and customer information, and business data such as loans and payment settlement, may need to be synchronized between the component systems.
Both systems under development and systems already online may need to add new synchronized data, and each synchronization requirement may differ. For example, the target database may be Oracle or MySQL. Some data must be preprocessed before it is synchronized, and some fields need simple processing during synchronization. Some systems want to synchronize data as a replica table (the target table has the same or a similar structure to the source system's table), while others want to synchronize only the current day's changed data rather than all data.
For example, after receiving a data file sent by the source system, a developer analyzes its data fields and field types, writes a database loading program for the database type used by the target system, and executes it to load the data file into a temporary table in the target system's database. The developer then writes and executes a data synchronization program to synchronize the data in the temporary table to the target table. In summary, because data synchronization requirements are so varied, developing separately for each requirement greatly increases the complexity of development work and reduces development efficiency.
FIG. 1 is a flowchart of a data processing method according to an embodiment of the present invention. This embodiment is applicable to synchronizing data between different subsystems of a distributed system. The method may be executed by a data processing apparatus, which may be implemented in software and/or hardware and configured in a server. As shown in FIG. 1, the method includes:
Step 110, acquiring the data field definition and configuration information of the data file to be processed.
The data file to be processed is the data to be synchronized that the source system sends to the target system.
In a data synchronization scenario, the source system provides the data file, and the target system requests the synchronized data because it needs to use data produced by the source system. For example, if system B requests data synchronization from system A, system B is the target system and system A is the source system. If system B requests data synchronization from system A and system C requests data synchronization from system B, then in the synchronization operation between systems A and B, system B is the target system and system A is the source system, whereas in the synchronization operation between systems B and C, system B is the source system and system C is the target system. The source system and the target system may belong to the same distributed system or to different distributed systems. When the target system needs data generated by the source system, it sends a data synchronization request to the source system, which responds by sending the data file to be processed to the target system.
The data field definition specifies the fields contained in the data file to be processed. Specifically, it may include a field length specification, a field delimiter specification, a field type specification, and so on. The data file can be parsed as specified by the data field definition to obtain the fields it contains, the data type of each field, and so on.
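To make the parsing step concrete, here is a minimal Python sketch, assuming a pipe-separated data file and a field definition that lists each field's name, type code and maximum length. `FieldDef`, `parse_line`, the type codes and the `|` separator are all hypothetical; the patent does not fix a definition format.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class FieldDef:
    """One entry of a data field definition (hypothetical representation)."""
    name: str
    type: str    # assumed type codes: "str", "int", "decimal"
    length: int  # maximum length of the raw text value


# Assumed mapping from type codes to Python converters.
CONVERTERS = {"str": str, "int": int, "decimal": Decimal}


def parse_line(line: str, fields: list[FieldDef], sep: str = "|") -> dict:
    """Split one record on the field separator and convert each value to its declared type."""
    values = line.rstrip("\n").split(sep)
    if len(values) != len(fields):
        raise ValueError(f"expected {len(fields)} fields, got {len(values)}")
    record = {}
    for field, raw in zip(fields, values):
        # Enforce the field length specification before converting.
        if len(raw) > field.length:
            raise ValueError(f"field {field.name!r} exceeds length {field.length}")
        record[field.name] = CONVERTERS[field.type](raw)
    return record
```

For example, `parse_line("7|Alice|12.50", fields)` with an `int`, a `str` and a `decimal` field yields a dictionary of typed values, and a record with a missing field raises `ValueError` instead of loading silently.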
Optionally, when data needs to be synchronized, the source system sends the data file (the data file to be processed) to the target system, together with a data field definition file that records the data field definition of the data file to be processed.
Alternatively, technicians of the source system and the target system may agree on the data field definition offline and save it in the configuration information.
The configuration information includes the data synchronization requirements for the data file to be processed. Technicians specify these requirements through configuration and can modify them by changing the configuration information. Specifically, the synchronization requirements of most data files can be met by configuring a few attributes in the program code. Special processing requirements for certain fields, namely preprocessing or post-processing of those fields, can be met by configuring optional attributes.
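The configuration information could be modelled as a small record per data file. The sketch below is an assumption throughout: `SyncConfig`, `config_from_row`, the attribute names and the comma-separated key convention are hypothetical, while the load-type names ALL, MERGE, INSERT and DEL are those used later in this description.

```python
from dataclasses import dataclass, field
from typing import Optional

# Load types named in the description: ALL (full), MERGE and INSERT (incremental), DEL (delete).
LOAD_TYPES = {"ALL", "MERGE", "INSERT", "DEL"}


@dataclass
class SyncConfig:
    """Per-file configuration information; all attribute names are illustrative assumptions."""
    file_id: str
    db_type: str        # e.g. "oracle" or "mysql"
    temp_table: str
    target_table: str
    load_type: str      # one of LOAD_TYPES
    primary_keys: list[str] = field(default_factory=list)
    # Optional attributes for special (pre/post) processing of fields.
    pre_process: Optional[str] = None
    post_process: Optional[str] = None

    def __post_init__(self) -> None:
        self.load_type = self.load_type.upper()
        if self.load_type not in LOAD_TYPES:
            raise ValueError(f"unknown load type: {self.load_type}")
        # Key-matching load types cannot work without a primary key.
        if self.load_type in {"MERGE", "DEL"} and not self.primary_keys:
            raise ValueError(f"{self.load_type} requires primary_keys")


def config_from_row(row: dict) -> SyncConfig:
    """Build a SyncConfig from one configuration-table row (assumed column names)."""
    data = dict(row)
    # Assumed storage convention: composite keys kept as "col1,col2".
    raw_keys = data.pop("primary_keys", "") or ""
    data["primary_keys"] = [k.strip() for k in raw_keys.split(",") if k.strip()]
    return SyncConfig(**data)
```

Validating in `__post_init__` means a misconfigured file (for example a DEL load with no primary key) is rejected when the configuration is read, not halfway through a synchronization.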
For systems that need to synchronize data, the sender of the data (i.e., the source system) may send a data field definition file to the receiver (i.e., the target system) before synchronization takes place. Alternatively, technicians of the sender and the receiver agree on the data field definition offline, and the receiver's technicians update the database configuration information accordingly.
The target system provides script templates, data definition file processing interfaces, and so on. New requirements can be supported through these interfaces within the implementation framework of the data processing method, without changing the logic of the whole framework. For example, to support file loading into a new type of database, new processing logic can be added through the reserved interface: a new script template is added, a data definition file processing interface for the new database type is added, and attributes are configured in the same way as for the existing databases. The synchronization requirement for data files of the new database is thus met without rewriting the whole method.
Exemplarily, after the data file to be processed is received, it is determined whether a corresponding data definition file exists; if so, the data definition file is parsed through the data definition file processing interface to obtain the data field definition of the data file, and the data field definition is stored in association with the file identifier of the data file. The data definition file processing interface contains the processing methods for data definition files: each type of data definition file corresponds to one processing method, and processing methods for further types of data definition file can be added.
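The extensible data definition file processing interface resembles a plugin registry keyed by definition-file type. In this hypothetical sketch, `register` adds a parser for a new type without touching the dispatch code in `parse_definition`; the JSON and CSV layouts are assumed examples, not formats named in the patent.

```python
import csv
import io
import json
from typing import Callable

DefinitionParser = Callable[[str], list[dict]]

# One processing method per type of data definition file (hypothetical registry).
_PARSERS: dict[str, DefinitionParser] = {}


def register(def_type: str):
    """Decorator that plugs a parser for a new definition-file type into the framework."""
    def wrap(fn: DefinitionParser) -> DefinitionParser:
        _PARSERS[def_type] = fn
        return fn
    return wrap


@register("json")
def parse_json_definition(text: str) -> list[dict]:
    # Assumed layout: [{"name": ..., "type": ..., "length": ...}, ...]
    return json.loads(text)


@register("csv")
def parse_csv_definition(text: str) -> list[dict]:
    # Assumed layout: header row "name,type,length", then one row per field.
    rows = csv.DictReader(io.StringIO(text))
    return [{"name": r["name"], "type": r["type"], "length": int(r["length"])} for r in rows]


def parse_definition(def_type: str, text: str) -> list[dict]:
    """Dispatch to the registered parser for this definition-file type."""
    try:
        parser = _PARSERS[def_type]
    except KeyError:
        raise ValueError(f"no parser registered for definition type {def_type!r}") from None
    return parser(text)
```

Supporting a new definition format then means writing one function and decorating it with `@register("new_type")`; nothing else in the flow changes, which is the property the description attributes to the reserved interface.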
Illustratively, after receiving the data file to be processed, the target system obtains its data field definition, and also obtains the configuration information entered by the user according to the synchronization requirements of the data file.
Specifically, after the data file to be processed is received, if a corresponding data definition file exists, it is parsed through the data definition file processing interface to obtain the data field definition, which is stored in the target database (the database of the target system) indexed by the file identifier of the data file. The target database is then queried with that file identifier to obtain the data field definition of the data file.
Optionally, if no corresponding data definition file exists, the file identifier of the data file to be processed is determined, and the database configuration information of the target system is queried with that identifier to obtain the data field definition of the data file.
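The lookup of the data field definition, with its fallback to the configuration information, might look like the following sketch. SQLite stands in for the target database, and the `field_def` and `sync_config` tables, their columns and `get_field_definition` are hypothetical.

```python
import json
import sqlite3
from typing import Optional

# Hypothetical tables: field_def holds definitions parsed from definition files,
# sync_config holds definitions agreed offline and saved as configuration information.
SCHEMA = """
CREATE TABLE IF NOT EXISTS field_def (file_id TEXT PRIMARY KEY, definition TEXT NOT NULL);
CREATE TABLE IF NOT EXISTS sync_config (file_id TEXT PRIMARY KEY, field_definition TEXT);
"""


def get_field_definition(conn: sqlite3.Connection, file_id: str,
                         def_text: Optional[str] = None, parse=json.loads) -> list:
    """Return the field definition of a data file, identified by its file identifier."""
    if def_text is not None:
        # A definition file exists: parse it and store the result indexed by file identifier.
        definition = parse(def_text)
        with conn:
            conn.execute("INSERT OR REPLACE INTO field_def (file_id, definition) VALUES (?, ?)",
                         (file_id, json.dumps(definition)))
        row = conn.execute("SELECT definition FROM field_def WHERE file_id = ?",
                           (file_id,)).fetchone()
    else:
        # No definition file: fall back to the definition saved in the configuration information.
        row = conn.execute("SELECT field_definition FROM sync_config WHERE file_id = ?",
                           (file_id,)).fetchone()
    if row is None or row[0] is None:
        raise LookupError(f"no field definition for file {file_id!r}")
    return json.loads(row[0])
```

The `parse` parameter is where a registry-based parser for the definition-file type would be passed in; `json.loads` is only a default for the example.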
Step 120, selecting a script template corresponding to the type of the target database, generating a data processing script based on the script template and the data field definition, and executing the data processing script to load the data records in the data file to be processed into a temporary table of the target database.
It should be noted that script templates for different types of databases are configured in advance. If a new database must be supported, processing logic for it can be added through the interface provided by the technical framework of the present invention; this processing logic may include a data definition file handler and a script template.
Because the logic for generating load scripts differs between database types, when script templates for several database types are configured, the template must be selected according to the type of the target database. For example, for Oracle a sqlldr (SQL*Loader) script is generated, and for MySQL a LOAD DATA INFILE script is generated. Load-script generation logic for other databases can be written in the same place to extend support to databases not yet covered.
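Template selection by database type can be sketched with Python's `string.Template`. The templates below are illustrative: the Oracle one is a SQL*Loader control file and the MySQL one a `LOAD DATA INFILE` statement, but the placeholder names, the `APPEND` load method and the helper `generate_load_script` are assumptions rather than the patent's templates.

```python
from string import Template

# Illustrative load-script templates keyed by target database type (assumed content).
SCRIPT_TEMPLATES = {
    # SQL*Loader control file for Oracle, to be passed to the sqlldr utility.
    "oracle": Template(
        "LOAD DATA\n"
        "INFILE '$data_file'\n"
        "APPEND INTO TABLE $table\n"
        "FIELDS TERMINATED BY '$sep'\n"
        "($columns)\n"
    ),
    # LOAD DATA INFILE statement for MySQL; '\\n' emits a literal \n for MySQL to interpret.
    "mysql": Template(
        "LOAD DATA INFILE '$data_file' INTO TABLE $table\n"
        "FIELDS TERMINATED BY '$sep' LINES TERMINATED BY '\\n'\n"
        "($columns);\n"
    ),
}


def generate_load_script(db_type: str, data_file: str, table: str,
                         fields: list[dict], sep: str = "|") -> str:
    """Select the template for the target database type and fill in the field names."""
    try:
        template = SCRIPT_TEMPLATES[db_type.lower()]
    except KeyError:
        raise ValueError(f"no script template for database type {db_type!r}") from None
    columns = ", ".join(f["name"] for f in fields)
    return template.substitute(data_file=data_file, table=table, sep=sep, columns=columns)
```

Extending support to another database would amount to adding one more entry to `SCRIPT_TEMPLATES`; the selection and filling code stays unchanged.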
The temporary table is a table in the target database that temporarily stores the data records of the data file to be processed.
Illustratively, the script template corresponding to the target database type is selected from the script template library, and the target fields from the data field definition are filled into the template to generate the data processing script, which is executed to load the data records in the data file to be processed into the temporary table of the target database. Specifically, the type information of the fields to be filled into the script template is determined; the data field definition is queried according to that type information to determine the target fields; and the target fields are filled into the script template to obtain the data processing script. Executing the data processing script loads the data file to be processed into the temporary table.
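Because sqlldr and `LOAD DATA INFILE` need a live Oracle or MySQL server, the following sketch uses SQLite as a stand-in target database to show the effect of this step: the temporary table's columns are derived from the field definition, and the records are bulk-loaded into it. The type mapping and `load_into_temp_table` are hypothetical, and table names are assumed to come from trusted configuration because they are interpolated into SQL.

```python
import sqlite3

# Assumed mapping from field-definition type codes to SQL column types.
SQL_TYPES = {"str": "TEXT", "int": "INTEGER", "decimal": "NUMERIC"}


def load_into_temp_table(conn: sqlite3.Connection, table: str, fields: list[dict],
                         lines: list[str], sep: str = "|") -> int:
    """(Re)create the temporary table from the field definition and bulk-load the records.

    With Oracle or MySQL this step would run the generated sqlldr / LOAD DATA INFILE script.
    """
    cols = ", ".join(f'{f["name"]} {SQL_TYPES[f["type"]]}' for f in fields)
    conn.execute(f"DROP TABLE IF EXISTS {table}")
    conn.execute(f"CREATE TABLE {table} ({cols})")
    placeholders = ", ".join("?" for _ in fields)
    # Skip blank lines; each remaining line is split into one record.
    rows = [line.rstrip("\n").split(sep) for line in lines if line.strip()]
    with conn:
        conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
    return len(rows)
```

SQLite's column type affinity converts the text `"1"` into the integer `1` for an `INTEGER` column, which loosely mirrors how a loader applies the declared field types.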
Step 130, synchronizing the data records in the temporary table to the target table according to the data processing requirement in the configuration information.
The target table is a table in the target database that stores the synchronized data records. For example, when the target system requests synchronized data from the source system, the synchronized data records are saved to the target table according to the applicable data processing requirement. Note that the user writes the data processing requirement into the configuration information in the database, so the requirement can be determined by querying the configuration information. The data processing requirements include full data synchronization, incremental data synchronization, delete data synchronization, and so on.
Illustratively, the data processing requirement in the configuration information is obtained, and the data records in the temporary table are synchronized to the target table of the target database in the manner the requirement specifies.
In the technical solution of this embodiment, the data field definition and configuration information of a data file to be processed are acquired; a script template is selected according to the type of the target database, a data processing script is generated from the script template and the data field definition, and the script is executed to load the data records of the data file into a temporary table of the target database; and the data records in the temporary table are synchronized to the target table according to the data processing requirement in the configuration information. This avoids the complex and inefficient development caused by developing each processing requirement separately, and shortens development time.
FIG. 2a is a flowchart of another data processing method provided by an embodiment of the present invention which, on the basis of the above technical solution, further describes how the data records in the temporary table are synchronized to the target table of the target database in the manner specified by the data processing requirement. As shown in FIG. 2a, the method comprises:
Step 210, acquiring the data field definition and configuration information of the data file to be processed.
The configuration information is configured flexibly by the user through a configuration interface in the implementation framework of the data processing method.
When data processing is needed, the user first writes the configuration information to the database by calling the configuration interface, so that during data synchronization the corresponding operations are performed automatically according to the configuration information to meet the data processing requirement.
Step 220, selecting a script template corresponding to the type of the target database, generating a data processing script based on the script template and the data field definition, and executing the data processing script to load the data records in the data file to be processed into a temporary table of the target database.
Step 230, obtaining the data processing requirement from the configuration information.
Illustratively, the configuration information is read from the database, and the data processing requirement is obtained from it.
Step 240, when the data processing requirement is full data synchronization, clearing the data records in the target table and inserting all data records of the temporary table into the cleared target table.
Illustratively, if the data processing requirement is full data synchronization, all existing data records in the target table are cleared; then all data records in the temporary table are read and inserted into the cleared target table. For example, for full data synchronization the user configures loading in the ALL form; the target table is then emptied, and all data records in the temporary table are inserted into it. Clearing (truncating) the target table deletes all of its data records at once and resets its IDs; it is equivalent to rebuilding the table, keeping the original table structure but leaving the table in the state of a new table. Executing the truncate-table instruction deletes all data in the table quickly, which improves data synchronization efficiency.
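A minimal sketch of the ALL form, using SQLite as a stand-in for the target database; `full_sync` is a hypothetical helper and assumes the temporary and target tables share the same columns in the same order.

```python
import sqlite3


def full_sync(conn: sqlite3.Connection, temp_table: str, target_table: str) -> None:
    """ALL form: empty the target table, then insert every temporary-table record.

    SQLite has no TRUNCATE TABLE; an unconditional DELETE plays that role here.
    Both statements commit together in SQLite, whereas TRUNCATE in Oracle and
    MySQL is DDL and commits implicitly on its own.
    """
    with conn:
        conn.execute(f"DELETE FROM {target_table}")
        conn.execute(f"INSERT INTO {target_table} SELECT * FROM {temp_table}")
```

The transactional difference noted in the docstring matters in practice: on databases where TRUNCATE commits immediately, a failure during the insert leaves the target table empty until the job is rerun.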
Step 250, when the data processing requirement is incremental data synchronization, matching the primary keys of the temporary table against those of the target table, and updating the target table with each data record according to the matching result.
A primary key consists of one or more fields of a table whose value uniquely identifies a data record in that table; comparing primary key values therefore determines whether two data records refer to the same record.
If the data processing requirement is incremental data synchronization, only changed data needs to be synchronized to the target table. It is determined which data records of the temporary table have changed relative to the target table, and the target table is then updated with those changed data records.
Illustratively, while the temporary table is not empty, any primary key in the temporary table is selected as the current primary key, and its value is compared with each primary key value of the target table. If an equal value is found, the data record with that primary key value in the target table is updated with the data record of the current primary key, and the data record of the current primary key is deleted from the temporary table; if no equal value is found, the data record of the current primary key is inserted into the target table.
Specifically, when the data processing requirement is incremental data synchronization and the user has configured loading in the MERGE form, the primary keys of the data records in the temporary table and the target table are matched. Matching succeeds when equal primary key values are found, and fails when the target table contains no primary key equal to a given primary key of the temporary table. On a successful match, the corresponding data record in the target table is updated with the matched data record from the temporary table; on a failed match, the data record of the current primary key in the temporary table is inserted into the target table.
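The record-by-record MERGE matching described above could be sketched as follows, with SQLite standing in for the target database. `merge_sync` is hypothetical and assumes at least one non-key column; a production version would more likely issue a single set-based statement such as Oracle's `MERGE INTO`, which this loop mirrors.

```python
import sqlite3


def merge_sync(conn: sqlite3.Connection, temp_table: str, target_table: str,
               key_cols: list[str], value_cols: list[str]) -> tuple[int, int]:
    """MERGE form, record by record: update on a primary-key match, insert on a miss.

    Matched records are also removed from the temporary table, as the description
    states. Returns (updated, inserted) counts. Table/column names must be trusted.
    """
    cols = key_cols + value_cols
    col_list = ", ".join(cols)
    key_pred = " AND ".join(f"{k} = ?" for k in key_cols)
    set_clause = ", ".join(f"{c} = ?" for c in value_cols)
    placeholders = ", ".join("?" for _ in cols)
    updated = inserted = 0
    with conn:
        for row in conn.execute(f"SELECT {col_list} FROM {temp_table}").fetchall():
            keys, values = row[:len(key_cols)], row[len(key_cols):]
            match = conn.execute(f"SELECT 1 FROM {target_table} WHERE {key_pred}", keys).fetchone()
            if match:
                # Successful match: update the target record, then drop it from the temp table.
                conn.execute(f"UPDATE {target_table} SET {set_clause} WHERE {key_pred}", values + keys)
                conn.execute(f"DELETE FROM {temp_table} WHERE {key_pred}", keys)
                updated += 1
            else:
                # Failed match: the record is new to the target table.
                conn.execute(f"INSERT INTO {target_table} ({col_list}) VALUES ({placeholders})", row)
                inserted += 1
    return updated, inserted
```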
When the data processing requirement is incremental data synchronization and the user has configured loading in the INSERT form, the data records in the temporary table are inserted into the target table. This meets the needs of systems that only process changed data.
Alternatively, if the data file sent by the source system contains both changed and unchanged data, the changed data must be identified by matching primary key values, and only the changed data records are inserted into the target table.
Step 260, when the data processing requirement is delete data synchronization, matching the primary keys of the temporary table against those of the target table, and deleting the data records in the target table whose primary keys are matched.
Illustratively, if the data processing requirement is delete data synchronization, the primary keys of the temporary table are matched against those of the target table, and the data corresponding to the matched primary keys is deleted from the target table. For example, for delete data synchronization the user configures loading in the DEL form; the primary key values of the temporary table and the target table are then matched, the primary keys in the target table whose values equal a primary key of the temporary table are determined, and the corresponding data records are deleted from the target table.
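The DEL form reduces to a single correlated delete. In this SQLite-based sketch, `delete_sync` is a hypothetical helper that supports composite primary keys by requiring every key column to match.

```python
import sqlite3


def delete_sync(conn: sqlite3.Connection, temp_table: str, target_table: str,
                key_cols: list[str]) -> int:
    """DEL form: delete target records whose primary key appears in the temporary table.

    Returns the number of deleted records. Table/column names must be trusted.
    """
    # Every key column must match for a record to count as the same record.
    match = " AND ".join(f"t.{k} = {target_table}.{k}" for k in key_cols)
    with conn:
        cur = conn.execute(
            f"DELETE FROM {target_table} WHERE EXISTS "
            f"(SELECT 1 FROM {temp_table} t WHERE {match})"
        )
    return cur.rowcount
```

Keys present in the temporary table but absent from the target table (like `("Z", 9)` in a typical test) are simply ignored, which matches the description: only successfully matched primary keys lead to deletion.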
FIG. 2b is a flowchart of a method for synchronizing data between a target table and a temporary table. As shown in FIG. 2b, the method specifically includes:
Step 201, reading the configuration information and obtaining the file load type it specifies.
Step 202, determining whether the load type is the full data synchronization form, the incremental MERGE form, the incremental INSERT form, or the delete synchronization form.
Step 203, if the load type is the full data synchronization form, emptying the target table.
Step 204, inserting the data records of the temporary table into the target table.
Specifically, the data records of the temporary table can be inserted into the target table with an INSERT INTO <target table> SELECT ... FROM <temporary table> statement.
Step 205, if the load type is the incremental MERGE form, merging the temporary table into the target table.
Specifically, in the incremental MERGE form, the data records in the target table are updated with the data records of the temporary table whose primary keys are matched, and the data records of the temporary table whose primary keys fail to match are inserted into the target table.
Step 206, if the load type is the incremental INSERT form, inserting the data records of the temporary table into the target table.
Specifically, the data records of the temporary table are inserted into the target table with an INSERT INTO <target table> SELECT ... FROM <temporary table> statement.
Step 207, if the load type is the delete synchronization form, deleting the data records from the target table according to the primary key field.
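The branching in FIG. 2b can be sketched as a function that maps the configured load type to SQL statements. This is an assumption-laden illustration: `build_sync_sql` and `apply_sync` are hypothetical, the SQL was written to run on SQLite for testability, and on Oracle the MERGE branch would more naturally be a `MERGE INTO` statement and the ALL branch a `TRUNCATE TABLE`.

```python
import sqlite3


def build_sync_sql(load_type: str, temp: str, target: str,
                   key_cols: list[str], value_cols: list[str]) -> list[str]:
    """Return the SQL statements for the FIG. 2b branch selected by load_type."""
    cols = ", ".join(key_cols + value_cols)
    # Correlates a temporary-table row (alias s) with a target-table row by primary key.
    match = " AND ".join(f"s.{k} = {target}.{k}" for k in key_cols)
    insert_all = f"INSERT INTO {target} ({cols}) SELECT {cols} FROM {temp}"
    if load_type == "ALL":        # steps 203-204
        return [f"DELETE FROM {target}", insert_all]
    if load_type == "MERGE":      # step 205: update matched rows, then insert unmatched ones
        sets = ", ".join(f"{c} = (SELECT s.{c} FROM {temp} s WHERE {match})" for c in value_cols)
        return [
            f"UPDATE {target} SET {sets} WHERE EXISTS (SELECT 1 FROM {temp} s WHERE {match})",
            f"INSERT INTO {target} ({cols}) SELECT {cols} FROM {temp} s "
            f"WHERE NOT EXISTS (SELECT 1 FROM {target} WHERE {match})",
        ]
    if load_type == "INSERT":     # step 206
        return [insert_all]
    if load_type == "DEL":        # step 207
        return [f"DELETE FROM {target} WHERE EXISTS (SELECT 1 FROM {temp} s WHERE {match})"]
    raise ValueError(f"unknown load type: {load_type!r}")


def apply_sync(conn: sqlite3.Connection, statements: list[str]) -> None:
    """Run the statements of one branch in a single transaction."""
    with conn:
        for sql in statements:
            conn.execute(sql)
```

Generating statements instead of executing them directly keeps the branch logic testable and lets the generated SQL be logged before it runs against the target database.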
In the above technical solution, the user configures provisions in the configuration information that meet the data processing requirements, and the data records in the temporary table are synchronized to the target table accordingly. This realizes a configuration-driven data processing method that meets individualized data processing requirements without developing a separate program for each requirement, which greatly reduces code development complexity and improves development efficiency.
Fig. 3 is a flowchart of another data processing method according to an embodiment of the present invention. On the basis of the above technical solution, it adds an implementation for data processing scenarios in which the user configures special processing requirements. As shown in Fig. 3, the method includes:
Step 301, acquiring the data field definition and configuration information of the data file to be processed.
Step 302, selecting a corresponding script template according to the type of the target database, generating a data processing script based on the script template and the data field definition, and executing the data processing script to load the data records in the data file to be processed into a temporary table of the target database.
Step 303, judging whether the user has configured a preprocessing operation; if so, executing step 304, otherwise executing step 305.
Illustratively, the configuration information stored in the database is read, the value of the attribute representing the preprocessing operation is determined, and whether the user has configured a preprocessing operation is judged from that value.
Step 304, acquiring the preprocessing mode corresponding to the preprocessing operation in the configuration information, processing the data records in the temporary table according to that mode, and then executing step 305.
The preprocessing mode can be a segment of SQL statements, a shell script, a custom Java processing class, or the like; it can be preset by the user according to actual requirements, and its specific form is not limited here.
Step 305, synchronizing the data records in the temporary table to the target table according to the data processing requirements in the configuration information.
Step 306, judging whether the user has configured a post-processing operation; if so, executing step 307, otherwise executing step 308.
Illustratively, the configuration information stored in the database is read, the value of the attribute representing the post-processing operation is determined, and whether the user has configured a post-processing operation is judged from that value.
Step 307, acquiring the post-processing mode corresponding to the post-processing operation in the configuration information, processing the data records in the target table according to that mode, and then executing step 308.
The post-processing mode can be a segment of SQL statements, a shell script, a custom Java processing class, or the like, and can be preset by the user according to actual requirements.
Step 308, ending the data processing flow.
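A compact way to picture the Fig. 3 flow is a driver that runs the synchronization step between two optional hooks. In this sketch, which rests on assumptions, the hooks are plain Python callables standing in for the SQL statement, shell script or Java class that the description allows, the configuration keys `pre_process` and `post_process` and the function `run_flow` are hypothetical, and tables are modelled as lists of dicts.

```python
from typing import Callable, Dict, List, Optional

Record = Dict[str, object]
Hook = Callable[[List[Record]], List[Record]]


def run_flow(records: List[Record], config: Dict[str, Hook], sync: Hook) -> List[Record]:
    """Hypothetical driver for steps 301-308 with optional user-configured hooks."""
    staging = list(records)                  # step 302: records loaded into the temporary table
    pre: Optional[Hook] = config.get("pre_process")
    if pre is not None:                      # steps 303-304: a preprocessing operation is configured
        staging = pre(staging)
    target = sync(staging)                   # step 305: synchronize per the configured requirement
    post: Optional[Hook] = config.get("post_process")
    if post is not None:                     # steps 306-307: a post-processing operation is configured
        target = post(target)
    return target                            # step 308: end of the flow
```

For instance, a preprocessing hook such as `lambda rows: [r for r in rows if r["amount"] > 0]` would drop unwanted records before the synchronization step runs, while an empty configuration simply passes the staged records straight to `sync`.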
In a specific embodiment, an overall flow of the data processing method of the present invention is provided. Fig. 4 is an overall flowchart of a data processing method according to an embodiment of the present invention, and as shown in fig. 4, the method includes:
Step 401, detecting an upstream data file.
The upstream data file is the data file to be processed, which is sent by the source system to the target system.
Illustratively, the target system receives the data file sent by the source system. The fields in the data file are generally separated by a specific character or laid out in an otherwise specified manner, and the type of each field is defined in advance. In short, the technical staff of the source and target systems agree on how the data file should be parsed, i.e., which fields it contains and what the data type of each field is. This convention is usually established through a data definition file sent by the source system, or through offline communication.
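As a rough illustration of such a convention, the sketch below splits one record line on an agreed delimiter and converts each value according to its declared field type. The `|` delimiter, the type names `string`, `int`, `decimal` and `date`, ISO-formatted dates, and the function `parse_record` are all assumptions made for the example.

```python
from datetime import date
from decimal import Decimal

# Hypothetical converters keyed by the type names used in a field definition.
CONVERTERS = {
    "string": str,
    "int": int,
    "decimal": Decimal,
    "date": date.fromisoformat,  # assumes YYYY-MM-DD dates in the file
}


def parse_record(line, fields, delimiter="|"):
    """Split one delimited line and convert each value according to its declared type.

    `fields` is a list of (field name, field type) pairs in file order.
    """
    values = line.rstrip("\n").split(delimiter)
    if len(values) != len(fields):
        raise ValueError(f"expected {len(fields)} fields, got {len(values)}")
    return {name: CONVERTERS[ftype](raw) for (name, ftype), raw in zip(fields, values)}
```

A line that does not match the agreed field count is rejected rather than silently truncated, which is usually what both sides of such a file convention want.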
Step 402, judging whether a data definition file exists; if so, executing step 403, otherwise executing step 405.
The data definition file contains information such as the upstream data fields and their type definitions, and is sent to the target system by the source system.
Step 403, parsing the data definition file.
Step 404, storing the parsed data field definitions in the database in association with the file identifier of the data file.
Illustratively, it is judged whether the data definition file exists; if so, the data definition file processing interface is called to parse the definition file, and the parsed information is saved in the database.
It should be noted that the data definition file processing interface is used to parse different types of data definition files: each type has a corresponding parsing implementation, and new implementations can be added to meet the parsing requirements of new types of data definition files.
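One way to realise such an extensible interface is a registry keyed by definition file type. In the sketch below, which is illustrative only, the class and function names, the `csv` and `json` file types and their layouts are assumptions; supporting a new type means registering one more parser class without touching the existing ones.

```python
import json
from abc import ABC, abstractmethod
from typing import Dict, List, Tuple

FieldDef = Tuple[str, str]  # (field name, field type)


class DefinitionFileParser(ABC):
    """Hypothetical form of the data definition file processing interface."""

    @abstractmethod
    def parse(self, text: str) -> List[FieldDef]:
        """Return (field name, field type) pairs in file order."""


_PARSERS: Dict[str, DefinitionFileParser] = {}


def register(kind: str):
    """Class decorator that registers one parser implementation per file type."""
    def wrap(cls):
        _PARSERS[kind] = cls()
        return cls
    return wrap


@register("csv")
class CsvDefinitionParser(DefinitionFileParser):
    # Assumed layout: one "name,type" pair per line.
    def parse(self, text: str) -> List[FieldDef]:
        pairs = []
        for line in text.splitlines():
            if line.strip():
                name, ftype = line.split(",", 1)
                pairs.append((name.strip(), ftype.strip()))
        return pairs


@register("json")
class JsonDefinitionParser(DefinitionFileParser):
    # Assumed layout: a JSON array of {"name": ..., "type": ...} objects.
    def parse(self, text: str) -> List[FieldDef]:
        return [(f["name"], f["type"]) for f in json.loads(text)]


def parse_definition_file(kind: str, text: str) -> List[FieldDef]:
    """Dispatch to the parser registered for this definition file type."""
    try:
        parser = _PARSERS[kind]
    except KeyError:
        raise ValueError(f"no parser registered for definition file type {kind!r}") from None
    return parser.parse(text)
```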
Step 405, reading the data field definition stored in the database.
If the data definition file does not exist, the agreed data field definitions need to be configured in the database in advance; when the data file needs to be synchronized, the configuration information in the database is read directly to obtain the data field definitions of the data file.
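Whether the definitions come from a parsed definition file or are configured by hand, both paths end in the same lookup by file identifier. The sketch below uses an in-memory SQLite database as a stand-in for the target database; the table `field_def`, its columns, the file identifier `ACCT_DAILY` and the function names are hypothetical.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open a store for field definitions (an in-memory SQLite stand-in for the target database)."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS field_def ("
        " file_id TEXT NOT NULL, seq INTEGER NOT NULL, name TEXT NOT NULL, type TEXT NOT NULL,"
        " PRIMARY KEY (file_id, seq))"
    )
    return conn


def save_field_definitions(conn, file_id, fields):
    """Store (name, type) pairs associated with a file identifier, replacing any earlier version."""
    with conn:  # commits on success, rolls back on error
        conn.execute("DELETE FROM field_def WHERE file_id = ?", (file_id,))
        conn.executemany(
            "INSERT INTO field_def (file_id, seq, name, type) VALUES (?, ?, ?, ?)",
            [(file_id, i, name, ftype) for i, (name, ftype) in enumerate(fields)],
        )


def load_field_definitions(conn, file_id):
    """Read the stored definitions back in field order."""
    rows = conn.execute(
        "SELECT name, type FROM field_def WHERE file_id = ? ORDER BY seq", (file_id,)
    ).fetchall()
    if not rows:
        raise LookupError(f"no field definition configured for file {file_id!r}")
    return rows
```

Keeping a sequence number per field preserves the column order of the data file, which the generated loading script depends on.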
Step 406, generating a file loading script according to the data field definition.
Illustratively, a data file loading script for the target database is generated from the acquired data field definitions. For example, a SQL*Loader (sqlldr) script is generated for an Oracle database, and a LOAD DATA INFILE script is generated for a MySQL database. In addition, generation logic for the file loading scripts of other databases can be added to extend support to databases not yet covered. This avoids rewriting the whole file loading approach whenever the processing requirements of such a database must be met, and improves code reuse.
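Steps 406 and 407 amount to template selection plus field filling. The sketch below is one possible shape under assumptions: the template text, the `|` delimiter, the `YYYY-MM-DD` date mask and the function names are hypothetical, and only the field names and types come from the data field definition. The field type decides whether a column needs extra loader syntax, which mirrors the idea of filling template fields according to type information.

```python
from string import Template

# Hypothetical script templates keyed by target database type.
TEMPLATES = {
    # MySQL: a LOAD DATA INFILE statement.
    "mysql": Template(
        "LOAD DATA INFILE '$data_file' INTO TABLE $table\n"
        "FIELDS TERMINATED BY '$delimiter'\n"
        "LINES TERMINATED BY '\\n'\n"
        "($columns);"
    ),
    # Oracle: a SQL*Loader control file to be passed to sqlldr.
    "oracle": Template(
        "LOAD DATA\n"
        "INFILE '$data_file'\n"
        "APPEND INTO TABLE $table\n"
        "FIELDS TERMINATED BY '$delimiter'\n"
        "TRAILING NULLCOLS\n"
        "($columns)"
    ),
}


def column_spec(db_type, name, ftype):
    """Render one column; the field type decides whether extra loader syntax is needed."""
    if db_type == "oracle" and ftype == "date":
        return f'{name} DATE "YYYY-MM-DD"'  # assumed date layout in the data file
    return name


def generate_load_script(db_type, table, data_file, fields, delimiter="|"):
    """Select the template for db_type and fill it from the (name, type) field definitions."""
    db_type = db_type.lower()
    try:
        template = TEMPLATES[db_type]
    except KeyError:
        raise ValueError(f"no script template for database type {db_type!r}") from None
    columns = ", ".join(column_spec(db_type, name, ftype) for name, ftype in fields)
    return template.substitute(
        data_file=data_file, table=table, delimiter=delimiter, columns=columns
    )
```

Supporting another database means adding one more entry to `TEMPLATES`. On MySQL, the server's `secure_file_priv` setting (or the use of `LOAD DATA LOCAL INFILE`) governs which file paths such a statement may read.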
Step 407, executing the file loading script to load the upstream data file into the temporary table of the database.
Step 408, judging whether a preprocessing operation needs to be executed; if so, executing step 409, otherwise executing step 410.
Step 409, reading the configuration information in the database and executing the processing mode corresponding to the preprocessing operation in the configuration information.
For some data file synchronization operations, custom logic processing needs to be performed after the upstream data file is loaded into the temporary table and before the data are synchronized from the temporary table to the target table, for example removing unwanted data or performing additional data processing. In the embodiment of the present invention, the custom logic processing may be a segment of SQL statements, a shell script, a Java processing class, or the like.
Step 410, reading the configuration information in the database and obtaining the file loading type specified in it.
It should be noted that different operations may be performed on the data file provided by the source system according to different processing requirements: some require full data synchronization, some require incremental data synchronization, and some require file data deletion (i.e., deleting the data records in the target table that correspond to the data file).
Step 411, determining whether the loading type is the full data synchronization form, the incremental MERGE form, the incremental INSERT form, or the delete synchronization form.
Step 412, if the loading type is the full data synchronization form, emptying the target table, and then executing step 413.
Step 413, inserting the data records of the temporary table into the target table (INSERT INTO target table ... FROM temporary table), and then executing step 417.
Step 414, if the loading type is the incremental MERGE form, merging the temporary table into the target table, and then executing step 417.
Step 415, if the loading type is the incremental INSERT form, inserting the data records of the temporary table into the target table, and then executing step 417.
Step 416, if the loading type is the delete synchronization form, deleting the data records in the target table according to the primary key field, and then executing step 417.
It should be noted that the processing logic differs for different synchronization requirements. For full data synchronization, the loading type can be configured as ALL; the program then empties the target table and inserts all data records of the temporary table into it. For incremental data, synchronous loading can be configured in the incremental MERGE form or the incremental INSERT form. In the incremental MERGE form, the primary keys of the data in the target table and the temporary table are matched: if a primary key in the temporary table has a match in the target table, the matched record in the target table is updated with the temporary-table record; if a primary key in the temporary table has no match in the target table, the temporary-table record with that primary key is inserted into the target table. The incremental INSERT form mainly serves synchronization requirements where only the data that changes each day needs to be processed: the newly added data are inserted into the target table for subsequent logic processing. For file data deletion, the loading type can be configured as DEL, and the program deletes the data records in the target table whose primary keys match.
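The record-level effect of the four configured forms can be modelled with tables as Python dictionaries keyed by primary key. This is a simplified model rather than the described implementation: `synchronize` is a hypothetical name, and the duplicate-key error in the INSERT branch assumes that the target table enforces its primary key.

```python
def synchronize(target, staging, load_type):
    """Apply one loading type to `target` in place; both map primary key -> record."""
    if load_type == "ALL":
        target.clear()                        # empty the target table
        target.update(staging)                # then insert every staged record
    elif load_type == "MERGE":
        for key, record in staging.items():
            target[key] = record              # matched key: update; unmatched key: insert
    elif load_type == "INSERT":
        for key, record in staging.items():
            if key in target:                 # assumed primary-key constraint on the target
                raise KeyError(f"duplicate primary key {key!r}")
            target[key] = record
    elif load_type == "DEL":
        for key in staging:
            target.pop(key, None)             # delete the target record whose key matches
    else:
        raise ValueError(f"unknown loading type: {load_type!r}")
    return target
```

Starting from `{1: "a", 2: "b"}`, a MERGE with `{2: "B", 3: "c"}` updates key 2 and inserts key 3, whereas a DEL with `{2: ...}` leaves only key 1.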
Step 417, judging whether a post-processing operation needs to be executed; if so, executing step 418, otherwise executing step 419.
For some data file synchronization operations, custom logic processing is required after the data records are synchronously loaded from the temporary table into the target table. In the embodiment of the present invention, the custom logic processing may be a segment of SQL statements, a shell script, a Java processing class, or the like.
Step 418, reading the configuration information in the database and executing the processing mode corresponding to the post-processing operation in the configuration information.
Step 419, ending the data processing flow.
The embodiment of the present invention provides a configuration-driven data processing method that can meet most data file loading requirements through configuration alone, without additional development for each loading requirement. Special requirements can be supported through extension, which reduces developers' workload, shortens development time, and helps improve the efficiency of the overall project.
Fig. 5 is a block diagram of a data processing apparatus according to an embodiment of the present invention, where the apparatus may execute a data processing method according to any embodiment of the present invention, and implement data synchronization between subsystems of a distributed system by executing the method. The apparatus may be implemented by software and/or hardware and configured in a server. As shown in fig. 5, the apparatus includes:
an information obtaining module 510, configured to obtain data field definition and configuration information of a data file to be processed;
the script generating module 520 is configured to select a corresponding script template according to the type of the target database, generate a data processing script based on the script template and the data field definition, and execute the data processing script to load the data record in the data file to be processed into the temporary table of the target database;
and a data synchronization module 530, configured to synchronize the data records in the temporary table to the target table according to the data processing requirement in the configuration information.
The embodiment of the present invention provides a data processing apparatus that, by configuring script templates and data processing requirements, realizes an extensible and configurable loading approach for data records. It solves the problems of complex development work and low development efficiency caused by developing each processing requirement separately, and thereby shortens development time.
Optionally, the apparatus further comprises:
the definition file judging module is used for judging, before the data field definition of the data file to be processed is acquired, whether a data definition file corresponding to the file to be processed exists;
and the field definition storage module is used for, when a data definition file corresponding to the file to be processed exists, parsing the data definition file through the data definition file processing interface to obtain the data field definition of the data file to be processed, and storing the data field definition in association with the file identifier of the data file to be processed.
Optionally, the information obtaining module 510 is specifically configured to:
query the target database based on the file identifier of the data file to be processed to obtain the data field definition of the data file to be processed.
Optionally, the information obtaining module 510 is further specifically configured to:
determine the file identifier of the data file to be processed, and query the configuration information according to the file identifier to obtain the data field definition of the data file to be processed.
Optionally, the script generating module 520 is specifically configured to:
fill the target field in the data field definition into the script template to generate the data processing script.
Optionally, the script generating module 520 is further configured to:
determine the type information of the field to be filled in the script template;
and query the data field definition according to the type information to determine the target field, and fill the target field into the script template to obtain the data processing script.
Optionally, the data synchronization module 530 includes:
the requirement acquisition submodule is used for acquiring the data processing requirement in the configuration information;
and the data synchronization submodule is used for synchronizing the data records in the temporary table to a target table in the target database in the manner specified by the data processing requirement.
Further, the data processing requirements include at least one of: full data synchronization, incremental data synchronization, and delete data synchronization.
Optionally, the data synchronization sub-module is specifically configured to:
when the data processing requirement is full data synchronization, clearing the data records in the target table, and inserting all the data records in the temporary table into the cleared target table;
when the data processing requirement is incremental data synchronization, respectively matching the primary keys of the temporary table and the target table, and updating the target table based on the data records of the temporary table according to the matching result;
and when the data processing requirement is delete data synchronization, matching the primary keys of the temporary table and the target table, and deleting from the target table the data records whose primary keys are successfully matched.
Optionally, when the data processing requirement is incremental data synchronization, the data synchronization sub-module is further specifically configured to:
when the temporary table is not empty, selecting any primary key in the temporary table as the current primary key, and comparing the value of the current primary key with the value of each primary key in the target table;
when an equal value is found, updating the data record corresponding to that primary key value in the target table based on the data record corresponding to the current primary key, and deleting the data record corresponding to the current primary key from the temporary table;
and when no equal value is found, inserting the data record corresponding to the current primary key into the target table.
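The per-key procedure above can be sketched as a loop that drains the temporary table. Assumptions: tables are lists of dicts, `key` names a single primary-key column, `incremental_merge` is a hypothetical name, and the current record is removed from the temporary table in both branches (the description states removal only for the matched case) so that the loop is guaranteed to terminate.

```python
def incremental_merge(temp_rows, target_rows, key):
    """Row-by-row incremental merge of a temporary table into a target table.

    Returns the new target rows; the inputs are not modified.
    """
    temp = list(temp_rows)
    target = [dict(row) for row in target_rows]
    while temp:                                  # the temporary table is not empty
        current = temp.pop()                     # take any record as the current primary key
        for i, row in enumerate(target):         # compare with each primary key of the target
            if row[key] == current[key]:
                target[i] = dict(current)        # equal value: update the matching record
                break
        else:
            target.append(dict(current))         # no equal value: insert into the target
    return target
```

Because every temporary record is compared with every target row, the cost grows with the product of the two table sizes; a database would normally perform the same matching through an index or a MERGE statement.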
Optionally, the apparatus further comprises:
the preprocessing module is used for judging, after the data processing script is executed to load the data records in the data file to be processed into the temporary table of the target database, whether the user has configured a preprocessing operation;
if so, acquiring a preprocessing mode corresponding to the preprocessing operation in the configuration information, and processing the data record in the temporary table according to the preprocessing mode;
if not, the data records in the temporary table are synchronized to the target table according to the data processing requirements in the configuration information.
Optionally, the apparatus further comprises:
and the post-processing module is used for acquiring a post-processing mode corresponding to the post-processing operation in the configuration information when detecting that the post-processing operation is configured by the user after the data records in the temporary table are synchronized to the target table, and processing the data records in the target table according to the post-processing mode.
The data processing device provided by the embodiment of the invention can execute the data processing method provided by any embodiment of the invention, and has corresponding functional modules and beneficial effects of the execution method.
Fig. 6 is a schematic structural diagram of a server according to the present invention, and as shown in fig. 6, the server includes a processor 60 and a memory 61; the number of the processors 60 in the server may be one or more, and one processor 60 is taken as an example in fig. 6; the processor 60 and the memory 61 in the server may be connected by a bus or other means, as exemplified by the bus connection in fig. 6.
The memory 61, which is a computer-readable storage medium, may be used to store software programs, computer-executable programs, and modules, such as program instructions/modules (e.g., the information acquisition module 510, the script generation module 520, and the data synchronization module 530) corresponding to the data processing method in the embodiment of the present invention. The processor 60 executes various functional applications of the server and data processing by executing software programs, instructions, and modules stored in the memory 61, that is, implements the above-described data processing method.
The memory 61 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required for at least one function; the storage data area may store data created according to the use of the terminal, and the like. Further, the memory 61 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some examples, the memory 61 may further include memory located remotely from the processor 60, which may be connected to a server over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
Embodiments of the present invention also provide a storage medium containing computer-executable instructions, which when executed by a computer processor, perform a data processing method, the method comprising:
acquiring data field definition and configuration information of a data file to be processed;
selecting a corresponding script template according to the type of the target database, generating a data processing script based on the script template and the data field definition, and executing the data processing script to load data records in a data file to be processed into a temporary table of the target database;
and synchronizing the data records in the temporary table to the target table according to the data processing requirements in the configuration information.
Of course, the storage medium provided by the embodiment of the present invention contains computer-executable instructions, and the computer-executable instructions are not limited to the operations of the method described above, and may also perform related operations in the data processing method provided by any embodiment of the present invention.
Embodiments of the present invention further provide a computer program product, which includes a computer program, and when the computer program is executed by a processor, the computer program implements the data processing method according to any embodiment of the present invention.
From the above description of the embodiments, it is obvious for those skilled in the art that the present invention can be implemented by software and necessary general hardware, and certainly, can also be implemented by hardware, but the former is a better embodiment in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which can be stored in a computer-readable storage medium, such as a floppy disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a FLASH Memory (FLASH), a hard disk or an optical disk of a computer, and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device) to execute the methods according to the embodiments of the present invention.
It should be noted that, in the embodiment of the data processing apparatus, the included units and modules are merely divided according to functional logic, but are not limited to the above division as long as the corresponding functions can be implemented; in addition, specific names of the functional units are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present invention.
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims (16)

1. A data processing method, comprising:
acquiring data field definition and configuration information of a data file to be processed;
selecting a corresponding script template according to the type of a target database, generating a data processing script based on the script template and the data field definition, and executing the data processing script to load data records in the data file to be processed into a temporary table of the target database;
and synchronizing the data records in the temporary table to a target table according to the data processing requirements in the configuration information.
2. The method of claim 1, prior to obtaining the data field definitions of the data file to be processed, further comprising:
judging whether a data definition file corresponding to the file to be processed exists or not;
if so, analyzing the data definition file through a data definition file processing interface to obtain a data field definition of the data file to be processed, and storing the data field definition and a file identifier of the data file to be processed in an associated manner.
3. The method of claim 2, wherein obtaining the data field definition of the data file to be processed comprises:
querying the target database based on the file identifier of the data file to be processed to obtain the data field definition of the data file to be processed.
4. The method of claim 1, wherein obtaining the data field definition of the data file to be processed comprises:
determining the file identifier of the data file to be processed, and querying the configuration information according to the file identifier to obtain the data field definition of the data file to be processed.
5. The method of claim 1, wherein generating a data processing script based on the script template and the data field definition comprises:
filling the target field in the data field definition into the script template to generate a data processing script.
6. The method of claim 5, wherein populating the target field in the data field definition to the script template generates a data processing script, comprising:
determining the type information of the field to be filled in the script template;
and querying the data field definition according to the type information to determine a target field, and filling the target field into the script template to obtain a data processing script.
7. The method of claim 1, wherein synchronizing the data records in the temporary table to a target table according to the data processing requirements in the configuration information comprises:
acquiring a data processing requirement in the configuration information;
and synchronizing the data records in the temporary table to a target table in the target database according to the data processing requirement.
8. The method of claim 7, wherein the data processing requirements include at least one of: full data synchronization, incremental data synchronization, and delete data synchronization.
9. The method of claim 7, wherein synchronizing the data records in the temporary table to a target table in the target database according to the data processing requirement comprises:
when the data processing requirement is full data synchronization, clearing the data records in the target table, and inserting all the data records in the temporary table into the cleared target table;
when the data processing requirement is incremental data synchronization, matching the primary keys of the temporary table and the target table, and updating the target table based on the data records of the temporary table according to the matching result;
and when the data processing requirement is delete data synchronization, matching the primary keys of the temporary table and the target table, and deleting from the target table the data records whose primary keys are successfully matched.
10. The method of claim 9, wherein matching the primary keys of the temporary table and the target table, respectively, and updating the target table based on the data records of the temporary table according to the matching result comprises:
when the temporary table is not empty, selecting any primary key in the temporary table as the current primary key, and comparing the value of the current primary key with the value of each primary key of the target table;
when an equal value is found, updating the data record corresponding to that primary key value in the target table based on the data record corresponding to the current primary key, and deleting the data record corresponding to the current primary key from the temporary table;
and when no equal value is found, inserting the data record corresponding to the current primary key into the target table.
11. The method of claim 1, wherein after executing the data processing script to load the data records in the data file to be processed into the temporary table of the target database, further comprising:
judging whether a user has configured a preprocessing operation;
if so, acquiring a preprocessing mode corresponding to the preprocessing operation in the configuration information, and processing the data record in the temporary table according to the preprocessing mode;
if not, the data records in the temporary table are synchronized to the target table according to the data processing requirement in the configuration information.
12. The method of claim 1, further comprising, after synchronizing the data records in the temporary table to the target table:
when it is detected that a user has configured a post-processing operation, acquiring the post-processing mode corresponding to the post-processing operation in the configuration information, and processing the data records in the target table according to the post-processing mode.
13. A data processing apparatus, comprising:
the information acquisition module is used for acquiring data field definition and configuration information of the data file to be processed;
the script generation module is used for selecting a corresponding script template according to the type of a target database, generating a data processing script based on the script template and the data field definition, and executing the data processing script to load the data records in the data file to be processed into a temporary table of the target database;
and the data synchronization module is used for synchronizing the data records in the temporary table to a target table according to the data processing requirements in the configuration information.
14. A server, characterized in that the server comprises:
one or more processors;
a memory for storing one or more programs,
wherein, when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the data processing method as claimed in any one of claims 1-12.
15. A storage medium containing computer-executable instructions for performing the data processing method of any one of claims 1-12 when executed by a computer processor.
16. A computer program product comprising a computer program, characterized in that the computer program realizes the data processing method according to any one of claims 1-12 when executed by a processor.
CN202111275986.XA 2021-10-29 2021-10-29 Data processing method, device, server, storage medium and product Pending CN114020840A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111275986.XA CN114020840A (en) 2021-10-29 2021-10-29 Data processing method, device, server, storage medium and product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111275986.XA CN114020840A (en) 2021-10-29 2021-10-29 Data processing method, device, server, storage medium and product

Publications (1)

Publication Number Publication Date
CN114020840A true CN114020840A (en) 2022-02-08

Family

ID=80058911

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111275986.XA Pending CN114020840A (en) 2021-10-29 2021-10-29 Data processing method, device, server, storage medium and product

Country Status (1)

Country Link
CN (1) CN114020840A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114911854A (en) * 2022-05-09 2022-08-16 建信金融科技有限责任公司 Data processing method and device
CN116756246A (en) * 2023-08-17 2023-09-15 太平金融科技服务(上海)有限公司深圳分公司 Data synchronization method, device, equipment and storage medium


Similar Documents

Publication Publication Date Title
US20220179642A1 (en) Software code change method and apparatus
US11210181B2 (en) System and method for implementing data manipulation language (DML) on Hadoop
CN111427561A (en) Service code generation method and device, computer equipment and storage medium
CN114020840A (en) Data processing method, device, server, storage medium and product
CN107992537B (en) Service attribute transmission method, device, computer equipment and storage medium
CN109361628B (en) Message assembling method and device, computer equipment and storage medium
CN111818175B (en) Enterprise service bus configuration file generation method, device, equipment and storage medium
CN112363845A (en) Data synchronization method of system integration middling station and integration middling station system
CN111737227A (en) Data modification method and system
CN110597518A (en) Project construction method and device, computer equipment and storage medium
CN104423982A (en) Request processing method and device
CN112685091A (en) Service request processing method, device, equipment and medium based on big data
CN111124872A (en) Branch detection method and device based on difference code analysis and storage medium
CN112395307A (en) Statement execution method, statement execution device, server and storage medium
JP2023553220A (en) Process mining for multi-instance processes
CN115525534A (en) Test case generation method and platform based on swagger interface test
CN113377789A (en) Processing method and device for database change data, computer equipment and medium
CN115794202A (en) Data configuration method and engine, file system and computer storage medium
CN116048609A (en) Configuration file updating method, device, computer equipment and storage medium
CN113934792B (en) Processing method and device of distributed database, network equipment and storage medium
CN111080250B (en) Flow backspacing compensation method and device, storage medium and electronic equipment
CN113806327A (en) Database design method and device and related equipment
CN113448980A (en) Method and device for generating SQL (structured query language) statement and electronic equipment
CN106681914B (en) Television picture quality debugging method and device
CN111611447B (en) Computer and server

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination