CN111563090B - Method and device for loading homologous data by multi-batch system

Publication number
CN111563090B
Authority
CN
China
Prior art keywords
data
batch
ods
file
source system
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010385299.2A
Other languages
Chinese (zh)
Other versions
CN111563090A (en)
Inventor
丁丽娜
郑土清
聂芳
杨晓旺
温灏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Bank of China Ltd
Original Assignee
Bank of China Ltd
Application filed by Bank of China Ltd
Priority to CN202010385299.2A
Publication of CN111563090A
Application granted
Publication of CN111563090B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23 Updating
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a method and a device for loading homologous data by a multi-batch system, wherein the method comprises the following steps: reading a first configuration file; generating, according to the first configuration file, an ODS table used when each batch system loads the source system data table file, so that the time range of the historical data retained in the ODS table satisfies the batch time range of each batch system; loading the data contained in the source system data table file into the ODS table; updating the table structure of the ODS table according to the table structure change information of the source system data table file, and generating a second configuration file containing the table structure change information of the ODS table; and controlling each batch system, according to the second configuration file, to read the ODS table data within its own batch time range from the updated ODS table. The application allows the data source text to be loaded once and used many times.

Description

Method and device for loading homologous data by multi-batch system
Technical Field
The application relates to the field of big data, in particular to a method and a device for loading homologous data by a multi-batch system.
Background
This section is intended to provide a background or context to the embodiments of the application that are recited in the claims. The description herein is not admitted to be prior art by inclusion in this section.
In the big data age, enterprise data is often handled by a number of different batch systems because of differences in timeliness and demand. Each batch system of an enterprise may process several terabytes or even tens of terabytes of data every day. For example, each batch system of a bank receives a large amount of data text from an upstream data source system each day and performs the following data cleansing and loading operations on the received data text: checking configuration information; detecting whether a data text has arrived; when the data text has arrived, judging according to a data cleansing rule whether the received data text needs to be cleansed; and, after the data has been cleansed successfully, loading the data into the ODS table of the batch system.
In the existing method for loading homologous data by a multi-batch system, every batch system repeats the above data cleansing and loading process once each day. This not only consumes a lot of resources and time but also complicates the maintenance of each batch system. Moreover, because the batch time ranges of the batch systems differ, once the data structure of the data source system changes, the maintenance of one of the batch systems is easily missed.
Therefore, there is an urgent need for a data loading method for a plurality of batch systems that load the data text of the same data source, such that the data text of the source system is loaded once and used many times.
Disclosure of Invention
The embodiment of the application provides a method for loading homologous data by a multi-batch system, which addresses the technical problems of existing big data platforms in which every batch system that loads data from the same data source system repeats the data loading process, consuming resources and time and making it easy to miss the maintenance of an individual batch system. The method comprises the following steps: reading a first configuration file, wherein the first configuration file comprises configuration information for a plurality of batch systems loading the same source system data table file and the batch time range of each batch system; generating, according to the first configuration file, an ODS table used when each batch system loads the source system data table file, wherein the time range of the historical data retained in the ODS table satisfies the batch time range of each batch system; loading the data to be loaded contained in the source system data table file into the ODS table; updating the table structure of the ODS table according to the table structure change information of the source system data table file and generating a corresponding second configuration file, wherein the second configuration file comprises the table structure change information of the ODS table; and controlling each batch system, according to the second configuration file, to read the ODS table data within its own batch time range from the updated ODS table. Updating the table structure of the ODS table and generating the corresponding second configuration file comprises: when the table structure change information of the source system data table file is a newly added data field, adding the new data field to the ODS table and configuring, in the second configuration file, the position of the new data field relative to the other data fields in the source system data table file; when the table structure change information is a deleted data field, filling the data corresponding to the deleted data field in the ODS table with the default value NULL and setting the validity of the deleted data field to zero in the second configuration file; and when the table structure change information is a change in the position order of the data fields, keeping the table structure of the ODS table unchanged and configuring, in the second configuration file, the position information of each data field after the change.
The embodiment of the application also provides a device for loading homologous data by a multi-batch system, which addresses the same technical problems: in existing big data platforms, every batch system that loads data from the same data source system repeats the data loading process, consuming resources and time and making it easy to miss the maintenance of an individual batch system. The device comprises: a data loading configuration module, configured to read a first configuration file, wherein the first configuration file comprises configuration information for a plurality of batch systems loading the same source system data table file and the batch time range of each batch system; an ODS table generation module, configured to generate, according to the first configuration file, an ODS table used when each batch system loads the source system data table file, wherein the time range of the historical data retained in the ODS table satisfies the batch time range of each batch system; a source system data loading module, configured to load the data to be loaded contained in the source system data table file into the ODS table; an ODS table updating module, configured to update the table structure of the ODS table according to the table structure change information of the source system data table file and to generate a corresponding second configuration file, wherein the second configuration file comprises the table structure change information of the ODS table; and a batch system data reading module, configured to control each batch system, according to the second configuration file, to read the ODS table data within its own batch time range from the updated ODS table. The ODS table updating module comprises: a newly added field updating module, configured to, when the table structure change information of the source system data table file is a newly added data field, add the new data field to the ODS table and configure, in the second configuration file, the position of the new data field relative to the other data fields in the source system data table file; a deleted field updating module, configured to, when the table structure change information is a deleted data field, fill the data corresponding to the deleted data field in the ODS table with the default value NULL and set the validity of the deleted data field to zero in the second configuration file; and a change field position order updating module, configured to, when the table structure change information is a change in the position order of the data fields, keep the table structure of the ODS table unchanged and configure, in the second configuration file, the position information of each data field after the change.
The embodiment of the application also provides a computer device, which addresses the technical problems of existing big data platforms in which every batch system that loads data from the same data source system must repeat the data loading process, consuming resources and time and making it easy to miss the maintenance of an individual batch system.
The embodiment of the application also provides a computer readable storage medium, which addresses the same technical problems: every batch system that loads data from the same data source system must repeat the data loading process, consuming resources and time and making it easy to miss the maintenance of an individual batch system.
In the embodiment of the application, the configuration information for a plurality of batch systems loading the same source system data table file and the batch time range of each batch system are configured in the first configuration file in advance. After the first configuration file is read, an ODS table used when each batch system loads the source system data table file is generated according to the first configuration file, such that the time range of the historical data retained in the ODS table satisfies the batch time range of each batch system. After a source system data table file is received from the data source system, the data to be loaded contained in the file is loaded into the generated ODS table. The table structure change information of the source system data table file is then monitored, the table structure of the ODS table is updated accordingly, and a second configuration file containing the ODS table structure change information is generated, so that each batch system can be controlled, according to the second configuration file, to read the ODS table data within its own batch time range from the updated ODS table.
According to the embodiment of the application, by combining parameterized configuration files with an ODS table that retains loaded data for a period of time, the data source text can be loaded once and used many times, the overall batch time of the plurality of batch systems is greatly shortened, and the risk of missed maintenance caused by the different batch dates of the batch systems can be avoided.
Drawings
In order to more clearly illustrate the embodiments of the application or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art. In the drawings:
FIG. 1 is a flow chart of a method for loading homologous data in a multi-batch system according to an embodiment of the present application;
FIG. 2 is a flow chart of a method for loading homologous data in an alternative multi-batch system provided in an embodiment of the present application;
FIG. 3 is a schematic diagram of an apparatus for loading homologous data in a multi-batch system according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the embodiments of the present application will be described in further detail with reference to the accompanying drawings. The exemplary embodiments of the present application and their descriptions herein are for the purpose of explaining the present application, but are not to be construed as limiting the application.
In the description of the present specification, the terms "comprising," "including," "having," "containing," and the like are open-ended terms, meaning including, but not limited to. The description of the reference terms "one embodiment," "a particular embodiment," "some embodiments," "for example," etc., means that a particular feature, structure, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the application. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. The order of steps involved in the embodiments is illustrative of the practice of the application, and is not limited and may be suitably modified as desired.
In the embodiment of the application, a method for loading homologous data by a multi-batch system is provided. FIG. 1 is a flowchart of the method. As shown in FIG. 1, the method includes the following steps:
S101, reading a first configuration file, wherein the first configuration file comprises: configuration information for a plurality of batch systems loading the same source system data table file, and the batch time range of each batch system.
It should be noted that the plurality of batch systems involved in the embodiments of the present application may be, but are not limited to, a plurality of batch systems or subsystems in a big data platform that read data from the same data source system. Because data source systems differ, the data to be loaded may be stored in data texts of different formats. In the embodiment of the present application, the data text that contains the data to be loaded in the data source system (typically the upstream source system of the batch systems) is uniformly referred to as the source system data table file.
For a plurality of batch systems loading data from the same data source system, in order to load the data text once and use it many times, the embodiment of the application configures in advance, in a first configuration file, the configuration information of each batch system loading the same source system data table file and the batch time range of each batch system. The data used by all of the batch systems can then be loaded into the ODS table at one time based on the first configuration file, and each batch system can read the data of its own batch time range from the ODS table when it executes its batch.
For example, for two batch systems a and b, the loading information with which both of them load the source system data table file may be configured in configuration file p (i.e., the first configuration file).
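As a minimal sketch of what configuration file p might look like, assuming a JSON layout that the application does not itself specify: the keys `source_table_file`, `batch_systems`, `name` and `batch_offset_days` are hypothetical, and the batch dates D+1 and D+5 assigned to systems a and b are illustrative values only.

```python
import json

# Hypothetical contents of configuration file p (the first configuration file).
# The JSON layout and key names are assumptions made for illustration.
FIRST_CONFIG_TEXT = """
{
  "source_table_file": "T",
  "batch_systems": [
    {"name": "a", "batch_offset_days": 1},
    {"name": "b", "batch_offset_days": 5}
  ]
}
"""


def read_first_config(text):
    """Parse the first configuration file and reject duplicate system names."""
    config = json.loads(text)
    names = [system["name"] for system in config["batch_systems"]]
    if len(names) != len(set(names)):
        raise ValueError("duplicate batch system name in first configuration file")
    return config


config_p = read_first_config(FIRST_CONFIG_TEXT)
system_names = [system["name"] for system in config_p["batch_systems"]]
```

Keeping every batch system that shares the source file in one configuration is what lets a single loader serve all of them.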
S102, generating an ODS table used when each batch system loads a source system data table file according to the first configuration file, wherein the time range of the historical data in the ODS table meets the batch time range of each batch system.
It should be noted that, in order for data loaded into the ODS table once to serve the batches of a plurality of batch systems, the ODS table must retain data far enough back to cover the batch system with the largest batch time range; that is, the retention period of the ODS table must be at least as long as the maximum batch time range among the batch systems, so that the batches of all batch systems can be served. For example, for five batch systems a, b, c, d and e that load homologous data and whose batch dates are D+1, D+2, D+3, D+4 and D+5 respectively, the ODS table generated in S102 retains data for at least 5 days to serve all five batch systems.
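The retention rule can be sketched as follows. This is an illustration under the assumption that each batch time range is expressed as an integer day offset n (as in "D+n"); the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BatchSystem:
    name: str
    batch_offset_days: int  # the n in "D+n" (assumed representation)


def required_retention_days(systems):
    """Return the minimum number of days the ODS table must retain data."""
    if not systems:
        raise ValueError("at least one batch system is required")
    # The slowest consumer (largest offset) determines the retention period.
    return max(system.batch_offset_days for system in systems)


# Five batch systems a..e with batch dates D+1 .. D+5.
systems = [BatchSystem(name, n) for n, name in enumerate("abcde", start=1)]
retention = required_retention_days(systems)
```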
S103, loading the data to be loaded contained in the source system data table file into the ODS table.
Specifically, after the ODS table used when each batch system loads the source system data table file has been generated in S102, the data to be loaded contained in the source system data table file may be loaded into the generated ODS table in S103. Optionally, a common cleansing and loading program may be used to load the data to be loaded contained in the source system data table file into the ODS table.
S104, updating the table structure of the ODS table according to the table structure change information of the source system data table file to generate a corresponding second configuration file, wherein the second configuration file comprises: table structure change information of the ODS table.
It should be noted that, when the table structure of the source system data table file changes, the table structure of the ODS table used by the batch systems must be updated as well. However, because the ODS table is shared by a plurality of batch systems with different batch time ranges, the differing batch times could leave the batch systems with inconsistent views of the ODS table structure. To avoid this, the embodiment of the present application monitors whether the table structure of the source system data table file changes, updates the table structure of the ODS table according to the table structure change information, and generates a second configuration file containing the ODS table structure change information, so that each batch system subsequently reads the ODS table data within its own batch time range according to the second configuration file.
Optionally, in an embodiment of the present application, the table structure change information of the source system data table file includes, but is not limited to: adding a data field, deleting a data field, and changing the position order of data fields. For these three cases, S104 may be specifically implemented by the following steps: (1) when the table structure change information of the source system data table file is a newly added data field, adding the new data field to the ODS table, and configuring in the second configuration file the position of the new data field relative to the other data fields in the source system data table file; (2) when the table structure change information is a deleted data field, filling the data corresponding to the deleted data field in the ODS table with the default value NULL, and setting the validity of the deleted data field to zero in the second configuration file; (3) when the table structure change information is a change in the position order of the data fields, keeping the table structure of the ODS table unchanged, and configuring in the second configuration file the position information of each data field after the change.
Optionally, in the case of a newly added data field, in the embodiment of the present application, a column of the newly added data field is added after a column corresponding to the last data field in the ODS table.
S105, controlling each batch system to read the ODS table data in each batch time range from the updated ODS table according to the second configuration file.
It should be noted that, in the second configuration file, the relative field location information of each field in the ODS table in the source system data table file and the attribute information about whether each field is valid or not may be recorded. Optionally, in an embodiment of the present application, the validity of the virtual column field corresponding to the invalid data is configured to be zero.
In the case where the table structure change of the source system data table file is to change the order of the positions of the data fields, it is only necessary to change the order of the positions of the respective ODS table fields in the second configuration file.
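The reordering case can be sketched as below, assuming the second configuration file stores, for each position in the source file, the 1-based number of the ODS column that receives it. That representation and the function name are assumptions for illustration.

```python
def position_sequence(ods_columns, source_fields):
    """Map each source-file position to the 1-based ODS column number."""
    column_number = {name: i + 1 for i, name in enumerate(ods_columns)}
    return [column_number[field] for field in source_fields]


# The ODS table structure stays fixed; only the recorded sequence changes.
ods_columns = ["t1", "t2", "t3", "t4"]
before = position_sequence(ods_columns, ["t1", "t2", "t3", "t4"])
after = position_sequence(ods_columns, ["t3", "t1", "t4", "t2"])
```

Because only the recorded sequence changes, batch systems that run later need no further modification.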
For the case that the table structure change of the source system data table file is a new data field, a column representing the new data field may be added to the ODS table, and the location information of the new data field in the source system data table file relative to other data fields may be recorded in the second configuration file.
For the case that the table structure change of the source system data table file is the deletion of a data field, in order to ensure that every batch system in the first configuration file can execute its batch normally, the corresponding data field is deleted from the ODS table only after every batch system in the first configuration file has executed its batch.
In one embodiment, for the case that a data field is deleted from the source system data table file, the ODS table updating method adopted in the embodiment of the present application is: when the batch system that executes its batch earliest among the plurality of batch systems loads the source system data table file, filling the data corresponding to the deleted data field in the ODS table with the default value NULL, and setting the validity of the deleted data field to zero in the second configuration file; and when the batch system that executes its batch latest among the plurality of batch systems loads the source system data table file, deleting the data field whose validity is zero, together with its data, from the ODS table.
Because the first configuration file contains the batch time range of each batch system, the time sequence of each batch system executing the batch can be determined according to the batch time range of each batch system, and then the batch system executing the batch earliest and the batch system executing the batch latest can be determined.
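A minimal sketch of that ordering step, assuming the batch time range of each system is available as an integer day offset keyed by system name (a hypothetical representation):

```python
def batch_execution_order(batch_offsets):
    """Return system names sorted from earliest to latest batch execution."""
    if not batch_offsets:
        raise ValueError("at least one batch system is required")
    # Ties are broken by name so that the order is deterministic.
    return sorted(batch_offsets, key=lambda name: (batch_offsets[name], name))


offsets = {"a": 1, "b": 5, "c": 3}  # illustrative D+n offsets
order = batch_execution_order(offsets)
earliest, latest = order[0], order[-1]
```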
For example, suppose that the source system data table file contains four fields: t1, t2, t3 and t4. In configuration file s (i.e., the second configuration file), the position order of the four fields is indicated by the sequence no = 1, 2, 3, 4.
If a field t5 is added to the source system data table file between t2 and t3, a column t5 is added at the end of the ODS table and no = 1, 2, 5, 3, 4 is recorded in the second configuration file. Configuration file s is modified when the batch system that executes its batch earliest actually loads the file; the other batch systems do not need to modify it again when they execute their batches. Fields added at the beginning or the end of the source system data table file are handled in the same way.
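The t5 example can be traced with a short sketch. It assumes that each entry of the position sequence gives the 1-based ODS column number for the field at the same position in the source file; the function names are hypothetical.

```python
def add_field(ods_columns, position_seq, new_field, source_index):
    """Append new_field as the last ODS column and record its source position.

    source_index is the 0-based position of the new field in the source file.
    """
    new_columns = ods_columns + [new_field]
    new_seq = list(position_seq)
    new_seq.insert(source_index, len(new_columns))
    return new_columns, new_seq


def load_row(source_row, position_seq, n_columns):
    """Place each source value into the ODS column named by the sequence."""
    ods_row = [None] * n_columns
    for value, column_no in zip(source_row, position_seq):
        ods_row[column_no - 1] = value
    return ods_row


ods_columns = ["t1", "t2", "t3", "t4"]
seq = [1, 2, 3, 4]
# t5 appears between t2 and t3 in the source file (0-based index 2).
ods_columns, seq = add_field(ods_columns, seq, "t5", 2)
# Source row in the new file order: t1, t2, t5, t3, t4.
row = load_row(["a", "b", "e", "c", "d"], seq, len(ods_columns))
```

Appending the column at the end leaves the existing ODS columns in place, so data already loaded for later batch systems is unaffected.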
If the field t3 is deleted from the source system data table file, then when the batch system that executes its batch earliest runs, the ODS table structure remains unchanged, the validity flag of field t3 is set to zero in configuration file s, and when the data is actually loaded, a default NULL value is filled into the virtual column corresponding to field t3. When the batch system that executes its batch latest runs, the fields whose validity flag is zero, together with their data, are deleted from the ODS table according to configuration file s. In the embodiment of the application, the virtual column approach is adopted for the case that the data source deletes a field, which avoids inconsistent ODS table structures caused by the unsynchronized batch times of the batch subsystems.
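The two-phase handling of the deleted field t3 can be sketched as follows, assuming the second configuration file holds a position sequence of 1-based ODS column numbers and a per-field validity flag. Both representations, and all function names, are assumptions for illustration; `None` stands in for NULL.

```python
def mark_field_deleted(ods_columns, position_seq, validity, field):
    """Phase 1 (earliest batch system): keep the column, set validity to zero."""
    column_no = ods_columns.index(field) + 1
    new_seq = [n for n in position_seq if n != column_no]
    new_validity = {**validity, field: 0}
    return new_seq, new_validity


def load_row(source_row, position_seq, n_columns):
    """Load a source row; columns with no source value stay None (NULL)."""
    ods_row = [None] * n_columns
    for value, column_no in zip(source_row, position_seq):
        ods_row[column_no - 1] = value
    return ods_row


def drop_invalid_columns(ods_columns, rows, validity):
    """Phase 2 (latest batch system): remove zero-validity columns and data."""
    keep = [i for i, name in enumerate(ods_columns) if validity[name] != 0]
    return [ods_columns[i] for i in keep], [[row[i] for i in keep] for row in rows]


ods_columns = ["t1", "t2", "t3", "t4"]
seq = [1, 2, 3, 4]
validity = {name: 1 for name in ods_columns}

seq, validity = mark_field_deleted(ods_columns, seq, validity, "t3")
row = load_row(["a", "b", "d"], seq, len(ods_columns))  # source now has t1, t2, t4
final_columns, final_rows = drop_invalid_columns(ods_columns, [row], validity)
```

After the physical deletion, the position sequence would need to be renumbered against the narrower table; the sketch leaves that step out.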
As can be seen from the above, in the method for loading homologous data by a multi-batch system provided in the embodiment of the present application, the configuration information for a plurality of batch systems loading the same source system data table file and the batch time range of each batch system are configured in the first configuration file in advance. After the first configuration file is read, an ODS table used when each batch system loads the source system data table file is generated according to the first configuration file, such that the time range of the historical data retained in the ODS table satisfies the batch time range of each batch system. After a source system data table file is received from the data source system, the data to be loaded contained in the file is loaded into the generated ODS table. The table structure change information of the source system data table file is then monitored, the table structure of the ODS table is updated accordingly, and a second configuration file containing the ODS table structure change information is generated, so that each batch system can be controlled, according to the second configuration file, to read the ODS table data within its own batch time range from the updated ODS table.
With the method provided by the embodiment of the application, by combining parameterized configuration files with an ODS table that retains loaded data for a period of time, the data source text can be loaded once and used many times, the overall batch time of the plurality of batch systems is greatly shortened, and the risk of missed maintenance caused by the different batch dates of the batch systems can be avoided.
Embodiments of the present application are described below using two batch systems, A and B, as an example. Suppose the batch date of batch system A is D+1 and the batch date of batch system B is D+5, and that both batch systems receive the table T file provided by the upstream source system. In order to load the file once and use it many times, as shown in FIG. 2, the method for loading homologous data by the multi-batch system provided by the embodiment of the application can comprise the following steps:
S201, configuring batch use information of the ODS table: configuring, in a first configuration file, the loading information with which each batch system loads the source system data table file and the batch time range of each batch system;
S202, generating an ODS table according to the batch use information of the ODS table, and configuring a data cleaning rule of the ODS table: generating the ODS table used when each batch system loads the source system data table file according to the batch use information of the ODS table, and configuring the cleaning rule of the ODS table based on the batch time range of each batch system, so that when the source system data table file arrives, the data it contains is loaded into the ODS table by a common cleansing and loading program and the ODS table retains the historical data used by each batch system;
S203, each batch subsystem reads ODS table data from the ODS table according to its own batch time range;
S204, monitoring whether the table structure of the source system data table file has changed;
S205, in the case that the table structure of the source system data table file has changed, judging whether the change is a newly added field or a deleted field; if a field is newly added, executing S206; if a field is deleted, executing S207;
S206, when the table structure change is a newly added field, placing the new field in the last column of the ODS table, and configuring in the second configuration file the position of the new field relative to the other fields in the source system data; when each batch system actually loads the file, the source system file is loaded according to the second configuration file;
S207, when the table structure change is a deleted field, keeping the ODS table structure unchanged, and setting the validity of the deleted field to zero in the second configuration file;
S208, according to the second configuration file, when each batch system actually loads the source system file, filling the fields whose validity is zero with NULL values, or deleting the fields whose validity is zero and the corresponding data from the ODS table.
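The read side of this flow (S202 and S203) can be sketched for batch systems A and B. The sketch assumes that the ODS table is partitioned by data date and that a system with batch date D+n, running on a given day, reads the partition dated n days earlier. This interpretation of the batch time range, and all names used, are assumptions for illustration only.

```python
from datetime import date, timedelta


def data_date(run_date, offset_days):
    """Data date consumed on run_date by a system whose batch date is D+n."""
    return run_date - timedelta(days=offset_days)


def read_for_batch(ods_partitions, run_date, offset_days):
    """S203: a batch system reads only the partition in its batch time range."""
    return ods_partitions.get(data_date(run_date, offset_days), [])


def prune_partitions(ods_partitions, run_date, max_offset_days):
    """S202 cleaning rule: keep history back to the slowest system's data date."""
    cutoff = data_date(run_date, max_offset_days)
    return {d: rows for d, rows in ods_partitions.items() if d >= cutoff}


run_date = date(2020, 5, 10)
# One ODS partition per data date, loaded once by the common loader.
ods = {date(2020, 5, day): [f"row-{day}"] for day in range(1, 11)}
ods = prune_partitions(ods, run_date, max_offset_days=5)

rows_a = read_for_batch(ods, run_date, offset_days=1)  # batch system A, D+1
rows_b = read_for_batch(ods, run_date, offset_days=5)  # batch system B, D+5
```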
Based on the same inventive concept, the embodiment of the application also provides a device for loading homologous data by a multi-batch system, as described in the following embodiment. Because the principle of the solution of the embodiment of the device is similar to that of the method for loading the homologous data by the multi-batch system, the implementation of the embodiment of the device can refer to the implementation of the method, and the repetition is omitted.
Fig. 3 is a schematic diagram of an apparatus for loading homologous data in a multi-batch system according to an embodiment of the present application, where, as shown in fig. 3, the apparatus includes: a data loading configuration module 31, an ODS table generation module 32, a source system data loading module 33, an ODS table update module 34, and a batch system data reading module 35.
The data loading configuration module 31 is configured to read a first configuration file, where the first configuration file includes: configuration information for a plurality of batch systems loading the data table file of the same source system, and the batch time range of each batch system; the ODS table generating module 32 is configured to generate, according to the first configuration file, an ODS table for use when each batch system loads the source system data table file, where the time range over which historical data is retained in the ODS table satisfies the batch time range of each batch system; the source system data loading module 33 is configured to load the data to be loaded contained in the source system data table file into the ODS table; the ODS table updating module 34 is configured to update the table structure of the ODS table according to the table structure change information of the source system data table file and to generate a corresponding second configuration file, where the second configuration file includes the table structure change information of the ODS table; and the batch system data reading module 35 is configured to control each batch system to read, according to the second configuration file, the ODS table data within its batch time range from the updated ODS table.
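As a non-limiting illustration of how the retention period of the ODS table could be made to satisfy every batch time range, the following Python sketch derives a retention period from the first configuration file and selects the load dates that a cleaning rule may purge. It assumes, purely for illustration, that each batch time range is expressed as the number of days after arrival during which the batch system may still read the data; the system names, the functions, and the values are hypothetical.

```python
from datetime import date, timedelta
from typing import Dict, List


def retention_days(batch_ranges: Dict[str, int]) -> int:
    """Keep data long enough for the batch system with the longest batch time range."""
    return max(batch_ranges.values())


def partitions_to_purge(load_dates: List[date], today: date, keep_days: int) -> List[date]:
    """Return the load dates that are older than the retention period."""
    cutoff = today - timedelta(days=keep_days)
    return [d for d in load_dates if d < cutoff]


# Hypothetical content of the first configuration file: batch system -> days of data it needs.
batch_ranges = {"risk": 1, "reporting": 3, "finance": 7}
keep = retention_days(batch_ranges)
stale = partitions_to_purge(
    [date(2020, 5, 1), date(2020, 5, 3), date(2020, 5, 9)],
    today=date(2020, 5, 9),
    keep_days=keep,
)
```

Taking the maximum over all batch systems means the file is loaded once and remains readable until the slowest consumer has finished with it.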
In one embodiment, the ODS table updating module 34 may specifically include: a newly added field updating module 341, a deleted field updating module 342, and a change field position order updating module 343;
the newly added field updating module 341 is configured to, when the table structure change information of the source system data table file is a newly added data field, add the newly added data field to the ODS table and configure, in the second configuration file, the position information of the newly added data field relative to the other data fields in the source system data table file; the deleted field updating module 342 is configured to, when the table structure change information of the source system data table file is a deleted data field, fill the data corresponding to the deleted data field in the ODS table with the default value NULL, and configure the validity of the deleted data field to be zero in the second configuration file; the change field position order updating module 343 is configured to, when the table structure change information of the source system data table file is a change in the position order of data fields, keep the table structure of the ODS table unchanged, and configure, in the second configuration file, the position information of each data field after the change of position order.
Optionally, the deleted field updating module 342 may be further configured to: when the batch system that executes its batch earliest among the plurality of batch systems loads the source system data table file, fill the data corresponding to the deleted data field in the ODS table with the default value NULL and configure the validity of the deleted data field to be zero in the second configuration file; and when the batch system that executes its batch latest among the plurality of batch systems loads the source system data table file, delete the data field whose validity is zero and its corresponding data from the ODS table.
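As a non-limiting illustration of this two-phase handling of a deleted field, the following Python sketch models the ODS table as a list of dictionaries and the validity flags as a dictionary. The function names and the in-memory representation are assumptions made for this example; a real implementation would act on a database table and on the second configuration file.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[str]]


def mark_deleted(validity: Dict[str, int], field: str) -> None:
    """Phase 1 (earliest batch system's load): set the validity of the deleted field to zero."""
    validity[field] = 0


def null_fill(row: Row, validity: Dict[str, int]) -> Row:
    """Build an ODS row in which every field with zero validity is NULL."""
    return {name: (row.get(name) if flag else None) for name, flag in validity.items()}


def drop_invalid(table: List[Row], validity: Dict[str, int]) -> None:
    """Phase 2 (latest batch system's load): remove zero-validity fields and their data."""
    dead = [name for name, flag in validity.items() if flag == 0]
    for name in dead:
        del validity[name]
        for row in table:
            row.pop(name, None)


validity = {"cust_id": 1, "branch": 1}
table: List[Row] = [{"cust_id": "C001", "branch": "B01"}]

mark_deleted(validity, "branch")                       # source stopped sending "branch"
table.append(null_fill({"cust_id": "C002"}, validity))
after_phase_one = [dict(r) for r in table]             # the column still exists here

drop_invalid(table, validity)                          # column removed once no system needs it
```

Deferring the physical deletion to the latest batch system keeps the table structure stable for every batch system that still runs against the earlier layout.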
Optionally, the above-mentioned newly added field updating module 341 may be further configured to add a column of the newly added data field after a column corresponding to the last data field in the ODS table.
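As a non-limiting illustration of the three kinds of table structure change handled by modules 341 to 343, the following Python sketch derives an updated ODS column list and the content of a second configuration file from the new source field order. The `update_ods` function and the dictionary layout of the configuration are assumptions made for this example only.

```python
from typing import Dict, List, Tuple


def update_ods(
    ods_columns: List[str], new_source_fields: List[str]
) -> Tuple[List[str], Dict[str, Dict[str, object]]]:
    """Return (updated ODS columns, second configuration) for a changed source layout."""
    columns = list(ods_columns)
    # Newly added fields are appended after the last existing ODS column.
    for name in new_source_fields:
        if name not in columns:
            columns.append(name)
    config: Dict[str, Dict[str, object]] = {}
    for name in columns:
        if name in new_source_fields:
            # Existing or new field: record where it now sits in the source file.
            config[name] = {"position": new_source_fields.index(name), "valid": 1}
        else:
            # Deleted field: the ODS column is kept, validity is set to zero.
            config[name] = {"position": None, "valid": 0}
    return columns, config


columns, second_config = update_ods(
    ods_columns=["cust_id", "branch", "balance"],
    new_source_fields=["cust_id", "open_date", "balance"],  # "branch" deleted, "open_date" added
)
```

A pure reordering of source fields leaves `columns` unchanged and alters only the `position` entries, which is why no batch system needs to adapt to a new ODS layout.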
Based on the same inventive concept, the embodiment of the application also provides a computer device for solving the technical problem that, on existing big data platforms, the data loading process must be executed once for every batch system that loads data from the same data source system, which consumes resources and time and makes it easy to overlook the maintenance of individual batch systems.
Based on the same inventive concept, the embodiment of the application also provides a computer readable storage medium for solving the same technical problem, namely that on existing big data platforms the data loading process must be executed once for every batch system that loads data from the same data source system, which consumes resources and time and makes it easy to overlook the maintenance of individual batch systems. The computer readable storage medium stores a computer program for executing the method for loading homologous data by the multi-batch system.
In summary, the embodiments of the present application provide a method, an apparatus, a computer device, and a computer readable storage medium for loading homologous data in a multi-batch system, which allow a data source file to be loaded once and used multiple times by adopting a parameterized configuration file and retaining the loaded ODS table for a period of time. For the cases in which the data source deletes a field, adds a field, or changes the order of fields, a virtual column approach is adopted to avoid the ODS table structures becoming inconsistent because individual batch systems are left unmaintained when the batch times of the batch systems differ. Applying the data loading scheme provided by the embodiment of the application to the loading of data from the same upstream source system by a plurality of batch subsystems of a big data platform can reduce the overall batch time of the platform.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The foregoing description of the embodiments has been provided for the purpose of illustrating the general principles of the application. It describes only particular embodiments of the application and is not intended to limit the scope of the application; any modifications, equivalents, improvements, etc. that fall within the spirit and principles of the application are intended to be included within the scope of the application.

Claims (8)

1. A method for loading homologous data in a multi-batch system, comprising:
reading a first configuration file, wherein the first configuration file comprises: configuration information for a plurality of batch systems loading the data table file of the same source system, and the batch time range of each batch system;
generating an ODS table used when each batch system loads the source system data table file according to the first configuration file, wherein the time range over which historical data is retained in the ODS table satisfies the batch time range of each batch system;
loading data to be loaded contained in the source system data table file into the ODS table;
updating the table structure of the ODS table according to the table structure change information of the source system data table file to generate a corresponding second configuration file, wherein the second configuration file comprises: table structure change information of the ODS table;
controlling each batch system to read ODS table data in each batch time range from the updated ODS table according to the second configuration file;
wherein updating the table structure of the ODS table according to the table structure change information of the source system data table file to generate a corresponding second configuration file comprises:
when the table structure change information of the source system data table file is a newly added data field, adding the newly added data field to the ODS table, and configuring, in the second configuration file, the position information of the newly added data field relative to the other data fields in the source system data table file;
when the table structure change information of the source system data table file is a deleted data field, filling data corresponding to the deleted data field in the ODS table into a default value NULL, and configuring the validity of the deleted data field to be zero in the second configuration file;
and when the table structure change information of the source system data table file is the position order of the changed data fields, keeping the table structure of the ODS table unchanged, and configuring the position information of each data field after the position order is changed in the second configuration file.
2. The method of claim 1, wherein in the case where the table structure change information of the source system data table file is a delete data field, the method further comprises:
when the batch system that executes its batch earliest among the plurality of batch systems loads the source system data table file, filling the data corresponding to the deleted data field in the ODS table with the default value NULL, and configuring the validity of the deleted data field to be zero in the second configuration file;
and when the batch system that executes its batch latest among the plurality of batch systems loads the source system data table file, deleting the data field whose validity is zero and its corresponding data from the ODS table.
3. The method of claim 1, wherein in the case where the table structure change information of the source system data table file is a newly added data field, the column of the newly added data field is added after the column corresponding to the last data field in the ODS table.
4. An apparatus for loading homologous data in a multi-batch system, comprising:
the data loading configuration module is used for reading a first configuration file, wherein the first configuration file comprises: configuration information for a plurality of batch systems loading the data table file of the same source system, and the batch time range of each batch system;
the ODS table generating module is used for generating an ODS table used when each batch system loads the source system data table file according to the first configuration file, wherein the time range over which historical data is retained in the ODS table satisfies the batch time range of each batch system;
the source system data loading module is used for loading the data to be loaded contained in the source system data table file into the ODS table;
the ODS table updating module is configured for updating the table structure of the ODS table according to the table structure change information of the source system data table file, and generating a corresponding second configuration file, where the second configuration file includes: table structure change information of the ODS table;
the batch system data reading module is used for controlling each batch system to read the ODS table data in each batch time range from the updated ODS table according to the second configuration file;
the ODS table updating module includes:
a newly added field updating module, configured to, when the table structure change information of the source system data table file is a newly added data field, add the newly added data field to the ODS table and configure, in the second configuration file, the position information of the newly added data field relative to the other data fields in the source system data table file;
a deleted field updating module, configured to, when the table structure change information of the source system data table file is a deleted data field, fill data corresponding to the deleted data field in the ODS table to a default value NULL, and configure the validity of the deleted data field to be zero in the second configuration file;
and the change field position sequence updating module is used for keeping the table structure of the ODS table unchanged and configuring the position information of each data field after the position sequence is changed in the second configuration file when the table structure change information of the source system data table file is the position sequence of the change data field.
5. The apparatus of claim 4, wherein the deleted field updating module is further configured to: when the batch system that executes its batch earliest among the plurality of batch systems loads the source system data table file, fill the data corresponding to the deleted data field in the ODS table with the default value NULL, and configure the validity of the deleted data field to be zero in the second configuration file; and when the batch system that executes its batch latest among the plurality of batch systems loads the source system data table file, delete the data field whose validity is zero and its corresponding data from the ODS table.
6. The apparatus of claim 4, wherein the newly added field updating module is further configured to add the column of the newly added data field after the column corresponding to the last data field in the ODS table.
7. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements the method for loading homologous data in a multi-batch system according to any one of claims 1 to 3.
8. A computer readable storage medium, characterized in that the computer readable storage medium stores a computer program for executing the method for loading homologous data in a multi-batch system according to any one of claims 1 to 3.
CN202010385299.2A 2020-05-09 2020-05-09 Method and device for loading homologous data by multi-batch system Active CN111563090B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010385299.2A CN111563090B (en) 2020-05-09 2020-05-09 Method and device for loading homologous data by multi-batch system

Publications (2)

Publication Number Publication Date
CN111563090A CN111563090A (en) 2020-08-21
CN111563090B true CN111563090B (en) 2023-11-21

Family

ID=72068027

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010385299.2A Active CN111563090B (en) 2020-05-09 2020-05-09 Method and device for loading homologous data by multi-batch system

Country Status (1)

Country Link
CN (1) CN111563090B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116821145B (en) * 2023-08-29 2023-11-14 江南大学附属医院 Self-adaptive table structure adjusting method and system for identifying data change

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2144175A1 (en) * 2008-07-11 2010-01-13 Software AG Method for performing a bulk load into a database
CN106528070A (en) * 2015-09-15 2017-03-22 阿里巴巴集团控股有限公司 Data table generation method and equipment
CN107885761A (en) * 2017-02-20 2018-04-06 平安科技(深圳)有限公司 Batch data loading method and device
CN109542875A (en) * 2018-11-20 2019-03-29 中国银行股份有限公司 A kind of generation method and device of configuration file
CN111078777A (en) * 2019-12-13 2020-04-28 紫光云(南京)数字技术有限公司 Method for loading data based on dynamic increment of relational database

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
梁礼方."信息系统数据库的分类".《金融科技时代》.2013,第50-55页. *

Also Published As

Publication number Publication date
CN111563090A (en) 2020-08-21

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant