CN115640349A - Data processing method, device, equipment and storage medium


Info

Publication number
CN115640349A
Authority
CN
China
Legal status
Pending
Application number
CN202211372228.4A
Other languages
Chinese (zh)
Inventor
赖祥顺
Current Assignee
China Construction Bank Corp
CCB Finetech Co Ltd
Original Assignee
China Construction Bank Corp
CCB Finetech Co Ltd
Application filed by China Construction Bank Corp and CCB Finetech Co Ltd
Priority to CN202211372228.4A
Publication of CN115640349A
Legal status: Pending (current)

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application relates to the field of data processing and provides a data processing method, apparatus, device and storage medium. The method acquires rule configuration information and determines a data fetch instruction from it, the fetch instruction comprising fetch execution information, data source configuration information and transmission configuration information; performs a data query in a target source database according to the fetch execution information and the data source configuration information to obtain a target data file; and sends the target data file to a target system according to the transmission configuration information.

Description

Data processing method, device, equipment and storage medium
Technical Field
The present invention relates to the field of data processing technology, and in particular to a data processing method, apparatus, device and storage medium.
Background
With social development and the widespread adoption of information technology, many data applications such as statistical analysis and data mining have emerged. A single type of data cannot satisfy every analysis scenario; different types of data must be combined in different scenarios to extract their value, so cross-system data transmission has become a necessity.
At present, the general-purpose data synchronization schemes on the market are open-source programs whose maintenance is left entirely to the community; users extract data with tools such as DataX or Sqoop.
However, these prior-art schemes are complex to implement, offer low security and are costly; they cannot transmit data across systems and cannot meet diverse data transmission requirements.
Disclosure of Invention
The application provides a data processing method, apparatus, device and storage medium to solve the technical problems of the prior art: complex implementation, low security, high cost, inability to transmit data across systems, and inability to meet diverse data transmission requirements.
In a first aspect, the present application provides a data processing method, including:
acquiring rule configuration information, and determining a fetch instruction according to the rule configuration information, wherein the fetch instruction comprises fetch execution information, data source configuration information and transmission configuration information;
performing a data query in a target source database according to the fetch execution information and the data source configuration information to obtain a target data file;
and sending the target data file to a target system according to the transmission configuration information.
This scheme achieves lightweight data synchronization and cross-system transmission with simple configuration. The fetch instruction is determined automatically from the rule configuration information set by the user, so data can be extracted and the corresponding field information files generated without the user having to specify field types and similar details. Compared with the complex data models of general-purpose data synchronization schemes, the scheme improves data processing security, is simple to implement, supports multiple data sources, has low maintenance cost, and meets diverse cross-system data extraction and transmission requirements.
Optionally, after acquiring the rule configuration information and determining the fetch instruction according to it, the method further includes:
splitting the fetch instruction to obtain a plurality of fetch sub-instructions;
correspondingly, performing the data query according to the fetch execution information and the data source configuration information to obtain the target data file includes:
generating a plurality of parallel query subtasks according to the plurality of fetch sub-instructions and the data source configuration information;
and executing the plurality of parallel query subtasks in parallel to query the target source database and obtain the target data file.
During data query and extraction, data query execution services can be started up to the concurrency number controlled by the concurrency service; the required data is queried from the database and written into the corresponding batch files. When a large volume of data is extracted, it can be extracted in several batches, and with the concurrency number set, several batches can be extracted simultaneously, further improving data processing efficiency.
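The splitting and bounded concurrency described above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the applicant's implementation: batches are assumed to be formed from ranges of a numeric key column, and the names `FetchSubInstruction`, `split_fetch_instruction`, `run_query_subtasks` and the table `t_trade` are hypothetical. The `concurrency` argument plays the role of the concurrency number, and each sub-instruction corresponds to one batch.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class FetchSubInstruction:
    batch_no: int  # 1-based batch number
    sql: str       # SQL text for this batch
    low: int       # inclusive lower bound of the split key
    high: int      # inclusive upper bound of the split key


def split_fetch_instruction(base_sql, key, min_key, max_key, batch_count):
    """Split one fetch SQL into key-range sub-instructions (hypothetical split rule).

    Assumes base_sql has no WHERE clause of its own.
    """
    size = -(-(max_key - min_key + 1) // batch_count)  # ceiling division
    subs = []
    for i in range(batch_count):
        low = min_key + i * size
        if low > max_key:  # fewer keys than requested batches
            break
        high = min(low + size - 1, max_key)
        sql = f"{base_sql} WHERE {key} BETWEEN {low} AND {high}"
        subs.append(FetchSubInstruction(i + 1, sql, low, high))
    return subs


def run_query_subtasks(subs, query_fn, concurrency):
    """Run one query subtask per sub-instruction, at most `concurrency` at a time.

    Executor.map returns results in the order of `subs`, i.e. by batch number.
    """
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(query_fn, subs))


# Usage with an in-memory list standing in for the source table.
source_rows = [{"id": i, "amount": i * 10} for i in range(1, 101)]


def query_batch(sub):
    # Stand-in for executing sub.sql on a pooled database connection.
    return [r for r in source_rows if sub.low <= r["id"] <= sub.high]


subs = split_fetch_instruction("SELECT id, amount FROM t_trade", "id", 1, 100, 4)
batches = run_query_subtasks(subs, query_batch, concurrency=2)
```

Range splitting on a key is only one possible rule; the source does not say how batches are formed, and a real query function would run `sub.sql` against the database and write the rows to that batch's file.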
Optionally, performing the data query in the target source database according to the fetch execution information and the data source configuration information to obtain the target data file includes:
generating a plurality of parallel query tasks according to a plurality of items of fetch execution information and the data source configuration information corresponding to each item;
and executing the plurality of parallel query tasks in parallel to query the target source database and obtain the target data file.
During data query and extraction, data query execution services can be started up to the concurrency number controlled by the concurrency service; the required data is queried from the database and written into the corresponding batch files, and several different items of fetch execution information can be executed simultaneously, further improving data processing efficiency.
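For the case of several fetch scripts, each bound to its own data source, a minimal sketch could submit one task per (script, data source) pair to a thread pool. SQLite database files stand in for the target source databases here, and the `(name, sql, db_path)` task layout, the function names and the tables `customer` and `account` are assumptions made for illustration.

```python
import os
import sqlite3
import tempfile
from concurrent.futures import ThreadPoolExecutor
from contextlib import closing


def run_query_task(task):
    """Execute one fetch script against the data source it is configured with."""
    name, sql, db_path = task
    # Each task opens its own connection, so no connection is shared across threads.
    with closing(sqlite3.connect(db_path)) as conn:
        return name, conn.execute(sql).fetchall()


def run_query_tasks(tasks, concurrency):
    """Run all fetch tasks in parallel, at most `concurrency` at a time."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return dict(pool.map(run_query_task, tasks))


def make_source(db_path, script):
    """Create a small SQLite database standing in for one source system."""
    with closing(sqlite3.connect(db_path)) as conn:
        conn.executescript(script)
        conn.commit()


workdir = tempfile.mkdtemp()
db_a = os.path.join(workdir, "source_a.db")
db_b = os.path.join(workdir, "source_b.db")
make_source(db_a, "CREATE TABLE customer (id INTEGER, name TEXT);"
                  "INSERT INTO customer VALUES (1, 'Li'), (2, 'Wang');")
make_source(db_b, "CREATE TABLE account (id INTEGER, balance REAL);"
                  "INSERT INTO account VALUES (10, 99.5);")

tasks = [
    ("customer", "SELECT id, name FROM customer ORDER BY id", db_a),
    ("account", "SELECT id, balance FROM account", db_b),
]
results = run_query_tasks(tasks, concurrency=2)
```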
Optionally, the fetch execution information includes field information and the field order of the data to be synchronized;
correspondingly, after acquiring the rule configuration information and determining the fetch instruction according to it, the method further includes:
generating a database schema definition language (DDL) file for the synchronized data according to the field information.
Here, the present application can generate the corresponding DDL file from the field information and provide it to a downstream system, which uses it to create the database table.
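Generating the DDL text from the parsed field information could look like the sketch below. The field list, column types, primary key and table name `t_customer` are hypothetical; in practice the types would come from the rule configuration or from database metadata, and the exact DDL dialect depends on the downstream database.

```python
def build_ddl(table, fields, primary_key=()):
    """Build a CREATE TABLE statement from field information, in field order.

    `fields` is a list of (column name, column type) pairs in the order
    declared by the rule; `primary_key` lists the key columns, if any.
    """
    lines = [f"    {name} {col_type}" for name, col_type in fields]
    if primary_key:
        lines.append(f"    PRIMARY KEY ({', '.join(primary_key)})")
    return f"CREATE TABLE {table} (\n" + ",\n".join(lines) + "\n);\n"


fields = [("id", "BIGINT"), ("name", "VARCHAR(64)"), ("open_date", "DATE")]
ddl = build_ddl("t_customer", fields, primary_key=("id",))
```

Keeping `fields` as an ordered list rather than a dict makes the column order in the DDL match the field order carried in the fetch execution information.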
Optionally, performing the data query in the target source database according to the fetch execution information and the data source configuration information to obtain the target data file includes:
performing the data query in the target source database according to the fetch execution information and the data source configuration information to obtain a data synchronization description control (Ctrl) file and a data (dat) file.
Here, after the data is extracted, a Ctrl file and a dat file are generated, which together implement data extraction. During data transmission, the user can read the specific information about the data file accurately from the Ctrl file and screen and extract data in a targeted way, which makes data processing more flexible and improves the user experience.
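The source does not specify the layout of the Ctrl and dat files. The sketch below assumes a pipe-delimited .dat file and a key=value .ctrl file recording the data file name, record count, field list and byte size (all hypothetical choices), and shows how a receiver could use the control file to check the data file it received.

```python
import os
import tempfile


def write_dat_and_ctrl(base_path, field_names, rows, delimiter="|"):
    """Write rows to <base>.dat and a key=value description to <base>.ctrl.

    Values containing the delimiter or newlines are not escaped in this sketch.
    """
    dat_path, ctrl_path = base_path + ".dat", base_path + ".ctrl"
    with open(dat_path, "w", encoding="utf-8", newline="\n") as dat:
        for row in rows:
            dat.write(delimiter.join("" if v is None else str(v) for v in row) + "\n")
    ctrl = {
        "data_file": os.path.basename(dat_path),
        "record_count": str(len(rows)),
        "fields": ",".join(field_names),
        "file_size": str(os.path.getsize(dat_path)),  # measured after the file is closed
    }
    with open(ctrl_path, "w", encoding="utf-8") as f:
        f.writelines(f"{k}={v}\n" for k, v in ctrl.items())
    return dat_path, ctrl_path


def check_against_ctrl(dat_path, ctrl_path):
    """Receiver-side check: does the .dat file match what its .ctrl file declares?"""
    with open(ctrl_path, encoding="utf-8") as f:
        ctrl = dict(line.rstrip("\n").split("=", 1) for line in f if "=" in line)
    with open(dat_path, encoding="utf-8") as f:
        records = sum(1 for _ in f)
    return (records == int(ctrl["record_count"])
            and os.path.getsize(dat_path) == int(ctrl["file_size"]))


base = os.path.join(tempfile.mkdtemp(), "t_customer_batch1")
dat_path, ctrl_path = write_dat_and_ctrl(base, ["id", "name"], [(1, "Li"), (2, None)])
```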
Optionally, sending the target data file to the target system according to the transmission configuration information includes:
determining the target system according to the transmission configuration information;
and sending the DDL file and the target data file, namely the Ctrl file and the dat file, to the target system.
In this method, the DDL file generated from the field information, the Ctrl file and the dat file are transmitted to the target system together, so the user obtains detailed and complete data, further improving the user experience.
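Sending the DDL, data and control files to the target system can be sketched as copying the same file set to every configured target. Local directories stand in for target nodes here because the source does not name a transfer protocol, and the function name `send_to_targets` is hypothetical. Copying the .ctrl file last, so that its arrival signals a complete data file, is an assumed convention rather than something the source states.

```python
import os
import shutil
import tempfile


def send_to_targets(files, target_dirs):
    """Deliver the DDL, data and control files to every configured target.

    Local directories stand in for target nodes; a real deployment would use
    whatever transfer channel the target system accepts.
    """
    # sorted() is stable and False < True, so .ctrl files move to the end.
    ordered = sorted(files, key=lambda p: p.endswith(".ctrl"))
    delivered = []
    for target in target_dirs:
        os.makedirs(target, exist_ok=True)
        for path in ordered:
            delivered.append(shutil.copy2(path, target))  # returns the new file's path
    return delivered


src = tempfile.mkdtemp()
files = []
for name in ("t_customer.ctrl", "t_customer.dat", "t_customer.ddl"):
    path = os.path.join(src, name)
    with open(path, "w", encoding="utf-8") as f:
        f.write(name + "\n")
    files.append(path)

targets = [os.path.join(tempfile.mkdtemp(), node) for node in ("node_a", "node_b")]
delivered = send_to_targets(files, targets)
```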
Optionally, the fetch execution information includes a preset extraction time period;
correspondingly, performing the data query in the target source database according to the fetch execution information and the data source configuration information to obtain the target data file includes:
performing the data query in the target source database according to the fetch execution information and the data source configuration information to obtain the target data file for the preset extraction time period.
When extracting data, the method and apparatus support flexible switching between extracting data with a custom rule for a specified date and extracting data with the default rule when no special rule is specified, so users can extract historical data and back-fill earlier data for a specified date.
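Switching between a custom rule for a specified date and a default rule can be sketched as resolving a date window before building the query. The default rule used here (extract the previous day), the column name `trade_date` and the table `t_trade` are assumptions; the source leaves both the default rule and the preset extraction period open.

```python
from datetime import date, timedelta


def resolve_extraction_window(specified_date=None, today=None):
    """Return the half-open [start, end) window of one business day to extract.

    Custom rule: a user-specified date extracts exactly that day, which also
    allows back-filling historical data.
    Default rule (hypothetical): with no date specified, extract the previous day.
    """
    today = today if today is not None else date.today()
    day = specified_date if specified_date is not None else today - timedelta(days=1)
    return day, day + timedelta(days=1)


def build_windowed_query(base_sql, ts_column, window):
    """Append a date-range condition using bind parameters (assumes no WHERE in base_sql)."""
    start, end = window
    sql = f"{base_sql} WHERE {ts_column} >= ? AND {ts_column} < ?"
    return sql, (start.isoformat(), end.isoformat())


default_window = resolve_extraction_window(today=date(2022, 11, 3))
sql, params = build_windowed_query("SELECT id, amount FROM t_trade", "trade_date",
                                   resolve_extraction_window(date(2022, 10, 1)))
```

Bind parameters (`?` is the placeholder style of Python's sqlite3 module) keep the dates out of the SQL text; other drivers use different placeholder styles.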
In a second aspect, the present application provides a data processing apparatus comprising:
a rule management module, configured to acquire rule configuration information and determine a fetch instruction according to the rule configuration information, wherein the fetch instruction comprises fetch execution information, data source configuration information and transmission configuration information;
a data synchronization module, configured to perform a data query in a target source database according to the fetch execution information and the data source configuration information to obtain a target data file;
and a data transmission module, configured to send the target data file to a target system according to the transmission configuration information.
Optionally, after the rule management module acquires the rule configuration information and determines the fetch instruction according to it, the apparatus further includes:
a splitting module, configured to split the fetch instruction to obtain a plurality of fetch sub-instructions;
correspondingly, the data synchronization module is specifically configured to:
generate a plurality of parallel query subtasks according to the plurality of fetch sub-instructions and the data source configuration information;
and execute the plurality of parallel query subtasks in parallel to query the target source database and obtain the target data file.
Optionally, the data synchronization module is further specifically configured to:
generate a plurality of parallel query tasks according to a plurality of items of fetch execution information and the data source configuration information corresponding to each item;
and execute the plurality of parallel query tasks in parallel to query the target source database and obtain the target data file.
Optionally, the fetch execution information includes field information and the field order of the data to be synchronized;
correspondingly, after the rule management module acquires the rule configuration information and determines the fetch instruction according to it, the apparatus further includes:
a generating module, configured to generate a database schema definition language (DDL) file for the synchronized data according to the field information.
Optionally, the data synchronization module is further specifically configured to:
perform the data query in the target source database according to the fetch execution information and the data source configuration information to obtain a data synchronization description control (Ctrl) file and a data (dat) file.
Optionally, the data transmission module is specifically configured to:
determine the target system according to the transmission configuration information;
and send the DDL file and the target data file, namely the Ctrl file and the dat file, to the target system.
Optionally, the fetch execution information includes a preset extraction time period;
correspondingly, the data synchronization module is further specifically configured to:
perform the data query in the target source database according to the fetch execution information and the data source configuration information to obtain the target data file for the preset extraction time period.
In a third aspect, the present application provides a data processing device, comprising at least one processor and a memory;
the memory stores computer-executable instructions;
the at least one processor executes the computer-executable instructions stored in the memory, causing the at least one processor to perform the data processing method of the first aspect and its various possible designs.
In a fourth aspect, the present application provides a computer-readable storage medium storing computer-executable instructions that, when executed by a processor, implement the data processing method of the first aspect and its various possible designs.
In a fifth aspect, the present application provides a computer program product comprising a computer program that, when executed by a processor, implements the data processing method of the first aspect and its various possible designs.
With the data processing method, apparatus, device and storage medium provided by the present application, the fetch instruction is determined automatically from the rule configuration information set by the user, so data can be extracted and the corresponding field information files generated without the user having to specify field types and similar details. Compared with the complex data models of general-purpose data synchronization schemes, this improves data processing security, is simple to implement, supports multiple data sources, has low maintenance cost, and meets diverse cross-system data extraction and transmission requirements.
Drawings
To describe the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings used in describing them are briefly introduced below. The drawings described below show only some embodiments of the present application; a person of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1 is a block diagram of a data processing system according to an embodiment of the present application;
FIG. 2 is a schematic flowchart of a data processing method according to an embodiment of the present application;
FIG. 3 is a schematic flowchart of a rule management method according to an embodiment of the present application;
FIG. 4 is a schematic flowchart of a data synchronization method according to an embodiment of the present application;
FIG. 5 is a schematic flowchart of a data transmission method according to an embodiment of the present application;
FIG. 6 is a schematic flowchart of another data processing method according to an embodiment of the present application;
FIG. 7 is a schematic flowchart of third-party scheduling according to an embodiment of the present application;
FIG. 8 is a block diagram of a data extraction and transmission scheme according to an embodiment of the present application;
FIG. 9 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application;
FIG. 10 is a schematic structural diagram of a data processing device according to an embodiment of the present application.
Specific embodiments of the present disclosure have been shown by way of example in the drawings and will be described in more detail below. The drawings and written description are not intended to limit the scope of the disclosed concepts in any way, but rather to illustrate the disclosed concepts to those skilled in the art by reference to specific embodiments.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
The terms "first," "second," "third," "fourth," and so on in the description, the claims and the drawings of this application, if any, are used to distinguish between similar elements and not necessarily to describe a particular sequence or chronological order. It should be understood that terms so used are interchangeable under appropriate circumstances, so that the embodiments of the application described herein can be implemented in orders other than those illustrated or described herein. Furthermore, the terms "comprise," "include," and "have," and any variations thereof, are intended to cover a non-exclusive inclusion: a process, method, system, article or apparatus that comprises a list of steps or elements is not necessarily limited to the steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such a process, method, article or apparatus.
First, the terms used in the embodiments of the present application are explained:
Data extraction: querying data from a source system database according to certain rules.
Software Development Kit (SDK) scheduling: a jar (Java archive) package provided to users; by introducing it into their own systems, users can call its encapsulated methods to extract and transmit data.
Script scheduling: a set of shell scripts provided by the invention, with which users manually or automatically schedule shell commands to perform data extraction and cross-system transmission.
Task concurrency: the number of data extraction and transmission tasks executed simultaneously per unit time in the invention; setting the concurrency appropriately improves extraction and transmission performance.
Task initiation: executing a data extraction task manually or by a trigger.
Batch number: when the data of a single database table is extracted and transmitted and the data volume is large, the data is processed in batches according to a certain rule, and several batches can be processed in parallel; the number of batches so set is the batch number.
With social development and the widespread adoption of information technology, many data applications such as statistical analysis and data mining have emerged. A single type of data cannot satisfy every analysis scenario; different types of data must be combined in different scenarios to extract their value, so cross-system data transmission has become a necessity, and it requires a tool that supports extracting and transferring data across systems. Against this background, various data synchronization tools have appeared. Existing tools are general-purpose: to suit many application scenarios they offer rich functionality, but their structure is relatively complex, their learning cost is high, they are difficult to maintain, and they are inconvenient to integrate into business systems. The most representative of the commonly used data synchronization schemes include DataX and Sqoop, and comparable solutions on the market are open source. Their maintenance is left entirely to the community, yet the relevant communities are not very active: many open-source programs go unmaintained for long periods, updates, optimizations and upgrades arrive slowly, and security is not guaranteed because nobody is specifically assigned to scan for and fix security vulnerabilities. In addition, general-purpose schemes have complex data models, high learning and maintenance costs, and are inconvenient to integrate into business systems. In use, they can only extract data; they cannot transmit data across systems through multiple channels, so additional functions are needed to support data transmission.
To solve the above problems, embodiments of the present application provide a data processing method, apparatus, device and storage medium. The method implements a lightweight data synchronization and cross-system transmission scheme that is simple to use, easy to configure and learn, supports multiple data sources, and meets diverse cross-system data extraction and transmission requirements.
In the technical solution of the present application, the collection, storage, use, processing, transmission, provision and disclosure of user data and other related information comply with the relevant laws and regulations and do not violate public order and good customs.
Optionally, FIG. 1 is a schematic architecture diagram of a data processing system according to an embodiment of the present application. In FIG. 1, the architecture includes at least one of a data acquisition device 101, a processing device 102 and a display device 103.
It should be understood that the architecture illustrated in the embodiments of the present application does not constitute a specific limitation on the architecture of the data processing system. In other possible embodiments of the present application, the architecture may include more or fewer components than shown, combine or split some components, or arrange components differently, as determined by the actual application scenario; this is not limited here. The components shown in FIG. 1 may be implemented in hardware, software, or a combination of software and hardware.
In a specific implementation, the data acquisition device 101 may include an input/output interface and may also include a communication interface, through either of which it may be connected to the processing device 102.
The processing device 102 may acquire rule configuration information and determine a fetch instruction according to it; perform a data query in a target source database according to the fetch execution information and the data source configuration information to obtain a target data file; and send the target data file to a target system according to the transmission configuration information.
The display device 103 may also be a touch display screen or the screen of a terminal device, which receives user instructions while displaying the above content, enabling interaction with the user.
It should be understood that the above processing device may be implemented by a processor reading instructions in a memory and executing the instructions, or may be implemented by a chip circuit.
In addition, the network architecture and the service scenario described in the embodiment of the present application are for more clearly illustrating the technical solution of the embodiment of the present application, and do not constitute a limitation to the technical solution provided in the embodiment of the present application, and it can be known by a person of ordinary skill in the art that, along with the evolution of the network architecture and the occurrence of a new service scenario, the technical solution provided in the embodiment of the present application is also applicable to similar technical problems.
The technical solution of the present application is described in detail below with reference to specific embodiments.
Optionally, FIG. 2 is a schematic flowchart of a data processing method according to an embodiment of the present application. The execution subject of this embodiment may be the processing device 102 in FIG. 1; the specific execution subject may be determined according to the actual application scenario. As shown in FIG. 2, the method includes the following steps:
S201: Acquire rule configuration information, and determine the fetch instruction according to the rule configuration information.
The fetch instruction includes fetch execution information, data source configuration information and transmission configuration information.
Optionally, the fetch execution information includes field information and the field order of the data to be synchronized; correspondingly, after the rule configuration information is acquired and the fetch instruction is determined according to it, the method further includes: generating a database schema definition language (DDL) file for the synchronized data according to the field information.
Here, the embodiment of the present application can generate the corresponding DDL file from the field information so that a downstream system can create the database table.
The rule configuration information may be obtained through user input, or may be pre-stored in the system.
Optionally, step S201 may provide the following functions: a rule configuration specification for users, a rule configuration entry and rule configuration persistence; parsing the configured rule information and caching it in memory; producing an executable fetch SQL statement parsed from the rules, and parsing the field information and field order of the data to be synchronized from the rules; generating the corresponding DDL file from the field information so that a downstream system can create the database table; and parsing the available data source configuration according to the data source configuration rules, caching it in memory, and creating a connection pool for the available data sources.
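The data source connection pool mentioned in step S201 can be illustrated with a minimal fixed-size pool built on `queue.Queue`. This is an illustrative sketch, not production code: SQLite stands in for the configured data sources, the class `ConnectionPool`, the function `build_pools` and the data source name `core_db` are hypothetical, and a real system would normally use the pooling provided by its database driver or framework.

```python
import queue
import sqlite3
from contextlib import contextmanager


class ConnectionPool:
    """A minimal fixed-size connection pool (illustrative, not production-grade)."""

    def __init__(self, connect, size):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(connect())

    @contextmanager
    def session(self, timeout=None):
        conn = self._idle.get(timeout=timeout)  # blocks while all connections are in use
        try:
            yield conn
        finally:
            self._idle.put(conn)  # hand the connection back for reuse

    def idle_count(self):
        return self._idle.qsize()

    def close_all(self):
        while not self._idle.empty():
            self._idle.get_nowait().close()


def build_pools(sources, size=2):
    """Create one pool per parsed data source (name -> SQLite path in this sketch)."""
    # `path=path` binds the current value, avoiding the late-binding closure pitfall.
    return {name: ConnectionPool(
                lambda path=path: sqlite3.connect(path, check_same_thread=False), size)
            for name, path in sources.items()}


# ":memory:" gives each connection its own empty database; enough for this demo.
pools = build_pools({"core_db": ":memory:"})
with pools["core_db"].session() as conn:
    value = conn.execute("SELECT 1 + 1").fetchone()[0]
```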
Exemplarily, FIG. 3 is a schematic flowchart of a rule management method according to an embodiment of the present application. As shown in FIG. 3, rule management and fetch instruction parsing can be implemented through the illustrated process.
In a possible implementation, the fetch execution information includes an SQL script configuration file, the data source configuration information includes a CFG data source configuration file, and the transmission configuration information includes a Properties configuration file.
The user provides the rule configuration in the form of custom files: a Structured Query Language (SQL) script file defines the fetch execution script; a CFG configuration file provides the data source configuration, with the data source connection information encrypted; and a Properties configuration file provides settings such as concurrency, data transmission nodes, data transmission paths, field types and file formats.
The SQL script configuration file (fetch execution information) mainly contains the fetch SQL, which specifies the fields to fetch, the database table, the database schema, the fetch conditions and similar information.
The CFG data source configuration file (data source configuration information) mainly contains the database connection string, the database user and the encrypted database password string.
The Properties configuration file (transmission configuration information) mainly contains the file export identifier, primary key identifier, special field mappings, columns excluded from export, the incremental export timestamp, the service initialization date, the DDL file push date and the data file sending node information. The sending node setting supports multiple nodes, so the same data file can be sent to several target addresses.
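Parsing the CFG and Properties files could be sketched as below (the SQL script file is plain text and can simply be read). All keys, section names and values are hypothetical examples modelled on the contents listed above; the Properties parser is simplified (only `=` separators, no line continuations or escapes), and `fake_decrypt` is only a placeholder because the source does not state how the password is encrypted.

```python
import configparser


def parse_properties(text):
    """Parse Java-style key=value lines; '#' and '!' start comment lines (simplified)."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def parse_datasource_cfg(text, decrypt):
    """Read data source sections from INI-style CFG text.

    `decrypt` is supplied by the caller; interpolation is disabled so that
    '%' characters inside encrypted strings are taken literally.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return {
        name: {"url": sec["url"], "user": sec["user"],
               "password": decrypt(sec["password"])}
        for name, sec in parser.items()
        if name != parser.default_section
    }


def fake_decrypt(token):
    # Placeholder only: strips an ENC(...) wrapper; a real system would decrypt here.
    return token[4:-1] if token.startswith("ENC(") and token.endswith(")") else token


PROPERTIES = """
# transmission configuration (hypothetical keys)
concurrency=4
primary.key=id
exclude.columns=remark
send.nodes=node-a:/data/in,node-b:/data/in
"""

CFG = """
[core_db]
url = db-host:5432/core
user = etl_reader
password = ENC(c2VjcmV0)
"""

props = parse_properties(PROPERTIES)
sources = parse_datasource_cfg(CFG, fake_decrypt)
send_nodes = props["send.nodes"].split(",")  # one data file, several target addresses
```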
When the program is invoked and started, the rule configuration module starts, reads and parses the database configuration file, and creates the database connection information. When a data extraction task is triggered, the configured fetch script file is parsed into memory, the query is executed through the database connection, and the matching DDL file information is generated from the database metadata; at the same time, the snapshot task parses the Properties configuration file into memory, generates the files and sets the related export parallelism according to that configuration.
S202: Perform a data query in the target source database according to the fetch execution information and the data source configuration information to obtain a target data file.
The data source configuration information includes an identifier of the target source database.
Optionally, performing the data query in the target source database according to the fetch execution information and the data source configuration information to obtain a target data file includes: performing the data query in the target source database according to the fetch execution information and the data source configuration information to obtain a data synchronization description control file and a data file.
Here, after the data is extracted, a data synchronization description control (Ctrl) file and a data (dat) file may be generated; these files realize the data extraction function. During transmission, the Ctrl file lets the user accurately obtain the specifics of the data file, so the user can screen and extract data in a targeted manner, which improves the flexibility of data processing and the user experience.
Optionally, the fetch execution information includes a preset extraction time period; correspondingly, performing the data query in the target source database according to the fetch execution information and the data source configuration information to obtain a target data file includes: performing the data query in the target source database according to the fetch execution information and the data source configuration information to obtain the target data file within the preset extraction time period.
The preset extraction time period may be determined according to actual conditions and is not specifically limited in the embodiments of the present application.
The embodiments of the present application support flexible switching between extracting data with a customized rule for a specified date and extracting data with the default rule when no special rule is specified. This allows users to extract historical data and to backfill earlier data for a specified date.
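The switch between a date-specific rule and the default rule could look like the following sketch. It assumes, purely for illustration, that the default rule extracts the previous calendar day and that each window is half-open `[start, end)`; the class and method names are hypothetical.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/** Half-open extraction window [start, end); the one-day default rule is an assumption. */
class ExtractionWindow {
    final LocalDateTime start; // inclusive
    final LocalDateTime end;   // exclusive

    ExtractionWindow(LocalDateTime start, LocalDateTime end) {
        this.start = start;
        this.end = end;
    }

    /** Custom rule when a date is specified; otherwise the default rule (the previous day). */
    static ExtractionWindow resolve(Optional<LocalDate> specifiedDate, LocalDate today) {
        LocalDate day = specifiedDate.orElse(today.minusDays(1));
        return new ExtractionWindow(day.atStartOfDay(), day.plusDays(1).atStartOfDay());
    }

    /** One window per day over [from, to], e.g. to backfill historical data. */
    static List<ExtractionWindow> backfill(LocalDate from, LocalDate to) {
        List<ExtractionWindow> windows = new ArrayList<>();
        for (LocalDate d = from; !d.isAfter(to); d = d.plusDays(1)) {
            windows.add(resolve(Optional.of(d), to));
        }
        return windows;
    }

    public static void main(String[] args) {
        ExtractionWindow daily = resolve(Optional.empty(), LocalDate.of(2022, 11, 3));
        System.out.println(daily.start + " .. " + daily.end); // prints 2022-11-02T00:00 .. 2022-11-03T00:00
    }
}
```

The two bounds would typically be bound as `PreparedStatement` parameters in a predicate such as `UPD_TS >= ? AND UPD_TS < ?` rather than concatenated into the SQL text.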
Optionally, step S202 may implement the following functions: querying data with the provided fetch instruction through a session from the established data source connection pool, and writing the queried data into the corresponding data files in batches; creating a Ctrl description file from the count of queried data, recording information such as the number of synchronized records; and, if the data volume is large, writing multiple batch data files in parallel.
Exemplarily, FIG. 4 is a schematic flowchart of a data synchronization method provided by an embodiment of the present application; as shown in FIG. 4, data extraction may be implemented through the illustrated process.
In a possible implementation, after a data extraction task is scheduled, the parsed configuration in memory is read, a database connection session is built, and the data extraction SQL script is executed; a corresponding field DDL information file is generated from the returned metadata, establishing the names and types of the fields to be extracted. At the same time, concurrent data query services are started according to the concurrency number controlled by the concurrency service, and the required data is queried from the database and written into the corresponding batch files. Three types of files may be generated in this process. The first, the data file, records all the queried data, with columns separated by the |@| separator. The second, the Ctrl description file, records information such as the number of records exported in the batch, the export time, and the name of the generated data file. The third, the DDL file, is the queried data field information file, which records all exported fields, their types, and their order. The contents of the three files are associated with one another, so the outcome of the data extraction can be verified against them.
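The three associated files can be illustrated with the sketch below, which builds their contents as strings. The `|@|` separator comes from the text; the `Column` record, the `CREATE TABLE` layout of the DDL file, and the `key=value` keys of the Ctrl file are assumptions. In a JDBC-based implementation the column names and types would come from `ResultSetMetaData.getColumnName` and `getColumnTypeName`.

```java
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;

/** Builds the contents of the three associated extraction files; formats are illustrative. */
class ExtractFiles {
    static final String SEPARATOR = "|@|"; // column separator used in the data file

    /** Field metadata; with JDBC this would be read from ResultSetMetaData. */
    record Column(String name, String sqlType) {}

    /** One data-file line: values joined by |@|, nulls written as empty strings. */
    static String toDataLine(List<String> row) {
        StringJoiner joiner = new StringJoiner(SEPARATOR);
        for (String value : row) {
            joiner.add(value == null ? "" : value);
        }
        return joiner.toString();
    }

    /** DDL file: every exported field with its type, in query order. */
    static String toDdl(String table, List<Column> columns) {
        StringJoiner body = new StringJoiner(",\n  ", "CREATE TABLE " + table + " (\n  ", "\n);");
        for (Column c : columns) {
            body.add(c.name() + " " + c.sqlType());
        }
        return body.toString();
    }

    /** Ctrl file: record count, export time and data file name (key names are hypothetical). */
    static String toCtrl(String dataFileName, long recordCount, String exportTime) {
        return "file_name=" + dataFileName + "\n"
                + "record_count=" + recordCount + "\n"
                + "export_time=" + exportTime + "\n";
    }

    public static void main(String[] args) {
        List<Column> columns = List.of(new Column("ACCT_ID", "VARCHAR(32)"),
                                       new Column("BALANCE", "DECIMAL(18,2)"));
        List<List<String>> rows = List.of(Arrays.asList("A001", "100.00"),
                                          Arrays.asList("A002", null));
        rows.forEach(r -> System.out.println(toDataLine(r)));
        System.out.println(toDdl("ACCT", columns));
        System.out.print(toCtrl("ACCT_20221102.dat", rows.size(), "2022-11-03T01:00"));
    }
}
```

A receiver can then cross-check the files: the data file's line count should equal `record_count`, and each line should split into as many fields as the DDL declares. This check assumes that no value itself contains `|@|`; such data would need escaping or a different separator.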
In a possible implementation, performing the data query in the target source database according to the fetch execution information and the data source configuration information to obtain a target data file includes: generating a plurality of parallel query tasks according to a plurality of pieces of fetch execution information and the data source configuration information corresponding to each; and executing the parallel query tasks in parallel to query the target source database and obtain the target data file.
When querying and extracting data, the embodiments can start concurrent data query services according to the concurrency number controlled by the concurrency service, query the required data from the database, and write it into the corresponding batch files. Queries for several different pieces of fetch execution information run simultaneously, further improving data processing efficiency.
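A minimal sketch of concurrent query execution, assuming a fixed-size thread pool whose size is the configured concurrency number. The `executeQuery` function stands in for the real database query, and returning a per-task record count (which could feed the Ctrl file) is an illustrative choice rather than the original design.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

/** Runs one query task per fetch instruction on a pool bounded by the configured concurrency. */
class ParallelExtractor {
    /**
     * @param queries      one fetch SQL per task (hypothetical input shape)
     * @param concurrency  concurrency number from configuration
     * @param executeQuery stand-in for the database query; returns the fetched rows
     * @return record count per task, in the same order as {@code queries}
     */
    static List<Integer> runAll(List<String> queries, int concurrency,
                                Function<String, List<String>> executeQuery) {
        ExecutorService pool = Executors.newFixedThreadPool(concurrency);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (String sql : queries) {
                // A real task would also write its rows to a batch file here.
                Callable<Integer> task = () -> executeQuery.apply(sql).size();
                futures.add(pool.submit(task));
            }
            List<Integer> counts = new ArrayList<>();
            for (Future<Integer> f : futures) {
                counts.add(f.get()); // waits for each task and rethrows its failure
            }
            return counts;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("extraction interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("query task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Fake query: returns as many rows as the SQL string is long, just to show ordering.
        List<Integer> counts = runAll(List.of("q1", "q22", "q333"), 2,
                sql -> Collections.nCopies(sql.length(), "row"));
        System.out.println(counts); // prints [2, 3, 4]
    }
}
```

Collecting the futures in submission order keeps the per-task results aligned with the fetch instructions, even though the tasks may finish in any order.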
S203: Send the target data file to the target system according to the transmission information configuration information.
The target system, or the path for transmitting information to the target system, can be determined from the transmission information configuration information.
Optionally, step S203 may implement the following function: transmitting the generated data field information DDL file, data synchronization description Ctrl file, and data dat file to the corresponding system according to the configured sending mode and sending information, and recording the send log.
Exemplarily, FIG. 5 is a schematic flowchart of a data transmission method provided by an embodiment of the present application; as shown in FIG. 5, cross-system data transmission may be implemented through the illustrated process.
In a possible implementation, a data file transmission service is provided: after data extraction finishes and the files are generated, the file transmission service is triggered automatically to transmit the data files to the specified system. The module supports multiple sending channels, and the user can select a channel according to the downstream system or a personal receiving preference; channels may include e-mail, fax, NFT, SFTP, and the like. Each selected channel is configured with its related settings, and the sending service reads this configuration from memory to send. Both synchronous and asynchronous transmission are supported: synchronous transmission suits data files smaller than 5 MB, and asynchronous transmission suits files larger than 5 MB (this threshold is only illustrative and may be set according to actual conditions in practice). The asynchronous mode must also support asynchronously retrieving the transmission result so that the user can confirm it. To let users query what has been sent, a dedicated send-logging service records each transmission in a database, and asynchronous transmission results are also updated into the log table.
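The size-based choice between synchronous and asynchronous sending, with a send log that asynchronous results are written back to, might be sketched as follows. The `Channel` interface, the status strings, the in-memory map standing in for the log table, and the use of `CompletableFuture` are assumptions; the 5 MB threshold mirrors the illustrative figure in the text.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Chooses synchronous or asynchronous sending by file size and keeps a send log. */
class FileSender {
    static final long SYNC_LIMIT_BYTES = 5L * 1024 * 1024; // illustrative 5 MB threshold

    /** A sending channel such as e-mail, fax, NFT or SFTP (hypothetical interface). */
    interface Channel {
        void send(String fileName, String target);
    }

    /** In-memory stand-in for the send-log table: "file->target" -> status. */
    final Map<String, String> sendLog = new ConcurrentHashMap<>();

    static String logKey(String fileName, String target) {
        return fileName + "->" + target;
    }

    CompletableFuture<Void> send(String fileName, long sizeBytes, String target,
                                 Channel channel, Executor asyncExecutor) {
        String key = logKey(fileName, target);
        sendLog.put(key, "PENDING");
        Runnable task = () -> {
            try {
                channel.send(fileName, target);
                sendLog.put(key, "SENT");     // result written back to the log
            } catch (RuntimeException e) {
                sendLog.put(key, "FAILED");
                throw e;
            }
        };
        if (sizeBytes < SYNC_LIMIT_BYTES) {
            task.run();                       // small file: the caller waits for the send
            return CompletableFuture.completedFuture(null);
        }
        return CompletableFuture.runAsync(task, asyncExecutor); // large file: result arrives later
    }

    public static void main(String[] args) {
        FileSender sender = new FileSender();
        Channel sftp = (file, target) -> System.out.println("sending " + file + " to " + target);
        ExecutorService pool = Executors.newSingleThreadExecutor();
        sender.send("ACCT.ctrl", 200, "sys-a", sftp, pool).join();
        CompletableFuture<Void> pending = sender.send("ACCT.dat", 64L * 1024 * 1024, "sys-a", sftp, pool);
        pending.join(); // a caller could instead poll sendLog later
        pool.shutdown();
        System.out.println(sender.sendLog);
    }
}
```

A caller that must not block on large files can keep the returned future, or query the log later, which corresponds to the asynchronous result retrieval described above.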
Optionally, sending the target data file to the target system according to the transmission information configuration information includes: determining the target system according to the transmission information configuration information; and sending the database schema definition language (DDL) file and the target data file (the data synchronization description control file and the data file) to the target system.
Here, the embodiments transmit the DDL file generated from the data field information, the data synchronization description Ctrl file, and the data dat file to the target system together, so that users obtain detailed and complete data, further improving the user experience. The embodiments provide a lightweight scheme for data synchronization and cross-system transmission with simple configuration: the fetch instruction is determined automatically from the rule configuration information set by the user, the data is extracted and the corresponding field information files are generated, and users need not specify field types manually. Compared with the complex data models of general-purpose data synchronization schemes, this improves the security of data processing, is simple to implement, supports multiple data sources, has low maintenance cost, and meets a variety of cross-system data extraction and transmission needs.
Optionally, in the embodiments of the present application, the concurrency number in the data synchronization process can be controlled more flexibly. Accordingly, FIG. 6 is a schematic flowchart of another data processing method provided by an embodiment of the present application; as shown in FIG. 6, the method includes:
S601: Acquire rule configuration information, and determine the fetch instruction according to the rule configuration information.
The fetch instruction includes fetch execution information, data source configuration information, and transmission information configuration information.
S602: Split the fetch instruction to obtain a plurality of fetch sub-instructions.
S603: Generate a plurality of parallel query subtasks according to the fetch sub-instructions and the data source configuration information.
S604: Execute the parallel query subtasks in parallel to query the target source database and obtain the target data file.
S605: Send the target data file to the target system according to the transmission information configuration information.
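Step S602 does not prescribe how the fetch instruction is split. One common strategy, shown here as an assumption, is to divide a numeric key range into contiguous sub-ranges, each becoming a sub-instruction that steps S603 and S604 can run as a parallel subtask. The names and the `BETWEEN` predicate are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Splits one fetch instruction into sub-instructions over a numeric key range (hypothetical strategy). */
class InstructionSplitter {
    record SubInstruction(String sql, long fromId, long toId) {}

    /**
     * Divides [minId, maxId] into at most {@code parts} contiguous, non-overlapping ranges.
     * Assumes baseSql has no WHERE clause and key values are far from Long.MAX_VALUE.
     */
    static List<SubInstruction> split(String baseSql, String keyColumn,
                                      long minId, long maxId, int parts) {
        if (parts < 1 || maxId < minId) {
            throw new IllegalArgumentException("invalid split request");
        }
        long total = maxId - minId + 1;
        long chunk = (total + parts - 1) / parts; // ceiling division
        List<SubInstruction> subs = new ArrayList<>();
        for (long from = minId; from <= maxId; from += chunk) {
            long to = Math.min(from + chunk - 1, maxId);
            String sql = baseSql + " WHERE " + keyColumn + " BETWEEN " + from + " AND " + to;
            subs.add(new SubInstruction(sql, from, to));
        }
        return subs;
    }

    public static void main(String[] args) {
        split("SELECT ID, NAME FROM ACCT", "ID", 1, 10, 3)
                .forEach(s -> System.out.println(s.sql()));
    }
}
```

Because the ranges do not overlap and together cover `[minId, maxId]`, the union of the subtask results equals the result of the unsplit query.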
Optionally, concurrency may be at the thread level within the memory of a single machine, or at a physically isolated level across multiple machines; users can set different concurrency modes as needed.
Optionally, sending the same data to multiple systems is supported, with a different transmission mode per system; a user can configure the exported data to be sent to multiple systems at once without handling each system separately.
When querying and extracting data, concurrent data query services can be started according to the concurrency number controlled by the concurrency service; the required data is queried from the database and written into the corresponding batch files. For large data volumes, the data can be extracted in multiple batches with a configured concurrency number, so that the batches are extracted concurrently, further improving data processing efficiency.
In a possible implementation, the embodiments can also support third-party scheduling, providing functions such as an application-integrated scheduling entry and script control for execution scheduled by a third-party batch scheduling tool.
Exemplarily, FIG. 7 is a schematic flowchart of third-party scheduling provided by an embodiment of the present application; as shown in FIG. 7, third-party scheduling may be implemented through the illustrated process.
In a possible implementation, users can integrate the third-party scheduling module into their own business systems and use it to schedule and execute data extraction tasks. The module provides two scheduling modes. The first is the SDK mode: the user adds a jar package to their program and initiates and retries data extraction tasks through the SDK. The second is the script scheduling mode: the module provides ready-made shell scripts, and the user initiates a snapshot task simply by executing a shell script from their program with the required parameters; to extract data again, the user only needs to send a rerun command to initiate the corresponding rerun task. The module also manages resources for scheduled tasks, limiting the number of task resources to avoid system alarms caused by tasks consuming excessive resources, such as excessive CPU usage.
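The resource control described for the scheduling module could be as simple as a counting semaphore plus a set of in-flight task ids, as in this sketch. Rejecting a duplicate id and returning `false` when no slot is free are assumed policies, and `rerun` is a hypothetical stand-in for the rerun command.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;

/** Admission control for scheduled extraction tasks; names and semantics are hypothetical. */
class TaskScheduler {
    private final Semaphore slots;                                     // caps concurrently running tasks
    private final Set<String> running = ConcurrentHashMap.newKeySet(); // ids currently executing

    TaskScheduler(int maxRunningTasks) {
        this.slots = new Semaphore(maxRunningTasks);
    }

    /** Runs the task if its id is not already running and a slot is free; otherwise returns false. */
    boolean tryRun(String taskId, Runnable extractTask) {
        if (!running.add(taskId)) {
            return false;                 // the same task is already in progress
        }
        if (!slots.tryAcquire()) {
            running.remove(taskId);
            return false;                 // resource limit reached; the caller may retry later
        }
        try {
            extractTask.run();
            return true;
        } finally {
            slots.release();
            running.remove(taskId);
        }
    }

    /** A rerun re-submits the same task id once the previous run has finished. */
    boolean rerun(String taskId, Runnable extractTask) {
        return tryRun(taskId, extractTask);
    }

    int availableSlots() {
        return slots.availablePermits();
    }

    public static void main(String[] args) {
        TaskScheduler scheduler = new TaskScheduler(2);
        boolean started = scheduler.tryRun("daily-acct", () -> System.out.println("extracting"));
        System.out.println(started + ", free slots: " + scheduler.availableSlots());
    }
}
```

Returning `false` instead of blocking lets an external batch scheduler decide whether to retry later, which avoids piling work onto an already loaded host.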
Optionally, FIG. 8 is a schematic diagram of the framework of a data extraction and transmission scheme provided by an embodiment of the present application; as shown in FIG. 8, the framework includes a rule management module, a data synchronization module, a data transmission module, and a third-party scheduling module.
Optionally, the rule management module (which parses rules and instructions in advance) mainly provides the user's rule configuration specification, a rule configuration entry, and rule configuration persistence; it parses the configured rule information and caches it in memory. According to the rules, it provides parsed executable fetch SQL statements and parses the field information and field order of the data to be synchronized; it generates the corresponding DDL file from the field information so that downstream systems can create database tables. According to the data source configuration rules, it provides parsed usable data source configuration information, caches it in memory, and creates a usable data source connection pool.
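To illustrate how a configured rule might be turned into an executable fetch SQL statement with a fixed field order, here is a minimal sketch. The `Rule` record, its fields, and the incremental `>= ?` predicate are assumptions; the point is that one ordered field list can drive the SELECT clause, the data file columns, and the DDL file.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

/** Turns a parsed extraction rule into executable fetch SQL; the Rule shape is hypothetical. */
class RuleParser {
    /** Table, ordered fields, fields excluded from export, optional incremental timestamp column. */
    record Rule(String table, List<String> fields, Set<String> excluded, String incrementColumn) {}

    /** Field order here fixes the column order of the data file and the DDL file. */
    static List<String> exportedFields(Rule rule) {
        return rule.fields().stream()
                .filter(f -> !rule.excluded().contains(f))
                .collect(Collectors.toList());
    }

    static String toSql(Rule rule) {
        String sql = "SELECT " + String.join(", ", exportedFields(rule)) + " FROM " + rule.table();
        if (rule.incrementColumn() != null) {
            sql += " WHERE " + rule.incrementColumn() + " >= ?"; // bound to the last export timestamp
        }
        return sql;
    }

    public static void main(String[] args) {
        Rule rule = new Rule("ACCT", List.of("ID", "NAME", "PASSWORD", "UPD_TS"),
                             Set.of("PASSWORD"), "UPD_TS");
        System.out.println(toSql(rule)); // prints SELECT ID, NAME, UPD_TS FROM ACCT WHERE UPD_TS >= ?
    }
}
```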
Optionally, the data synchronization module (query function) mainly queries data with the provided fetch SQL through a session from the established data source connection pool and writes the queried data into the corresponding data files in batches; creates a Ctrl description file from the count of queried data, recording information such as the number of synchronized records; and, if the data volume is large, writes multiple batch data files in parallel.
Optionally, the data transmission module mainly transmits the generated data field information DDL file, data synchronization description Ctrl file, and data dat file to the corresponding system according to the configured sending mode and sending information, and records the send log.
Optionally, the third-party scheduling module provides functions such as an application-integrated scheduling entry and script control for execution scheduled by a third-party batch scheduling tool. The working principle of the data extraction and transmission scheme is shown in detail in FIG. 8.
FIG. 9 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application. As shown in FIG. 9, the apparatus includes a rule management module 901, a data synchronization module 902, and a data transmission module 903. The rule management module 901 may be the rule management module in the framework of FIG. 8, the data synchronization module 902 may be the data synchronization module in the framework of FIG. 8, and the data transmission module 903 may be the data transmission module in the framework of FIG. 8. The data processing apparatus here may be the processing apparatus described above, the processor itself, or a chip or integrated circuit implementing the functions of the processor. It should be noted that the division into the rule management module 901, the data synchronization module 902, and the data transmission module 903 is only a logical division of functions; physically, they may be integrated or independent.
The rule management module is configured to acquire rule configuration information and determine a fetch instruction according to the rule configuration information, where the fetch instruction includes fetch execution information, data source configuration information, and transmission information configuration information;
the data synchronization module is configured to perform a data query in a target source database according to the fetch execution information and the data source configuration information to obtain a target data file;
and the data transmission module is configured to send the target data file to a target system according to the transmission information configuration information.
Optionally, after the rule management module acquires the rule configuration information and determines the fetch instruction according to the rule configuration information, the apparatus further includes:
a splitting module, configured to split the fetch instruction to obtain a plurality of fetch sub-instructions;
correspondingly, the data synchronization module is specifically configured to:
generate a plurality of parallel query subtasks according to the fetch sub-instructions and the data source configuration information;
and execute the parallel query subtasks in parallel to query the target source database and obtain the target data file.
Optionally, the data synchronization module is further specifically configured to:
generate a plurality of parallel query tasks according to a plurality of pieces of fetch execution information and the data source configuration information corresponding to each;
and execute the parallel query tasks in parallel to query the target source database and obtain the target data file.
Optionally, the fetch execution information includes the field information and field order of the data to be synchronized;
correspondingly, after the rule management module acquires the rule configuration information and determines the fetch instruction according to the rule configuration information, the apparatus further includes:
a generating module, configured to generate a database schema definition language (DDL) file corresponding to the synchronized data according to the field information.
Optionally, the data synchronization module is further specifically configured to:
perform the data query in the target source database according to the fetch execution information and the data source configuration information to obtain a data synchronization description control file and a data file.
Optionally, the data transmission module is specifically configured to:
determine the target system according to the transmission information configuration information;
and send the DDL file and the target data file (the data synchronization description control file and the data file) to the target system.
Optionally, the fetch execution information includes a preset extraction time period;
correspondingly, the data synchronization module is further specifically configured to:
perform the data query in the target source database according to the fetch execution information and the data source configuration information to obtain the target data file within the preset extraction time period.
Referring to FIG. 10, a schematic structural diagram of a data processing device 1000 suitable for implementing the embodiments of the present disclosure is shown; the data processing device 1000 may be a terminal device or a server. The terminal device may include, but is not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, personal digital assistants (PDAs), tablet computers (PADs), portable multimedia players (PMPs), and vehicle-mounted terminals (e.g., car navigation terminals), and fixed terminals such as digital TVs and desktop computers. The data processing device shown in FIG. 10 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present disclosure.
As shown in FIG. 10, the data processing device 1000 may include a processing device (e.g., a central processing unit, a graphics processor, etc.) 1001, which may perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 1002 or a program loaded from a storage device 1008 into a random access memory (RAM) 1003. The RAM 1003 also stores various programs and data necessary for the operation of the data processing device 1000. The processing device 1001, the ROM 1002, and the RAM 1003 are connected to each other by a bus 1004. An input/output (I/O) interface 1005 is also connected to the bus 1004.
Generally, the following devices may be connected to the I/O interface 1005: input devices 1006 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; an output device 1007 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage devices 1008 including, for example, magnetic tape, hard disk, and the like; and a communication device 1009. The communication means 1009 may allow the data processing device 1000 to communicate with other devices wirelessly or by wire to exchange data. While fig. 10 illustrates a data processing apparatus 1000 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer-readable medium, the computer program comprising program code for performing the method illustrated by the flow chart. In such an embodiment, the computer program may be downloaded and installed from the network through the communication means 1009, or installed from the storage means 1008, or installed from the ROM 1002. The computer program, when executed by the processing device 1001, performs the above-described functions defined in the methods of the embodiments of the present disclosure.
It should be noted that the computer readable medium in the present disclosure can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
The computer readable medium may be embodied in the data processing apparatus; or may exist separately without being assembled into the data processing apparatus.
The computer readable medium carries one or more programs which, when executed by the data processing apparatus, cause the data processing apparatus to perform the method shown in the above embodiments.
Computer program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" programming language or similar languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented in software or in hardware. The name of a unit does not, in some cases, constitute a limitation on the unit itself; for example, the first retrieving unit may also be described as a "unit for retrieving at least two internet protocol addresses".
The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on a chip (SoCs), complex programmable logic devices (CPLDs), and the like.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The data processing device in the embodiment of the present application may be configured to execute the technical solutions in the method embodiments of the present application, and the implementation principles and technical effects are similar, which are not described herein again.
An embodiment of the present application further provides a computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the data processing method of any one of the foregoing embodiments.
An embodiment of the present application further provides a computer program product, which includes a computer program, and when the computer program is executed by a processor, the computer program is used to implement the data processing method of any one of the foregoing items.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, a division of a unit is merely a logical division, and an actual implementation may have another division, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements that have been described above and shown in the drawings, and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (11)

1. A data processing method, comprising:
acquiring rule configuration information, and determining a fetch instruction according to the rule configuration information, wherein the fetch instruction comprises fetch execution information, data source configuration information and transmission information configuration information;
performing a data query in a target source database according to the fetch execution information and the data source configuration information to obtain a target data file; and
sending the target data file to a target system according to the transmission information configuration information.
2. The method of claim 1, further comprising, after acquiring the rule configuration information and determining the fetch instruction according to the rule configuration information:
splitting the fetch instruction to obtain a plurality of fetch sub-instructions;
correspondingly, the performing of the data query according to the fetch execution information and the data source configuration information to obtain the target data file comprises:
generating a plurality of parallel query subtasks according to the plurality of fetch sub-instructions and the data source configuration information; and
executing the plurality of parallel query subtasks in parallel to perform the data query in the target source database and obtain the target data file.
3. The method according to claim 1, wherein the performing of the data query in the target source database according to the fetch execution information and the data source configuration information to obtain the target data file comprises:
generating a plurality of parallel query tasks according to a plurality of pieces of fetch execution information and the data source configuration information corresponding to each piece of fetch execution information; and
executing the plurality of parallel query tasks in parallel to perform the data query in the target source database and obtain the target data file.
4. The method according to any one of claims 1 to 3, wherein the fetch execution information comprises field information and a field order of the data to be synchronized;
correspondingly, after the acquiring of the rule configuration information and the determining of the fetch instruction according to the rule configuration information, the method further comprises:
generating, according to the field information, a database schema definition language (DDL) file corresponding to the data to be synchronized.
5. The method according to claim 4, wherein the performing of the data query in the target source database according to the fetch execution information and the data source configuration information to obtain the target data file comprises:
performing the data query in the target source database according to the fetch execution information and the data source configuration information to obtain a data synchronization description information control file and a data file.
6. The method according to claim 5, wherein the sending of the target data file to the target system according to the transmission information configuration information comprises:
determining the target system according to the transmission information configuration information; and
sending the database schema definition language file and the target data file, namely the data synchronization description information control file and the data file, to the target system.
7. The method according to any one of claims 1 to 3, wherein the fetch execution information comprises a preset extraction time period;
correspondingly, the performing of the data query in the target source database according to the fetch execution information and the data source configuration information to obtain the target data file comprises:
performing the data query in the target source database according to the fetch execution information and the data source configuration information to obtain the target data file for the preset extraction time period.
8. A data processing apparatus, comprising:
a rule management module configured to acquire rule configuration information and determine a fetch instruction according to the rule configuration information, wherein the fetch instruction comprises fetch execution information, data source configuration information and transmission information configuration information;
a data synchronization module configured to perform a data query in a target source database according to the fetch execution information and the data source configuration information to obtain a target data file; and
a data transmission module configured to send the target data file to a target system according to the transmission information configuration information.
9. A data processing apparatus, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the data processing method of any one of claims 1 to 7.
10. A computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the data processing method of any one of claims 1 to 7.
11. A computer program product comprising a computer program which, when executed by a processor, implements the method of any one of claims 1 to 7.
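The method of claims 1 to 3 and 7 can be illustrated with a minimal Python sketch. It assumes a SQLite file as the target source database, a rule configuration given as a plain dictionary with hypothetical keys (`table`, `fields`, `start`, `end`, `db_path`, `target_dir`), a hypothetical `biz_date` column holding ISO-8601 dates, and a local directory standing in for the target system. The fetch instruction is split into sub-instructions over consecutive time windows (one possible reading of claim 2), the resulting query subtasks run in parallel on a thread pool, and their results are written in window order to one CSV data file that is then copied to the target directory. None of these names or formats come from the application itself.

```python
# A minimal, hypothetical sketch of the claimed flow; all names are illustrative.
import csv
import os
import shutil
import sqlite3
import tempfile
from concurrent.futures import ThreadPoolExecutor
from contextlib import closing
from dataclasses import dataclass, replace
from datetime import date, timedelta


@dataclass(frozen=True)
class FetchInstruction:
    """Hypothetical fetch instruction derived from rule configuration information."""
    table: str       # fetch execution information: source table
    fields: tuple    # fetch execution information: fields in output order
    start: date      # preset extraction time period, inclusive
    end: date        # preset extraction time period, exclusive
    db_path: str     # data source configuration (assumed: a SQLite file)
    target_dir: str  # transmission configuration (assumed: a local directory)


def build_instruction(rule: dict) -> FetchInstruction:
    """Determine a fetch instruction from a (hypothetical) rule dictionary."""
    fields = tuple(rule["fields"])
    for name in (rule["table"], *fields):
        if not name.isidentifier():  # names are interpolated into the SQL text
            raise ValueError(f"invalid identifier: {name!r}")
    return FetchInstruction(rule["table"], fields,
                            date.fromisoformat(rule["start"]),
                            date.fromisoformat(rule["end"]),
                            rule["db_path"], rule["target_dir"])


def split(instr: FetchInstruction, days: int) -> list:
    """Split an instruction into sub-instructions over consecutive time windows."""
    subs, cur = [], instr.start
    while cur < instr.end:
        nxt = min(cur + timedelta(days=days), instr.end)
        subs.append(replace(instr, start=cur, end=nxt))
        cur = nxt
    return subs


def query(sub: FetchInstruction) -> list:
    """One query subtask; each worker thread opens its own connection."""
    # 'biz_date' is an assumed ISO-8601 date column used for the time window.
    sql = (f"SELECT {', '.join(sub.fields)} FROM {sub.table} "
           "WHERE biz_date >= ? AND biz_date < ? ORDER BY biz_date")
    with closing(sqlite3.connect(sub.db_path)) as conn:
        return conn.execute(sql, (sub.start.isoformat(), sub.end.isoformat())).fetchall()


def run(rule: dict, days: int = 1, workers: int = 4) -> str:
    instr = build_instruction(rule)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(query, split(instr, days)))  # results keep window order
    # Assemble the target data file in a local working directory...
    data_file = os.path.join(tempfile.mkdtemp(), f"{instr.table}.csv")
    with open(data_file, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(instr.fields)
        for rows in parts:
            writer.writerows(rows)
    # ...then "send" it by copying; a real system might use SFTP or a queue.
    os.makedirs(instr.target_dir, exist_ok=True)
    return shutil.copy2(data_file, instr.target_dir)


if __name__ == "__main__":
    work = tempfile.mkdtemp()
    db = os.path.join(work, "source.db")
    with closing(sqlite3.connect(db)) as conn:
        conn.execute("CREATE TABLE txn (id INTEGER, amount REAL, biz_date TEXT)")
        conn.executemany("INSERT INTO txn VALUES (?, ?, ?)",
                         [(1, 10.0, "2022-11-01"), (2, 5.5, "2022-11-02"),
                          (3, 7.0, "2022-11-03"), (4, 1.0, "2022-11-05")])
        conn.commit()
    rule = {"table": "txn", "fields": ["id", "amount"],
            "start": "2022-11-01", "end": "2022-11-04",
            "db_path": db, "target_dir": os.path.join(work, "target")}
    with open(run(rule, days=2)) as f:
        print(f.read())
```

With the sample data, the delivered file holds the header and the three records dated inside the extraction period; the record dated 2022-11-05 is excluded. Splitting by time window keeps each subtask's result set disjoint, so concatenating the results in order needs no deduplication.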
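Claims 4 to 6 add a schema file and a control file to the transfer. The sketch below, again in Python and purely illustrative, builds a `CREATE TABLE` statement from the field information in field order, a pipe-delimited data file, and a JSON control file. The control file's contents (period, record count, MD5 checksum), the delimiter, and all file names and extensions are assumptions, since the claims only name a "data synchronization description information control file". Delivery to the target system is simulated by writing into a local directory.

```python
# Hypothetical DDL / data / control file packaging; formats are illustrative.
import hashlib
import json
import os
import tempfile


def make_ddl(table: str, fields: list) -> str:
    """DDL (schema definition) text built from (name, type) pairs in field order."""
    cols = ",\n".join(f"    {name} {sql_type}" for name, sql_type in fields)
    return f"CREATE TABLE {table} (\n{cols}\n);\n"


def build_package(table: str, fields: list, rows: list, period: tuple) -> dict:
    """Return {file name: text} for the DDL, data and control files."""
    # Data file: one pipe-delimited line per record, columns in field order.
    data = "".join("|".join("" if v is None else str(v) for v in row) + "\n"
                   for row in rows)
    # Control file: an assumed description of this synchronization batch.
    control = json.dumps({
        "table": table,
        "fields": [name for name, _ in fields],
        "period": list(period),
        "record_count": len(rows),
        "md5": hashlib.md5(data.encode("utf-8")).hexdigest(),
    }, indent=2)
    return {f"{table}.ddl": make_ddl(table, fields),
            f"{table}.dat": data,
            f"{table}.ctl": control}


def send(package: dict, target_dir: str) -> list:
    """Deliver the files to the target system (here: a local directory)."""
    os.makedirs(target_dir, exist_ok=True)
    # Stable sort: control file(s) last, other files keep their insertion order.
    order = sorted(package, key=lambda name: name.endswith(".ctl"))
    written = []
    for name in order:
        path = os.path.join(target_dir, name)
        with open(path, "w", encoding="utf-8", newline="") as f:
            f.write(package[name])
        written.append(path)
    return written


if __name__ == "__main__":
    fields = [("id", "INTEGER"), ("amount", "DECIMAL(18,2)")]
    pkg = build_package("txn", fields, [(1, "10.00"), (2, None)],
                        ("2022-11-01", "2022-11-04"))
    for path in send(pkg, tempfile.mkdtemp()):
        print(os.path.basename(path))
```

Writing the control file last follows a convention often used in batch file transfer: the receiver can treat the arrival of the `.ctl` file as the signal that the batch is complete and then verify the data file against the recorded count and checksum. The claims themselves do not specify an order.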

Priority Applications (1)

CN202211372228.4A (CN115640349A): priority date 2022-11-03; filing date 2022-11-03; title "Data processing method, device, equipment and storage medium"

Publications (1)

CN115640349A: publication date 2023-01-24

Family ID: 84946238

Country Status (1)

CN (1): CN115640349A

Cited By (2)

* Cited by examiner, † Cited by third party
CN116737678A *: priority date 2023-08-14; published 2023-09-12; assignee 浙江同信企业征信服务有限公司; "Data synchronization method, device, equipment and storage medium"
CN116737678B *: priority date 2023-08-14; published 2024-04-19; assignee 浙江同信企业征信服务有限公司; "Data synchronization method, device, equipment and storage medium"

Similar Documents

Publication Title
US10048948B2: Optimized retrieval of custom string resources
US10073899B2: Efficient storage using automatic data translation
US11762931B2: Feedback method and apparatus based on online document comment, and non-transitory computer-readable storage medium
CN104981768A: Cloud-based streaming data receiver and persister
US20180024848A1: Translatable Texts Identification in In-Context Localization Utilizing Pseudo-Language and an External Server
US20240126417A1: Method, form data processing method, apparatus, and electronic device for form generation
CN113448562B: Automatic logic code generation method and device and electronic equipment
CN111694866A: Data searching and storing method, data searching system, data searching device, data searching equipment and data searching medium
CN111026400A: Method and device for analyzing service data stream
CN114117190A: Data processing method, data processing device, storage medium and electronic equipment
CN115640349A: Data processing method, device, equipment and storage medium
CN113282611A: Method and device for synchronizing stream data, computer equipment and storage medium
WO2022184077A1: Document editing method and apparatus, and terminal and non-transitory storage medium
RU2679971C2: Implementation of access to semantic content in development system
CN110647827A: Comment information processing method and device, electronic equipment and storage medium
US20230401377A1: Document creation method and apparatus, and device and storage medium
CN110188366A: A kind of information processing method, device and storage medium
WO2023221795A1: View generation method and apparatus, electronic device, and storage medium
US20230409814A1: Document editing method and apparatus, device, and storage medium
CN111104450B: Target data importing method, medium, device and computing equipment
WO2023116469A1: Information processing method and apparatus, terminal, and storage medium
EP4280565A1: Sample message processing method and apparatus
EP3885955A2: Method and apparatus for identifying user, storage medium, and electronic device
CN115344688A: Business data display method and device, electronic equipment and computer readable medium
CN112380476A: Information display method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination