Disclosure of Invention
In order to solve the above technical problems, the present application aims to provide a data scheduling method, system, device and storage medium that can quickly transfer data to a designated data warehouse.
The first technical scheme adopted by the application is as follows: a data scheduling method, comprising the steps of:
acquiring a database backup file and collecting the database backup file;
distributing the collected database backup files to each server based on the Netty framework and restoring the database backup files into original data;
transferring the original data to a data warehouse.
Further, the step of acquiring the database backup file and collecting the database backup file specifically includes:
acquiring database data and generating a data file catalog;
and acquiring the required database backup files from the data file catalog, collecting and classifying them, and checking the file format after collection and classification are finished, thereby obtaining the collected database backup files.
Further, the step of distributing the collected database backup files to each server based on the Netty framework and restoring the database backup files to original data specifically includes:
distributing the database backup file to a server where a corresponding database is located based on the Netty framework;
and calling a restore script to restore the database backup file and analyzing the restored data.
Further, the step of transferring the original data to a data warehouse specifically includes:
reading a database based on the DataX, and acquiring original data in the database;
and classifying the original data according to preset rules and transferring the classified data to a data warehouse.
Further, the step of distributing the database backup file to the server where the corresponding database is located based on the Netty framework specifically includes:
establishing a connection between a data center end and a distribution server end;
and responding to a distribution request sent by the data center end, and distributing the database backup file to a corresponding server.
Further, the step of reading the database based on the DataX and obtaining the original data in the database specifically includes:
dividing the database into areas with different addresses according to the physical addresses of the database;
and creating a corresponding number of processing threads to scan each region and acquire data, and integrating to obtain the original data in the database.
Further, the distribution request sent by the data center end and the response to it are serialized using the Protobuf serialization protocol.
The second technical scheme adopted by the application is as follows: a data scheduling system, comprising:
the collection module is used for acquiring the database backup files and collecting the database backup files;
the distribution module is used for distributing the collected database backup files to each server based on the Netty framework and restoring the database backup files into original data;
and the transfer module is used for transferring the original data to the data warehouse.
The third technical scheme adopted by the application is as follows: a data scheduling apparatus comprising:
at least one processor;
at least one memory for storing at least one program;
the at least one program, when executed by the at least one processor, causes the at least one processor to implement the data scheduling method as described above.
The fourth technical scheme adopted by the application is as follows: a storage medium having stored therein instructions executable by a processor, characterized by: the processor-executable instructions, when executed by the processor, are for implementing a data scheduling method as described above.
The method, the system, the device and the storage medium have the following beneficial effects: unified scheduling of mass data is realized by collecting, distributing, restoring and transferring the database files, a closed loop of data flow is formed, the speed of data acquisition and restoration is effectively improved, and the workload of data management staff is reduced.
Detailed Description
The application will now be described in further detail with reference to the drawings and to specific examples. The step numbers in the following embodiments are set for convenience of illustration only, and the order between the steps is not limited in any way, and the execution order of the steps in the embodiments may be adaptively adjusted according to the understanding of those skilled in the art.
In scenarios where data must be transferred out of a database and stored, for example when an enterprise moves the data it has obtained to an analysis platform, this method completes the transfer without a third-party tool and improves the transfer speed. The data center end collects and classifies the backup files and verifies their format, so that a complete data system is built, the data flow forms a closed loop, and the workload of data staff is reduced.
As shown in fig. 1, the present application provides a data scheduling method, which includes the following steps:
S101, acquiring a database backup file and collecting the database backup file.
Specifically, the data information of each database can be added manually. The data information comprises structured and unstructured data reported by each unit, and a user can visually add, delete, modify and review the files through the data center end.
S102, distributing the collected database backup files to each server based on the Netty framework and restoring the database backup files into original data.
S103, transferring the original data to a data warehouse.
Specifically, data management is established for the restored original database. In this step the data is transferred from the database to the data warehouse, and either a partial transfer or a full-database transfer can be carried out through scripts, which further reduces the workload. Through server management, different types of data can be distributed to different servers, and the databases and servers are managed uniformly.
Further as a preferred embodiment of the method, the step of obtaining the database backup file and collecting the database backup file specifically includes:
acquiring database data and generating a data file catalog;
and acquiring the required database backup files from the data file catalog, collecting and classifying them, and checking the file format after collection and classification are finished, thereby obtaining the collected database backup files.
Specifically, the data file catalog enables visualized, unified management of the data files.
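The catalog, collection and format-check steps above can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the implementation of the application: the suffix table `ALLOWED_SUFFIXES`, the function names, and the check itself (known suffix and non-empty file) are all hypothetical, since the source does not specify which formats are accepted or how they are verified.

```python
from collections import defaultdict
from pathlib import Path

# Hypothetical mapping from file suffix to database type (not taken from the source).
ALLOWED_SUFFIXES = {".bak": "sqlserver", ".dmp": "oracle", ".sql": "mysql"}


def build_catalog(root: Path) -> list:
    """Return a sorted catalog of every file found under the reporting directory."""
    return sorted(p for p in root.rglob("*") if p.is_file())


def collect(catalog: list, wanted: set) -> dict:
    """Pick the requested files out of the catalog and group them by suffix."""
    groups = defaultdict(list)
    for path in catalog:
        if path.name in wanted:
            groups[path.suffix.lower()].append(path)
    return dict(groups)


def check_format(groups: dict) -> tuple:
    """Run the format check after grouping; return (accepted by type, rejected)."""
    accepted, rejected = {}, []
    for suffix, paths in groups.items():
        for path in paths:
            # Assumed check: the suffix is known and the file is not empty.
            if suffix in ALLOWED_SUFFIXES and path.stat().st_size > 0:
                accepted.setdefault(ALLOWED_SUFFIXES[suffix], []).append(path)
            else:
                rejected.append(path)
    return accepted, rejected
```

Only the files returned in `accepted` would be handed on to the distribution step; rejected files remain visible in the catalog for review.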
Further as a preferred embodiment of the method, the step of distributing the collected database backup files to each server and restoring the collected database backup files to original data based on the Netty framework specifically includes:
distributing the database backup file to a server where a corresponding database is located based on the Netty framework;
and calling a restore script to restore the database backup file and analyzing the restored data.
Specifically, Netty is an asynchronous, event-driven network application framework characterized by high concurrency, high performance, high reliability and extensibility. Its bottom layer adopts a two-layer thread model and non-blocking NIO multiplexing, and the framework provides an extensible handler chain together with customizable serialization and network communication protocols.
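The idea of an extensible handler chain can be shown with a small sketch. Netty itself is a Java library; the Python code below does not use Netty and only mirrors the concept of passing a message through an ordered list of handlers. The `Pipeline` class, the three handlers, and the 4-byte length prefix followed by a JSON body are assumptions made for illustration.

```python
import json
import struct


class Pipeline:
    """Ordered chain of handlers; each handler returns a message or None to stop."""

    def __init__(self):
        self._handlers = []

    def add_last(self, handler):
        self._handlers.append(handler)
        return self  # allow chained calls

    def fire(self, message):
        for handler in self._handlers:
            message = handler(message)
            if message is None:  # a handler rejected the message
                return None
        return message


def decode_frame(data: bytes):
    """Strip a 4-byte big-endian length prefix; reject incomplete frames."""
    if len(data) < 4:
        return None
    (length,) = struct.unpack(">I", data[:4])
    body = data[4:4 + length]
    return body if len(body) == length else None


def parse_request(body: bytes):
    """Decode the frame body as a JSON object (stand-in for a custom protocol)."""
    return json.loads(body.decode("utf-8"))


def require_file(request: dict):
    """Accept only requests that name the backup file to distribute."""
    return request if "file" in request else None
```

A new processing step, such as authentication or logging, would be added with another `add_last` call without changing the existing handlers, which is the extensibility property the paragraph above refers to.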
Further as a preferred embodiment of the method, the step of transferring the raw data to a data warehouse specifically includes:
reading a database based on the DataX, and acquiring original data in the database;
classifying the original data according to preset rules and transferring the classified data to a data warehouse.
Specifically, DataX is an offline synchronization tool for heterogeneous data sources, which aims to provide stable and efficient data synchronization among various heterogeneous data sources, including relational databases (MySQL, Oracle and the like).
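The classify-then-transfer step can be sketched as follows. The sketch does not use DataX; it uses Python's built-in `sqlite3` module as a stand-in for both the restored database and the data warehouse. The table names, the column layout and the contents of `RULES` are hypothetical, because the source does not state what the preset rules are.

```python
import sqlite3

# Hypothetical preset rules: (target warehouse table, predicate on a source row).
RULES = [
    ("dw_structured", lambda row: row["kind"] == "structured"),
    ("dw_unstructured", lambda row: row["kind"] == "unstructured"),
]


def transfer(source: sqlite3.Connection, warehouse: sqlite3.Connection) -> dict:
    """Read every source row, classify it by the first matching rule, and store it."""
    source.row_factory = sqlite3.Row  # allow access to columns by name
    counts = {}
    for table, _ in RULES:
        warehouse.execute(
            f"CREATE TABLE IF NOT EXISTS {table} "
            "(id INTEGER PRIMARY KEY, unit TEXT, payload TEXT)"
        )
        counts[table] = 0
    query = "SELECT id, unit, kind, payload FROM records ORDER BY id"
    for row in source.execute(query):
        for table, matches in RULES:
            if matches(row):
                warehouse.execute(
                    f"INSERT INTO {table} (id, unit, payload) VALUES (?, ?, ?)",
                    (row["id"], row["unit"], row["payload"]),
                )
                counts[table] += 1
                break  # rows that match no rule are skipped
    warehouse.commit()
    return counts
```

The returned counts give a simple per-category record of what was transferred, which a data center end could display for monitoring.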
Further as a preferred embodiment of the method, the step of distributing the database backup file to the server where the corresponding database is located based on the Netty framework specifically includes:
establishing connection between a data center end and a distribution server end;
and responding to a distribution request sent by the data center end, and distributing the database backup file to a corresponding server.
Specifically, the application comprises a data center end, a distribution server end, and a restoration service end located in each server, which are connected to one another. The data center end manages the data of the whole system. The distribution server end responds to distribution requests from the data center end and distributes the data to each server. The data center end then sends a restoration request, and the restoration service end in the server responds to it by restoring the data file into original data.
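The interaction among the three ends can be sketched in-process, with no networking, to show only the order of requests and responses. The class names, method names and dictionary-shaped messages below are hypothetical, and the restoration itself is reduced to splitting the delivered file into lines.

```python
from dataclasses import dataclass, field


@dataclass
class RestoreService:
    """Runs on a database server and restores delivered backup files."""
    name: str
    inbox: dict = field(default_factory=dict)
    restored: dict = field(default_factory=dict)

    def receive(self, filename: str, content: bytes) -> None:
        self.inbox[filename] = content

    def handle_restore(self, filename: str) -> dict:
        if filename not in self.inbox:
            return {"ok": False, "reason": "file not delivered"}
        # Stand-in for calling a restore script on the delivered file.
        rows = self.inbox[filename].decode("utf-8").splitlines()
        self.restored[filename] = rows
        return {"ok": True, "rows": len(rows)}


@dataclass
class DistributionServer:
    """Responds to distribution requests by delivering files to target servers."""
    servers: dict

    def handle_distribute(self, request: dict) -> dict:
        target = self.servers.get(request["server"])
        if target is None:
            return {"ok": False, "reason": "unknown server"}
        target.receive(request["file"], request["content"])
        return {"ok": True, "server": target.name}


@dataclass
class DataCenter:
    """Sends the distribution request first, then the restoration request."""
    distribution: DistributionServer

    def schedule(self, filename: str, content: bytes, server: str) -> dict:
        ack = self.distribution.handle_distribute(
            {"file": filename, "content": content, "server": server}
        )
        if not ack["ok"]:
            return ack
        return self.distribution.servers[server].handle_restore(filename)
```

In this sketch the data center end issues the restoration request only after the distribution request has been acknowledged, matching the order described above.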
Further as a preferred embodiment of the method, the step of reading the database based on DataX and obtaining the original data in the database specifically includes:
dividing the database into areas with different addresses according to the physical addresses of the database;
and creating a corresponding number of processing threads to scan each region and acquire data, and integrating to obtain the original data in the database.
Further as a preferred embodiment of the method, the distribution request sent by the data center end and the response to it are serialized using the Protobuf serialization protocol.
Specifically, segmenting the database according to its physical addresses narrows the range that each scan must cover and allows data to be extracted at high speed, and acquiring the original data through concurrent multi-threaded scanning further improves the transfer speed. Protobuf is a language-independent, platform-independent and extensible method for serializing structured data that can be used for communication protocols and data storage; adopting the Protobuf serialization protocol makes the serialization and deserialization of the system faster.
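The region-based concurrent scan can be sketched as follows. The source does not say how physical addresses are obtained, so this sketch assumes a file of fixed-size records and treats record offsets in that file as the addresses. The record layout `RECORD` and all function names are hypothetical.

```python
import os
import struct
from concurrent.futures import ThreadPoolExecutor

# Hypothetical fixed-size record: one unsigned 32-bit big-endian integer.
RECORD = struct.Struct(">I")


def split_regions(total_records: int, workers: int) -> list:
    """Split [0, total_records) into contiguous, non-overlapping regions."""
    size, extra = divmod(total_records, workers)
    regions, start = [], 0
    for i in range(workers):
        end = start + size + (1 if i < extra else 0)
        if end > start:  # skip empty regions when there are few records
            regions.append((start, end))
        start = end
    return regions


def scan_region(path: str, region: tuple) -> list:
    """Read one region; each thread opens its own file handle."""
    start, end = region
    with open(path, "rb") as fh:
        fh.seek(start * RECORD.size)
        data = fh.read((end - start) * RECORD.size)
    return [value for (value,) in RECORD.iter_unpack(data)]


def scan_all(path: str, workers: int = 4) -> list:
    """Scan all regions concurrently and merge the results in address order."""
    total = os.path.getsize(path) // RECORD.size
    regions = split_regions(total, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(lambda region: scan_region(path, region), regions)
        return [value for part in parts for value in part]
```

Because `pool.map` yields results in the order of its inputs, the merged output preserves the original record order even though the regions are read concurrently.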
Specific embodiments of the application are as follows:
Database backup files reported by each unit are acquired, and all of them are integrated to generate a data file catalog. The files required by the user are collected and classified, and their file formats are checked. After the check succeeds, the required database backup files are distributed, based on the Netty framework, to the servers where the corresponding databases are located: a connection is established between the data center end and the distribution server end, the user sends a distribution request through the data center end, and the distribution server end responds to the request and distributes the files to the corresponding servers. The data center end then sends a restoration request, and a restoration script in the server is called to restore the data files into original data. Finally, the original data is transferred to the data warehouse through DataX. Throughout, the user can monitor and manage the whole data scheduling process through the data center end.
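The restoration step of this embodiment, in which a script is called on a delivered backup file, might be wrapped as in the following sketch. The source does not name the restoration script or its arguments, so the command is passed in by the caller; `restore_backup` and its parameters are hypothetical.

```python
import subprocess
from pathlib import Path


def restore_backup(script: list, backup: Path, timeout: float = 60.0) -> str:
    """Run a restore command on one backup file and return its standard output.

    `script` is the command as a list of arguments; the backup path is appended
    as the final argument. A non-zero exit status raises CalledProcessError.
    """
    result = subprocess.run(
        [*script, str(backup)],
        capture_output=True,  # keep the script's output for later analysis
        text=True,
        timeout=timeout,
        check=True,
    )
    return result.stdout
```

Raising on a non-zero exit status lets the caller report a failed restoration back to the data center end instead of silently continuing to the transfer step.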
As shown in fig. 2, a data scheduling system includes:
the collection module is used for acquiring the database backup files and collecting the database backup files;
the distribution module is used for distributing the collected database backup files to each server based on the Netty framework and restoring the database backup files into original data;
and the transfer module is used for transferring the original data to the data warehouse.
Further as a preferred embodiment of the system, the collecting module further includes:
the catalog sub-module is used for acquiring database data and generating a data file catalog;
and the classifying sub-module is used for acquiring the required database backup files from the data file catalog, collecting and classifying them, and checking the file format after collection and classification are finished.
Further as a preferred embodiment of the present system, the distribution module further comprises:
the server sub-module is used for distributing the database backup files to the servers where the corresponding databases are located based on the Netty framework;
and the restoration sub-module is used for calling a restore script to restore the database backup file and analyzing the restored data.
Further as a preferred embodiment of the system, the transfer module further includes:
the reading sub-module is used for reading the database based on the DataX and acquiring the original data in the database;
and the classification sub-module is used for classifying the original data according to preset rules and transferring the classified data to a data warehouse.
The content in the method embodiment is applicable to the system embodiment, the functions specifically realized by the system embodiment are the same as those of the method embodiment, and the achieved beneficial effects are the same as those of the method embodiment.
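The composition of the three modules can be sketched as a simple structure in which each module is a replaceable callable. This is one possible arrangement under stated assumptions and not the structure of the application itself; `SchedulingSystem` and its field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SchedulingSystem:
    """Chains the collection, distribution and transfer modules in order."""
    collect: Callable[[list], list]      # collection module
    distribute: Callable[[list], list]   # distribution module (distribute and restore)
    transfer: Callable[[list], int]      # transfer module, returns items transferred

    def run(self, reported: list) -> int:
        collected = self.collect(reported)
        restored = self.distribute(collected)
        return self.transfer(restored)
```

Each field can be bound to a concrete implementation, so a sub-module can be replaced without changing the order in which the modules are invoked.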
A data scheduling apparatus, comprising:
at least one processor;
at least one memory for storing at least one program;
the at least one program, when executed by the at least one processor, causes the at least one processor to implement a data scheduling method as described above.
The content in the method embodiment is applicable to the embodiment of the device, and the functions specifically realized by the embodiment of the device are the same as those of the method embodiment, and the obtained beneficial effects are the same as those of the method embodiment.
A storage medium having stored therein instructions executable by a processor, characterized by: the processor-executable instructions, when executed by the processor, are for implementing a data scheduling method as described above.
The content in the method embodiment is applicable to the storage medium embodiment, and functions specifically implemented by the storage medium embodiment are the same as those of the method embodiment, and the achieved beneficial effects are the same as those of the method embodiment.
While the preferred embodiment of the present application has been described in detail, the application is not limited to the embodiment, and various equivalent modifications and substitutions can be made by those skilled in the art without departing from the spirit of the application, and these equivalent modifications and substitutions are intended to be included in the scope of the present application as defined in the appended claims.