WO2021238902A1 - Data import method, apparatus, service platform and storage medium - Google Patents

Data import method, apparatus, service platform and storage medium

Info

Publication number
WO2021238902A1
WO2021238902A1 (PCT/CN2021/095730)
Authority
WO
WIPO (PCT)
Prior art keywords
data
imported
data file
import
file
Application number
PCT/CN2021/095730
Other languages
English (en)
French (fr)
Inventor
王晋花
朱柯见
卢家顺
刘志文
付裕
吕达
Original Assignee
中兴通讯股份有限公司 (ZTE Corporation)
Application filed by 中兴通讯股份有限公司 (ZTE Corporation)
Publication of WO2021238902A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/16 File or folder operations, e.g. details of user interfaces specifically adapted to file systems
    • G06F16/11 File system administration, e.g. details of archiving or snapshots
    • G06F16/17 Details of further file system functions
    • G06F16/172 Caching, prefetching or hoarding of files

Definitions

  • This application relates to the field of data storage technology, for example, to a data import method, device, service platform, and storage medium.
  • The development of information services has brought an increase in the amount of data.
  • The database plays an indispensable role as a data bridge in information business systems.
  • A distributed database is a logically unified database composed of multiple physically scattered database units connected by a computer network. It features large storage capacity, high business concurrency, and good scalability.
  • Distributed databases are therefore being applied increasingly widely. In their application scenarios, data backup, recovery, and migration are common operations, which requires the database to provide a complete and reliable data import function.
  • Conventionally, the import function of the database is realized through business data insertion: a queue of INSERT statements is executed on the database agent node, and the agent node completes the import of the business data.
  • This import technique is mature but performs poorly. For example, importing a large amount of data places considerable pressure on the database agent node, takes a long time, and is inefficient.
  • This application provides a data import method, device, service platform, and storage medium to improve import efficiency when importing big data files into a database.
  • A data import method includes: determining a target storage node of a data file to be imported; and splitting the data file to be imported and importing each preset number of data subfiles obtained by the split into the target storage node, until the data file to be imported is fully split and imported.
  • A data import device includes: a node determining module, configured to determine a target storage node of a data file to be imported; and a first import module, configured to split the data file to be imported and import each preset number of data subfiles obtained by the split into the target storage node, until the data file to be imported is fully split and imported.
  • A data import service platform includes: one or more processors; and a memory configured to store one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the data import method described above.
  • A storage medium stores a computer program which, when executed by a processor, implements the above data import method.
  • FIG. 1 is a flowchart of a data import method provided by an embodiment of this application;
  • FIG. 2 is a flowchart of another data import method provided by an embodiment of this application;
  • FIG. 3 is a schematic diagram of a process of splitting and importing data files based on a producer consumer model provided by an embodiment of the application;
  • FIG. 4 is a schematic diagram of a data slicing process based on a producer-consumer model provided by an embodiment of this application;
  • Figure 5 is a structural diagram of a distributed database concurrent import system provided by an embodiment of the application.
  • FIG. 6 is a schematic diagram of a data import process in an application scenario provided by an embodiment of the application.
  • FIG. 7 is a structural diagram of a data migration system between heterogeneous databases provided by an embodiment of the application.
  • FIG. 8 is a structural diagram of a data import device provided by an embodiment of the application.
  • Fig. 9 is a structural diagram of a data import service platform provided by an embodiment of the application.
  • Fig. 1 is a flowchart of a data import method provided by an embodiment of the application. This embodiment is applicable to the case of importing data files provided by an external business system into a database, for example, importing a large amount of data files into a distributed database.
  • the method can be executed by a data import device, which can be implemented in software and/or hardware, and can be integrated into a data import service platform, where the data import service platform can be an intelligent terminal or server with processing functions.
  • the method includes the following steps.
  • S110: Determine the target storage node of the data file to be imported.
  • The data file to be imported may be provided by an external business system and needs to be imported into the distributed database system by the data import service platform.
  • The type of the data file to be imported is not limited in this embodiment; generally, both the size and the number of the data files to be imported are large.
  • The external business system can pre-store the data files that need to be imported at a designated location of the data import service platform through the File Transfer Protocol (FTP) or another file transfer protocol such as the Secure Shell File Transfer Protocol (SFTP).
  • the target storage node may be a database storing data files or a storage location in the database.
  • the number of target storage nodes may be one or more, and there may be one or more target storage nodes corresponding to the same data file to be imported.
  • the data file to be imported and the target storage node can be determined according to the data import request sent by the user through the user terminal.
  • the data import request may include the Internet Protocol (IP) address corresponding to the data file to be imported and the identification information of the database to be imported.
  • The IP address corresponding to the data file to be imported indicates the storage location of the file on the data import service platform; the data file to be imported can be obtained according to this IP address.
  • the database identification information is used to uniquely identify the database storing the data file, for example, it may be a number corresponding to the database, and each database corresponds to a unique number.
  • the user terminal can be a smart terminal such as a mobile phone, a notebook computer, or a tablet computer.
  • the data file to be imported and the target storage node can be determined according to the data import request sent by the external business system, and the determination process is similar to the previous one.
  • S120 Split the data file to be imported, and import a preset number of data sub-files obtained by the split into the target storage node, until the data file to be imported is split and imported.
  • this embodiment splits the data file to be imported.
  • the data files to be imported can be split one by one to obtain multiple data sub-files.
  • the data files to be imported can be split in batches to obtain multiple data sub-files.
  • Batch splitting means splitting multiple data files to be imported at the same time.
  • This embodiment takes batch splitting of the data files to be imported as an example; the number of files split per batch can be configured as required.
  • the data file to be imported can be split into multiple data sub-files according to the size of the data file to be imported.
  • For example, if the size of the data file to be imported is 100 MB, it can be split into N data subfiles.
  • The sizes of these N data subfiles can be the same or different.
  • N can be fixed or adjusted dynamically according to the size of the data file to be imported: when the data volume of the file is large, N is larger; when it is small, N is smaller.
  • the data file to be imported can be split into multiple data sub-files according to the number of rows of the data file to be imported.
  • For example, every 5000 rows of the data file to be imported, taken in order from front to back, can be split off as one data subfile.
  • the number of rows of the data subfile can be preset. When the remaining number of rows in the data file to be imported is less than the preset number of rows, the remaining number of rows can be directly used as a data subfile.
  • each split data sub-file can be numbered according to the integrity of the content.
  • For example, if a data file to be imported is split in order into 5 data subfiles, they can be numbered A-1, A-2, A-3, A-4, and A-5, where A uniquely identifies the data file to be imported; splicing A-1 through A-5 in order reproduces the complete data file.
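The row-based splitting and sequential numbering described above can be sketched as follows. This is a minimal illustration under the stated assumptions (split every N rows, remainder forms the last subfile, subfiles named `A-1`, `A-2`, ...); the helper names `split_by_rows` and `_flush` are hypothetical, not from the patent.

```python
from pathlib import Path

def split_by_rows(path, rows_per_subfile=5000, out_dir="."):
    """Split a data file into numbered subfiles of at most
    rows_per_subfile lines each; when fewer rows remain than the
    preset count, the remainder becomes the final subfile."""
    base = Path(path).stem          # e.g. "A" -> subfiles A-1, A-2, ...
    subfiles, buf, index = [], [], 1
    with open(path, encoding="utf-8") as src:
        for line in src:
            buf.append(line)
            if len(buf) == rows_per_subfile:
                subfiles.append(_flush(buf, base, index, out_dir))
                buf, index = [], index + 1
        if buf:                      # remaining rows < preset row count
            subfiles.append(_flush(buf, base, index, out_dir))
    return subfiles

def _flush(buf, base, index, out_dir):
    # Write one numbered subfile; splicing them in order reproduces
    # the original file.
    sub = Path(out_dir) / f"{base}-{index}"
    sub.write_text("".join(buf), encoding="utf-8")
    return sub
```

Concatenating the subfiles in numeric order yields the original file byte-for-byte, which is what makes the numbering scheme above sufficient for reassembly.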
  • the method of importing data sub-files can be selected according to the actual situation.
  • The import can be performed after the entire data file to be imported has been split, or it can accompany the splitting process.
  • This embodiment uses the latter approach: data file storage is realized by splitting and importing concurrently.
  • Data subfiles can be imported one by one or in batches; this embodiment takes batch import as an example.
  • In batch import, the preset number of data subfiles is imported into the target storage node at the same time. Compared with importing one by one, batch import reduces the number of import operations and saves time. This embodiment does not limit the preset number.
  • For example, it can be set to 100; that is, every time the number of data subfiles reaches 100, those 100 subfiles are imported into the target storage node.
  • If the number of remaining data subfiles is less than the preset number, they are imported into the target storage node directly.
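A minimal sketch of this batch-flush behavior (the function names and the `send_batch` callback interface are illustrative assumptions): subfiles are accumulated until the preset number is reached, and a final short batch is sent as-is.

```python
def import_in_batches(subfiles, send_batch, batch_size=100):
    """Group subfiles into batches of batch_size and hand each batch
    to send_batch (e.g. an upload to the target storage node); a
    final batch smaller than the preset number is sent directly."""
    batch = []
    for sub in subfiles:
        batch.append(sub)
        if len(batch) == batch_size:
            send_batch(batch)
            batch = []
    if batch:  # fewer remaining subfiles than the preset number
        send_batch(batch)
```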
  • The embodiment of the present application provides a data import method: determine the target storage node of the data file to be imported, then split the data file and import each preset number of resulting data subfiles into the target storage node until the data file to be imported is fully split and imported.
  • this method when the number of data subfiles obtained by splitting reaches the preset number, the preset number of data subfiles are imported into the target storage node, so that splitting and importing are performed concurrently, and the importing efficiency of data files is improved.
  • FIG. 2 is a flowchart of another data import method provided by an embodiment of the application.
  • the data import request can be sent by the user through the user terminal, which can be a smart terminal such as a mobile phone, a notebook computer, or a tablet computer.
  • This embodiment takes a data import request that includes address information, library information, and table information as an example.
  • the address information is used to identify the storage location of the data file to be imported, and different storage locations correspond to different address information, and the data file to be imported can be obtained according to the address information.
  • the library information may be the name, number, and other information of the library storing the data subfile.
  • the table information may be information such as the name and number of the table storing the data subfile.
  • the table is located in the aforementioned library.
  • a library may contain multiple tables, and different tables have unique identification information. In other words, the data subfile needs to be imported into the specified table of the specified library.
  • According to the address information, search the data import service platform to obtain the data file to be imported.
  • The target database is the database that contains the sub-database corresponding to the above library information and the table corresponding to the table information.
  • A database in this embodiment may contain multiple libraries.
  • The libraries contained in a database can be called sub-databases; that is to say, the same database can contain multiple sub-databases, and each sub-database can contain multiple tables for storing data subfiles.
  • Different databases can also be located in one database cluster.
  • A database cluster can be a cluster containing multiple databases, or a collection of multiple database clusters; the embodiment does not limit the number of databases or clusters involved. Multiple database clusters are traversed to find the database containing the above sub-database information and table information, that is, the target database.
  • the target database can also be calculated according to the column fields of the data file to be imported, and the embodiment does not limit the calculation process.
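The cluster-traversal lookup described above might be sketched like this; the nested-dict catalog layout (cluster id, database id, library name, table set) is a hypothetical simplification, not the platform's actual data structures.

```python
def find_target_database(clusters, lib_name, table_name):
    """Walk every cluster and database until one contains the
    requested sub-database (library) and table; return the
    (cluster, database) pair, or None when nothing matches."""
    for cluster_id, databases in clusters.items():
        for db_id, libs in databases.items():
            if table_name in libs.get(lib_name, set()):
                return cluster_id, db_id
    return None
```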
  • The structure of the data file to be imported must be consistent with the structure of the table used to store it in the target database for the data subfiles to be imported successfully. For this reason, this embodiment verifies the data file to be imported before splitting it; this avoids discovering, only after performing the import operation, that the structure of a data subfile is inconsistent with the target table and cannot be imported, and thus improves import efficiency.
  • the data file to be imported can be verified in the following manner:
  • According to the table information, determine the corresponding structure information in the target storage node; verify the data rows of the data file to be imported against this structure information. If the column field information of every data row is consistent with the structure information, the verification succeeds; otherwise, it fails.
  • The structure information corresponding to the table information may include the column names of the table and the type of each column. The data file to be imported can be imported successfully into the corresponding table only when the column names and types of each column of its data match those of the corresponding table in the target database. In one case, each row of the data file to be imported is read, and the structure information of the corresponding table is used to verify each column field of the row, until the end of the row; in this way, erroneous row data can be detected.
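A simplified sketch of this per-row verification against the table structure information. The type names in `casts`, the schema shape, and the `(ok, errors)` return convention are assumptions for illustration only.

```python
def verify_rows(rows, schema):
    """Check each parsed data row against the table structure:
    the column count must match, and each field must be convertible
    to the declared column type. Returns (ok_rows, error_rows)."""
    casts = {"int": int, "float": float, "string": str}
    ok, errors = [], []
    for lineno, row in enumerate(rows, start=1):
        if len(row) != len(schema):
            errors.append((lineno, "column count mismatch"))
            continue
        try:
            for field, (_name, col_type) in zip(row, schema):
                casts[col_type](field)   # raises on an invalid value
        except (ValueError, KeyError) as exc:
            errors.append((lineno, str(exc)))
        else:
            ok.append(row)
    return ok, errors
```

Rows collected in `errors` correspond to the data lines that would be routed to the error file cache described below.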
  • S250 Determine whether the verification of the data file to be imported is successful, if the verification is successful, execute S260, otherwise, execute S280.
  • S260 Split the to-be-imported data file according to the parameter information of the split file in the configuration file to obtain a data subfile.
  • The parameter information of the split file is the basis for splitting the data file to be imported; for example, it can be the number of lines or the size of the split file. This embodiment takes the number of lines as an example: every N lines of the file are split into one data subfile, and when fewer than N lines remain, the remaining lines directly form the final data subfile.
  • the import efficiency is improved by adopting a batch import method, and when the import operation is performed, the splitting operation continues, so that the splitting and importing are performed concurrently, and the importing efficiency is improved.
  • the embodiment does not limit the size of the preset number, for example, it can be set to 100, that is, when the number of data sub-files reaches 100, the data sub-files are imported into the target storage node.
  • the error file cache is used to store the data files that have failed the verification.
  • The entire data file that failed verification can be imported into the error file cache, or only the data lines that failed verification can be imported into it. Importing the data that fails verification into the error file cache ensures that no data is lost and that large data files are imported into the database accurately.
  • the data file import result can include the verification result and the import result.
  • the verification result can include the data row that failed the verification and the reason for the verification failure.
  • The import result can include the identification information of the data subfiles that were imported successfully and of those that failed to import.
  • An import operation may fail for various reasons.
  • The import of a failed data subfile can also be retried, and the number of retries can be configured. Feeding the aggregated data file import results back to the user terminal lets the user clearly understand the import status of multiple data files.
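The configurable retry behavior could look like the following sketch; the `do_import` callback interface and the optional linear backoff are assumptions, not details given in the patent.

```python
import time

def import_with_retry(subfile, do_import, max_retries=3, backoff=0.0):
    """Attempt the import, retrying up to max_retries additional
    times; do_import raises on failure. Returns True on success,
    False once all attempts are exhausted."""
    for attempt in range(max_retries + 1):
        try:
            do_import(subfile)
            return True
        except Exception:
            if attempt == max_retries:
                return False
            time.sleep(backoff * (attempt + 1))  # optional backoff
    return False
```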
  • The embodiment of the application provides a data import method. Building on the above embodiment, the data file to be imported is verified first; after verification succeeds, splitting and importing run concurrently, which improves import efficiency. Data files that fail verification are stored in the error file cache, ensuring that no data is lost and that large data files are imported into the database accurately.
  • the splitting and importing of data files is the core of the data importing service platform.
  • the following describes the splitting and importing process of data files in combination with a producer-consumer model.
  • Figure 3 is a schematic diagram of the process of splitting and importing data files based on a producer-consumer model provided by an embodiment of the application.
  • the coordination thread creates a producer thread and a consumer thread according to the received data import request.
  • the consumer thread is used as the sending thread of the data import service platform to send split files.
  • The sending thread waits for the condition-variable notification that sending is allowed; this can also be described as blocking on the condition variable.
  • The condition is that the preset number of data subfiles described in the above embodiment has been generated.
  • Upon being notified, the sending thread checks whether a preset number of data subfiles has been generated; if so, it sends that preset number of data subfiles, otherwise it continues to block and wait for notification.
  • the producer thread is used as the split thread of the data import service platform to split the data file to be imported.
  • The splitting process is as follows: obtain the data file to be imported and parse each of its column fields in order according to the table metadata cache. For example, first read the data file to be imported, obtain a column field in a row, use the metadata to verify whether that column field is correct, and analyze each column field in this order until the end of the row.
  • the table metadata cache is a storage device for storing table metadata.
  • the metadata is data that matches the structure information of the table described in the foregoing embodiment, and the structure of the metadata is the structure of the table corresponding to the target database.
  • the metadata can be used to verify whether the column fields of the data file to be imported are correct.
  • the splitting thread splits the data file to be imported according to the number of lines of the split file configured in advance.
  • After a batch is split, the sending thread is notified to send the split files, while the splitting thread continues to split.
  • the split thread and the sending thread execute concurrently, which improves the performance of data import.
  • The sending thread sends the split data subfiles, continues to check whether any data subfiles still need to be sent, and returns to blocking and waiting for notification once sending is finished.
  • the coordination thread determines that the splitting of the to-be-imported data file is complete, it sets the split end tag, and notifies the sending thread and the splitting thread to exit.
  • When the sending thread receives the notification, it first sends the split files and then checks whether the splitting thread has finished. If the splitting thread has finished, the sending thread sends an internal event to the coordinating thread to indicate the end of the splitting phase.
  • The above split thread and sending thread adopt the producer-consumer model with condition-variable notification, realizing decoupled concurrency of batch data distribution and data import.
  • The model also supports a data slicing function, which can split a large data file by a configured number of rows into multiple small files, making the data import function more flexible and suitable for more application scenarios, such as expanding a storage node in the database or importing only specified shard data from the files.
  • the process is shown in Figure 4.
  • The coordination thread starts the split thread according to the received split command and returns. After starting, the split thread enters a loop split mode, sending a notification each time a batch is split. Although no sending thread waits for the notification here, this design does not prevent splitting from continuing. When the splitting thread finishes, it sets the file splitting end flag and sends an internal event to the coordination thread to indicate the end of the splitting phase.
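The condition-variable interaction described above (the split thread notifies per batch; the send thread blocks until a full batch or the end flag arrives) can be sketched with Python's `threading.Condition`. The class and attribute names are illustrative assumptions, not the patent's implementation.

```python
import threading
from collections import deque

class SplitSendPipeline:
    """Producer-consumer sketch: a split thread appends subfiles and
    notifies; a send thread blocks on the condition variable until a
    full batch is ready or the split end tag has been set."""

    def __init__(self, batch_size=3):
        self.cond = threading.Condition()
        self.queue = deque()
        self.batch_size = batch_size
        self.done = False           # split end tag
        self.sent_batches = []

    def split(self, subfiles):
        # Producer / split thread: notify whenever a batch is ready.
        for sub in subfiles:
            with self.cond:
                self.queue.append(sub)
                if len(self.queue) >= self.batch_size:
                    self.cond.notify()
        with self.cond:
            self.done = True        # splitting finished: set end tag
            self.cond.notify()

    def send(self):
        # Consumer / send thread: wait for a full batch or the end tag,
        # send, and exit once everything has been drained.
        while True:
            with self.cond:
                self.cond.wait_for(
                    lambda: len(self.queue) >= self.batch_size or self.done)
                take = min(self.batch_size, len(self.queue))
                batch = [self.queue.popleft() for _ in range(take)]
                finished = self.done and not self.queue
            if batch:
                self.sent_batches.append(batch)
            if finished:
                return
```

Because the predicate is re-checked under the lock inside `wait_for`, missed notifications cannot stall the sender, and the final `done` notification guarantees it drains the remainder and exits.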
  • the structure of the distributed database concurrent import system includes a distributed database platform 1 and a data import service platform 2.
  • the distributed database platform 1 is the entity and core of the distributed database, responsible for the management and monitoring of data storage and system status.
  • the distributed database platform 1 includes a storage node 11, a storage node management and monitoring module 12, a metadata service module 13, and a database cluster management module 14.
  • The metadata service module 13 is responsible for providing the metadata information of database tables to the data import service platform 2 and for providing authentication services. All metadata information in the distributed database system is saved and managed by the metadata service module 13, which supplies other modules with the metadata they require.
  • the storage node monitoring and management module 12 is responsible for real-time monitoring of the running status and statistical information of the corresponding storage node 11.
  • the storage node 11 is configured to store data.
  • Figure 5 takes three storage node management and monitoring modules 12 and three storage nodes 11 as an example; each storage node management and monitoring module 12 can monitor multiple storage nodes 11.
  • the data import service platform 2 provides bulk data import services from external business systems to the distributed database platform 1.
  • the data import service platform 2 includes a file processing module 21, a file distribution module 22, and a status statistics module 23.
  • The file processing module 21 and the file distribution module 22 are the basis for implementing the producer-consumer model with condition-variable notification.
  • the file processing module 21 is configured to provide a data exchange interface with an external business system, receive a data import request sent by a user through a user terminal, and split the data file to be imported, and so on.
  • the file distribution module 22 is configured to notify the storage node management and monitoring module 12 to issue files.
  • FIG. 6 is a schematic diagram of a data import process in an application scenario provided by an embodiment of the application.
  • The external business system stores the large data files to be imported at the designated location of the data import service platform through FTP or another file transfer protocol.
  • Figure 6 takes the data import request sent through an external business system as an example.
  • After receiving the data import request, the data import service platform sends a metadata acquisition request to the metadata service module based on the cluster number, library name, and table name in the request; the table structure information corresponding to the table name is obtained, and the data file to be imported is verified against it. The target storage node is determined according to the cluster number, library name, table name, and similar information.
  • the data file to be imported is split according to the number of rows of the split file configured in advance.
  • A data import request is then sent to the database cluster management module, which forwards the request to the storage node management and monitoring module.
  • After receiving the data import request, the storage node management and monitoring module connects to the database node to download and import the data subfiles, finally importing them into the specified table of the database.
  • After the storage node management and monitoring module finishes downloading and importing the data subfiles, it reports the download status statistics and returns the data import results to the database cluster management module.
  • Upon receiving them, the database cluster management module forwards the data import results to the data import service platform.
  • the data import service platform summarizes the data file import results of multiple nodes and feeds them back to the external business system.
  • the data import method provided by the embodiments of the present application is not only applicable to distributed database systems that need to import data frequently, but also applicable to scenarios where only data slices are used to expand the capacity of one storage node in the database, and it has universality and flexibility.
  • The data migration system between heterogeneous databases shown in FIG. 7 takes an Oracle database 31, a migration system 32, and a MySQL database 33 as an example, migrating data files from the Oracle database 31 to the MySQL database 33.
  • A data migration task is initiated through the management interface 321 of the migration system 32. The information collection module 322 collects the information about the data to be migrated from the Oracle database 31, including the start and end time of the migration and the meta-table structure of the migrated data, and saves this information to the storage module 323.
  • The analysis module 324 analyzes and verifies the information collected by the information collection module 322, converts the meta-table structure of the migrated data into a syntax supported by the MySQL database via a conversion tool, and saves the result in the storage module 323.
  • The migration module 325 executes the data migration from the Oracle database 31 to the MySQL database 33; the import process is the same as the data import method provided in the above embodiment.
  • The data files are imported into the MySQL database 33 to complete the data migration task. Compared with importing a single file, the concurrent import of multiple small files realized by data slicing improves performance significantly.
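The analysis module's meta-table conversion step might be sketched as a simple type mapping. The mapping below is a hypothetical, deliberately incomplete example for illustration, not the conversion tool described in the patent; real Oracle-to-MySQL migrations need far more complete rules (precision, scale, constraints, character sets).

```python
# Hypothetical, partial Oracle -> MySQL type map for illustration.
ORACLE_TO_MYSQL = {
    "NUMBER": "DECIMAL(38,10)",
    "VARCHAR2": "VARCHAR",
    "DATE": "DATETIME",
    "CLOB": "LONGTEXT",
}

def convert_column(name, oracle_type, length=None):
    """Translate one Oracle column declaration into a MySQL-compatible
    declaration, mirroring the meta-table conversion step above."""
    mysql_type = ORACLE_TO_MYSQL.get(oracle_type.upper(), oracle_type)
    if length and mysql_type == "VARCHAR":
        mysql_type = f"VARCHAR({length})"
    return f"`{name}` {mysql_type}"
```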
  • FIG. 8 is a structural diagram of a data importing device provided by an embodiment of the application.
  • the device can execute the data importing method provided by the above-mentioned embodiment.
  • the device includes the following modules.
  • the node determining module 41 is configured to determine the target storage node of the data file to be imported;
  • the first import module 42 is configured to split the data file to be imported, and import a preset number of data sub-files obtained by the split into the target storage node, until the data file to be imported is split and imported.
  • The data import device determines the target storage node of the data file to be imported, splits the data file, and imports each preset number of resulting data subfiles into the target storage node until the data file to be imported is fully split and imported.
  • the device imports the preset number of data subfiles into the target storage node when the number of data subfiles obtained by splitting reaches the preset number, so that splitting and importing are performed concurrently, and the importing efficiency of data files is improved.
  • the device further includes:
  • the verification module is configured to verify the data file to be imported before splitting the data file to be imported.
  • the node determining module 41 is set to:
  • a target database is determined, and the target database is recorded as the target storage node of the data file to be imported.
  • the first import module 42 is set to:
  • the preset number of data sub-files are imported into the target storage node until the data file to be imported is split and the import is completed.
  • the verification module is set to:
  • the device further includes:
  • the second import module is set to import the data file to be imported that failed the verification into the error file cache if the verification fails.
  • the device further includes:
  • the summary module is configured to summarize the data file import results after the data file to be imported is split and the import is completed, and feed back to the corresponding user terminal.
  • the data import device provided in the embodiment of the present application can execute the data import method in the foregoing embodiments, and has the corresponding functional modules and effects for executing the method.
  • Fig. 9 is a structural diagram of a data import service platform provided by an embodiment of the application.
  • the data import service platform includes a processor 51, a memory 52, an input device 53, and an output device 54.
  • the number of processors 51 in the data import service platform may be one or more; FIG. 9 takes one processor 51 as an example.
  • the processor 51 and the memory 52, the input device 53, and the output device 54 may be connected by a bus or other means. In FIG. 9, the connection by a bus is taken as an example.
  • as a computer-readable storage medium, the memory 52 can be used to store software programs, computer-executable programs, and modules, such as program instructions/modules corresponding to the data import method in the embodiment of the present application.
  • the processor 51 executes multiple functional applications and data processing of the data import service platform by running the software programs, instructions, and modules stored in the memory 52, that is, realizes the data import method of the foregoing embodiment.
  • the memory 52 includes a program storage area and a data storage area.
  • the program storage area can store an operating system and an application program required by at least one function; the data storage area can store data created according to the use of the terminal, and the like.
  • the memory 52 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or other non-volatile solid-state storage devices.
  • the memory 52 may include memories remotely located relative to the processor 51, and these remote memories may be connected to the data import service platform through a network. Examples of such networks include the Internet, corporate intranets, local area networks, mobile communication networks, and combinations thereof.
  • the input device 53 may be configured to receive inputted numeric or character information, and generate key signal inputs related to user settings and function control of the data import service platform.
  • the output device 54 may include a display device such as a display screen, and audio devices such as a speaker and a buzzer.
  • the data import service platform provided by the embodiment of this application belongs to the same concept as the data import method provided in the above embodiment.
  • the embodiment of the present application further provides a storage medium storing a computer program, and when the computer program is executed by a processor, the data import method as described in the foregoing embodiment of the present application is implemented.
  • An embodiment of the present application provides a storage medium containing computer-executable instructions.
  • the computer-executable instructions are not limited to the operations of the data import method described above, and can also execute related operations in the data import method provided by any embodiment of the present application, with the corresponding functions and effects.
  • this application can be implemented by means of software plus the necessary general-purpose hardware, or by hardware alone.
  • the technical solution of this application can be embodied in the form of a software product, which can be stored in a computer-readable storage medium, such as a computer floppy disk, a read-only memory (ROM), a random access memory (RAM), a flash memory (FLASH), a hard disk, or an optical disk, and includes multiple instructions to cause a computer device (which may be a robot, a personal computer, a server, a network device, or the like) to execute the data import method described in the above embodiments of this application.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

This application provides a data import method and apparatus, a service platform, and a storage medium. The data import method includes: determining a target storage node of a data file to be imported; and splitting the data file to be imported and importing each preset number of data sub-files obtained by the splitting into the target storage node, until the data file to be imported has been completely split and imported.

Description

Data import method and apparatus, service platform, and storage medium
This application claims priority to Chinese patent application No. 202010448776.5, filed with the Chinese Patent Office on May 25, 2020, the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the technical field of data storage, for example, to a data import method and apparatus, a service platform, and a storage medium.
Background
The growth of information services has brought an ever-increasing volume of data, and databases play an indispensable role as data bridges in information service systems. A distributed database is a logically unified database formed by connecting multiple physically dispersed database units over a computer network; it features large storage capacity, high service concurrency, and good scalability, and its application is increasingly widespread. In distributed database application scenarios, data backup, recovery, and migration are common operations, which requires the database to provide a complete and reliable data import function.
The import function of a database is implemented by inserting service data, that is, by executing a queue of insert statements on a database proxy node, which then completes the import of the service data. This import technique is mature but has low performance: for example, importing a large volume of data puts considerable pressure on the database proxy node, takes a long time, and is inefficient.
Summary
This application provides a data import method and apparatus, a service platform, and a storage medium, which improve import efficiency when large data files are imported into a database.
A data import method is provided, including: determining a target storage node of a data file to be imported; and splitting the data file to be imported and importing each preset number of data sub-files obtained by the splitting into the target storage node, until the data file to be imported has been completely split and imported.
A data import apparatus is also provided, including: a node determining module, configured to determine a target storage node of a data file to be imported; and a first import module, configured to split the data file to be imported and import each preset number of data sub-files obtained by the splitting into the target storage node, until the data file to be imported has been completely split and imported.
A data import service platform is also provided, including: one or more processors; and a memory configured to store one or more programs, where the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the data import method described above.
A storage medium is also provided, storing a computer program which, when executed by a processor, implements the data import method described above.
Brief Description of the Drawings
FIG. 1 is a flowchart of a data import method provided by an embodiment of this application;
FIG. 2 is a flowchart of another data import method provided by an embodiment of this application;
FIG. 3 is a schematic diagram of a data file splitting and import process based on a producer-consumer model provided by an embodiment of this application;
FIG. 4 is a schematic diagram of a data slicing process based on a producer-consumer model provided by an embodiment of this application;
FIG. 5 is a structural diagram of a distributed database concurrent import system provided by an embodiment of this application;
FIG. 6 is a schematic diagram of a data import process in application scenario 1 provided by an embodiment of this application;
FIG. 7 is a structural diagram of a data migration system between heterogeneous databases provided by an embodiment of this application;
FIG. 8 is a structural diagram of a data import apparatus provided by an embodiment of this application;
FIG. 9 is a structural diagram of a data import service platform provided by an embodiment of this application.
Detailed Description
This application is described below with reference to the drawings and embodiments. The embodiments described here are only intended to explain this application, not to limit it. For ease of description, the drawings show only the parts related to this application rather than the complete structure. In addition, the embodiments of this application and the features in the embodiments may be combined with each other where no conflict arises.
FIG. 1 is a flowchart of a data import method provided by an embodiment of this application. This embodiment is applicable to importing a data file provided by an external service system into a database, for example, importing a data file with a large data volume into a distributed database. The method may be executed by a data import apparatus, which may be implemented in software and/or hardware and integrated in a data import service platform, where the data import service platform may be an intelligent terminal or a server with processing capability. Referring to FIG. 1, the method includes the following steps.
S110: Determine a target storage node of a data file to be imported.
The data file to be imported may be a data file provided by an external service system that needs to be imported into a distributed database system by the data import service platform; the type of the data file to be imported is not limited in this embodiment, and in general its data volume and quantity are large. In one case, the external service system may pre-store the data files to be imported at a designated location of the data import service platform via the File Transfer Protocol (FTP) or another file transfer protocol such as the Secure Shell File Transfer Protocol (SFTP). The target storage node may be a database that stores data files, or a storage location within a database. There may be one or more target storage nodes, and one data file to be imported may correspond to one or more target storage nodes.
In one case, the data file to be imported and the target storage node may be determined according to a data import request sent by a user through a user terminal. The data import request may include an Internet Protocol (IP) address corresponding to the data file to be imported and identification information of the target database, where the IP address corresponding to the data file to be imported is the storage location of that file on the data import service platform, from which the data file to be imported can be obtained. The database identification information uniquely identifies the database that stores the data file, for example, a number corresponding to the database, with each database corresponding to a unique number. The user terminal may be an intelligent terminal such as a mobile phone, a laptop, or a tablet. In another case, the data file to be imported and the target storage node may be determined according to a data import request sent by the external service system, with a determination process similar to the above.
S120: Split the data file to be imported, and import each preset number of data sub-files obtained by the splitting into the target storage node, until the data file to be imported has been completely split and imported.
The growth of information services has brought an ever-increasing volume of data and correspondingly larger data files, which puts considerable pressure on the system when data files are imported into a database. To reduce the import pressure, this embodiment splits the data file to be imported. In one case, the data files to be imported may be split one by one to obtain multiple data sub-files. In another case, the data files to be imported may be split in batches, that is, multiple data files are split simultaneously; to improve efficiency, this embodiment takes batch splitting as an example. The batch size may be set as desired.
In one case, the data file to be imported may be split into multiple data sub-files according to its size. For example, if the data file to be imported is 100 MB, it may be split into N data sub-files, which may be of equal or unequal size; N may be fixed, or dynamically adjusted according to the size of the data file to be imported, for example, larger when the data volume of the file is large and smaller when it is small. In another case, the data file to be imported may be split into multiple data sub-files according to its number of lines; for example, in order from front to back, every 5000 lines may be split into one data sub-file. The number of lines per data sub-file may be preset; when the number of remaining lines of the data file to be imported is less than the preset number of lines, the remaining lines may directly form one data sub-file. When data export is required, to guarantee the integrity of the exported data file, each data sub-file obtained by splitting may be numbered according to the completeness of the content. For example, if the data file to be imported is split from front to back into five data sub-files, these may be numbered A-1, A-2, A-3, A-4, and A-5, where A uniquely identifies the data file to be imported; concatenating A-1, A-2, A-3, A-4, and A-5 restores the complete data file.
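The line-count splitting with completeness numbering described above can be sketched as follows. This is a minimal illustration under assumptions: the names `split_by_lines` and `write_chunk`, the on-disk naming, and the UTF-8 text encoding are not from the source; only the every-N-lines rule, the shorter final sub-file, and the A-1, A-2, ... numbering scheme are.

```python
def write_chunk(file_id, index, lines):
    """Write one numbered sub-file, e.g. 'A-3' for file_id 'A'."""
    name = f"{file_id}-{index}"  # completeness numbering
    with open(name, "w", encoding="utf-8") as out:
        out.writelines(lines)
    return name

def split_by_lines(path, file_id, lines_per_chunk=5000):
    """Split a data file into numbered sub-files of at most
    lines_per_chunk lines each; concatenating the sub-files in
    numbering order restores the original file."""
    chunks = []
    buf, index = [], 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            buf.append(line)
            if len(buf) == lines_per_chunk:
                index += 1
                chunks.append(write_chunk(file_id, index, buf))
                buf = []
    if buf:  # remaining lines form the last, shorter sub-file
        index += 1
        chunks.append(write_chunk(file_id, index, buf))
    return chunks
```

Concatenating the returned sub-files in order reproduces the original file byte for byte, which is what makes the numbering scheme sufficient for later export.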
The import mode of the data sub-files may be selected according to the actual situation. For example, the import process may be executed after the data file to be imported has been completely split, or it may accompany the splitting process. To improve import efficiency, this embodiment takes the latter as an example, that is, splitting and importing are performed concurrently to store the data file. When importing data sub-files, they may be imported one by one or in batches; this embodiment takes batch import as an example: when the number of data sub-files reaches a preset number, that preset number of data sub-files is imported into the target storage node simultaneously. Compared with importing one by one, batch import reduces the number of import operations and saves time. The preset number is not limited in this embodiment; for example, it may be set to 100, meaning that every time the number of data sub-files reaches 100, those 100 data sub-files are imported into the target storage node. When the number of remaining data sub-files is less than the preset number, they are directly imported into the target storage node.
An embodiment of this application provides a data import method: determining a target storage node of a data file to be imported; splitting the data file to be imported and importing each preset number of resulting data sub-files into the target storage node, until the data file to be imported has been completely split and imported. The method imports a batch of data sub-files into the target storage node whenever the number of sub-files obtained by splitting reaches the preset number, so that splitting and importing are performed concurrently, which improves the import efficiency of data files.
FIG. 2 is a flowchart of another data import method provided by an embodiment of this application.
S210: Obtain a data import request, where the data import request includes address information, library information, and table information.
The data import request may be sent by a user through a user terminal, which may be an intelligent terminal such as a mobile phone, a laptop, or a tablet. In this embodiment, the data import request includes address information, library information, and table information as an example. The address information identifies the storage location of the data file to be imported; different storage locations correspond to different address information, and the data file to be imported can be obtained according to the address information. The library information may be the name, number, or other information of the library that stores the data sub-files. The table information may be the name, number, or other information of the table that stores the data sub-files; the table is located in the aforementioned library, one library may contain multiple tables, and different tables have unique identification information. In other words, the data sub-files ultimately need to be imported into the specified table of the specified library.
S220: Obtain the data file to be imported according to the address information.
The data file to be imported can be obtained by searching the data import service platform according to the address information.
S230: Determine a target database according to the library information and the table information, and record the target database as the target storage node of the data file to be imported.
The target database is the database that contains the library corresponding to the above library information and the table corresponding to the table information. In this embodiment, a database may contain multiple libraries; for ease of description, the libraries contained in a database may be called sub-databases, that is, one database may contain multiple sub-databases, and each sub-database may contain multiple tables for storing data sub-files. Different databases may also be located in one database cluster, which may be a cluster containing multiple databases or a collection containing multiple database clusters; the number of databases or database clusters contained in a database cluster is not limited in this embodiment. By traversing the database clusters, the database containing the above sub-database information and table information, that is, the target database, can be obtained. When multiple databases contain the sub-database information and table information, the target database may also be computed from the column fields of the data file to be imported; the computation process is not limited in this embodiment.
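The cluster traversal described above can be sketched as follows. The nested-dictionary cluster layout and the name `find_target_databases` are illustrative assumptions (the source does not specify a data structure, and it leaves the column-field computation for multiple matches open, so that step is omitted here).

```python
def find_target_databases(clusters, library, table):
    """Traverse database clusters and return the names of databases
    that contain the requested library (sub-database) and table.
    `clusters` is assumed to be a list of {db_name: {library: [tables]}}."""
    matches = []
    for cluster in clusters:
        for db_name, libraries in cluster.items():
            if table in libraries.get(library, ()):
                matches.append(db_name)
    return matches
```

A single match is recorded directly as the target storage node; when several databases match, the text notes that the target may additionally be computed from the file's column fields.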
S240: Verify the data file to be imported.
Normally, the data sub-files can be successfully imported only if the structure of the data file to be imported is consistent with the structure of the table in the target database used to store data files. Therefore, this embodiment verifies the data file to be imported before splitting it, avoiding the situation where the file is split directly and, after the import operation is executed, the import fails because the structure of the data sub-files is inconsistent with the structure of the target table; this improves import efficiency.
In one case, the data file to be imported may be verified as follows:
According to the table information, the structure information corresponding to the table information in the target storage node is determined; the data rows of the data file to be imported are verified according to the structure information; if the column field information of every data row in the data file to be imported is consistent with the structure information, the verification succeeds; otherwise, the verification fails.
The structure information corresponding to the table information may include the column names of the table and the type corresponding to each column. The data file to be imported can be successfully imported into the corresponding table only when the column names and types of each column of the data to be imported are consistent with those of the corresponding table in the target database. In one case, each row of the data file to be imported may be read, and the structure information of the corresponding table in the target database may be used to verify whether the column fields of that row are correct, until the end of the row. This approach checks whether row data is erroneous.
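The per-row column-field check against the table's structure information can be sketched as follows. The comma-separated row format, the checker-function metadata layout, and the names `TABLE_STRUCTURE` and `validate_rows` are all illustrative assumptions; only the rule itself (every column of every row must match the table structure) comes from the text.

```python
# Structure info sketched as ordered column names mapped to simple
# type checkers; a real system would derive this from table metadata.
TABLE_STRUCTURE = {
    "id": lambda v: v.isdigit(),
    "name": lambda v: len(v) > 0,
    "amount": lambda v: v.replace(".", "", 1).isdigit(),
}

def validate_rows(rows, structure, sep=","):
    """Check each data row against the table's structure information.
    A row fails if its column count or any column value does not match.
    Returns (valid_rows, failed_rows)."""
    valid, failed = [], []
    for row in rows:
        fields = row.rstrip("\n").split(sep)
        ok = len(fields) == len(structure) and all(
            check(value) for value, check in zip(fields, structure.values())
        )
        (valid if ok else failed).append(row)
    return valid, failed
```

The failed rows would then go to the error file cache described later, rather than being silently dropped.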
In another case, whether column data is erroneous can also be checked, by a process similar to checking row data.
S250: Determine whether the verification of the data file to be imported succeeds; if the verification succeeds, execute S260; otherwise, execute S280.
S260: Split the data file to be imported according to the parameter information for splitting files in the configuration file, to obtain data sub-files.
If the verification succeeds, the splitting operation is executed on the data file to be imported. The parameter information for splitting files is the basis for splitting the data file to be imported; it may be, for example, the number of lines or the size of each split file. This embodiment takes the number of lines as an example, that is, every N lines of the data file to be imported are split into one data sub-file; when fewer than N lines remain, the sub-file corresponding to the remaining lines may directly serve as one data sub-file.
S270: When the number of data sub-files reaches the preset number, import the preset number of data sub-files into the target storage node, until the data file to be imported has been completely split and imported.
In this embodiment, batch import is adopted when the number of data sub-files reaches the preset number, which improves import efficiency; moreover, the splitting operation continues while the import operation is being executed, so that splitting and importing are performed concurrently, further improving import efficiency. The preset number is not limited in this embodiment; for example, it may be set to 100, that is, when the number of data sub-files reaches 100, the data sub-files are imported into the target storage node.
S280: Import the data file to be imported that failed verification into an error file cache.
The error file cache is used to store data files that failed verification. In one case, the entire data file to be imported that failed verification may be imported into the error file cache; alternatively, only the data rows that failed verification may be imported into the error file cache. Importing the data that failed verification into the error file cache guarantees that no data is lost and ensures the accuracy of importing large-volume data files into the database.
S290: Summarize the data file import results and feed them back to the corresponding user terminal.
The data file import results may include verification results and import results. The verification results may include the data rows that failed verification and the reasons for the failure; the import results may include the identification information of the successfully imported data sub-files, the identification information of the data sub-files that failed to import, and the failed data sub-files themselves. In an embodiment, even if the earlier verification succeeds, the import may still fail for some reason during the import operation; when an import failure first occurs, the import of that data sub-file may be retried, and the number of retries may be set as desired. Feeding the summarized data file import results back to the user terminal allows the user to clearly understand the import status of multiple data files.
An embodiment of this application provides a data import method that, on the basis of the above embodiment, first verifies the data file to be imported and, after the verification succeeds, executes the concurrent splitting and import operations, improving import efficiency; data files that fail verification are stored in the error file cache, which guarantees that no data is lost and ensures the accuracy of importing large-volume data files into the database.
The splitting and import of data files is the core of the data import service platform. The splitting and import process of data files is described below with reference to a producer-consumer model.
FIG. 3 is a schematic diagram of a data file splitting and import process based on a producer-consumer model provided by an embodiment of this application.
The coordination thread creates a producer thread and a consumer thread according to the received data import request. The consumer thread serves as the sending thread of the data import service platform and is used to send split files. After being started by the coordination thread, the sending thread waits for notification on the condition variable that permits sending, which may also be described as blocking on the condition variable; the condition variable corresponds to the preset number of data sub-files described in the above embodiments. Upon receiving a notification, the sending thread checks whether the preset number of data sub-files has been produced; if so, it sends the preset number of data sub-files, otherwise it continues blocking and waiting for notification. The producer thread serves as the splitting thread of the data import service platform and is used to split the data file to be imported. After starting, the splitting thread enters a cyclic splitting mode. The splitting process is as follows: obtain the data file to be imported, and analyze each column field of the file in order according to the table metadata cache; for example, first read the data file to be imported, obtain one column field of a row, and use the metadata to verify whether that column field is correct, analyzing each column field in order until the end of the row. The table metadata cache is a storage device for storing table metadata; the metadata is data matching the table structure information described in the above embodiments, and its structure is the structure of the corresponding table in the target database, so the metadata can be used to verify whether the column fields of the data file to be imported are correct. After verification, the splitting thread splits the data file to be imported according to the preconfigured number of lines per split file; when the number of split files reaches the preset number, it notifies the sending thread to send the split files while the splitting thread continues splitting. The splitting thread and the sending thread execute concurrently, which improves data import performance. Upon receiving a notification, the sending thread sends out the split data sub-files and continues to check whether there are data sub-files that need to be sent; after this repeated sending ends, it continues blocking and waiting for the next sending notification. After the coordination thread determines that the splitting of the data file to be imported has finished, it sets a splitting-finished label and notifies the sending thread and the splitting thread to exit. Upon receiving the notification, the sending thread first sends the already-split files and then checks whether the splitting thread has finished splitting; if so, it sends an internal event to the coordination thread to indicate that the splitting stage has ended.
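The condition-variable coordination between the splitting (producer) thread and the sending (consumer) thread can be sketched with Python's threading primitives. This is a minimal illustration, not the platform's implementation; the class name `BatchDispatcher`, the callback interface, and the batch size are assumptions.

```python
import threading

class BatchDispatcher:
    """Producer (split thread) appends sub-files; the consumer (send
    thread) blocks on a condition variable and dispatches a batch
    whenever the preset number of sub-files has accumulated."""
    def __init__(self, batch_size, send):
        self.batch_size = batch_size
        self.send = send           # callback that imports one batch
        self.pending = []
        self.done = False
        self.cond = threading.Condition()

    def produce(self, subfile):
        with self.cond:
            self.pending.append(subfile)
            if len(self.pending) >= self.batch_size:
                self.cond.notify()  # wake the send thread

    def finish(self):
        with self.cond:
            self.done = True        # splitting over: flush remainder
            self.cond.notify()

    def consume(self):
        while True:
            with self.cond:
                # re-check the predicate after every wakeup
                while len(self.pending) < self.batch_size and not self.done:
                    self.cond.wait()
                batch, self.pending = self.pending, []
                if not batch and self.done:
                    return
            if batch:
                self.send(batch)  # import outside the lock
```

Because the consumer re-checks its predicate before waiting, a notification sent before the consumer blocks is never lost, and the splitting side never stalls on the import side.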
The above splitting thread and sending thread adopt a producer-consumer model with condition-variable notification, achieving decoupled, concurrent batch data distribution and data import. The model also supports a data slicing function: a large data file can be split by line count into multiple small files, giving the data import function greater flexibility to suit more application scenarios, such as expanding the capacity of a single storage node in a database, or obtaining only specified slices of an import file. The process is shown in FIG. 4.
The coordination thread starts the splitting thread according to the received splitting command and then returns. After starting, the splitting thread enters a cyclic splitting mode and sends a notification each time a batch is split; although no sending thread is waiting for notifications here, this design does not prevent the splitting from continuing. After the splitting thread finishes, it sets a file-splitting-finished flag and sends an internal event to the coordination thread to indicate that the splitting stage has ended.
The data import process is described below through two application scenarios.
Application scenario 1: concurrent import into a distributed database
As shown in FIG. 5, the structure of the distributed database concurrent import system includes a distributed database platform 1 and a data import service platform 2. The distributed database platform 1 is the entity and core of the distributed database and is responsible for data storage and for the management and monitoring of system status. The distributed database platform 1 contains storage nodes 11, storage node management and monitoring modules 12, a metadata service module 13, and a database cluster management module 14. The metadata service module 13 is responsible for providing the metadata information of database tables to the data import service platform 2 and for providing an authentication service. All metadata information in the distributed database system is stored and managed by the metadata service module 13, which provides other modules with the metadata information they need. The storage node management and monitoring module 12 is responsible for monitoring the running status and statistics of the corresponding storage nodes 11 in real time. During data import, it is responsible for connecting to database nodes and executing the import commands issued by the data import service platform 2, while providing the storage nodes 11 with services such as service response, file transmission and reception, and status feedback. The storage nodes 11 are configured to store data. FIG. 5 takes three storage node management and monitoring modules 12 and three storage nodes 11 as an example; each storage node management and monitoring module 12 can monitor multiple storage nodes 11.
The data import service platform 2 provides a batch data import service from external service systems to the distributed database platform 1. As shown in FIG. 5, the data import service platform 2 includes a file processing module 21, a file distribution module 22, and a status statistics module 23; the file processing module 21 and the file distribution module 22 are the basis for implementing the producer-consumer model with condition-variable notification. The file processing module 21 is configured to provide a data exchange interface with external service systems, to receive data import requests sent by users through user terminals, and to split the data files to be imported. The file distribution module 22 is configured to notify the storage node management and monitoring modules 12 to deliver files.
Referring to FIG. 6, FIG. 6 is a schematic diagram of a data import process in application scenario 1 provided by an embodiment of this application. The external service system stores the large-volume data files to be imported at a designated location of the data import service platform via FTP or another file transfer protocol. FIG. 6 takes a data import request sent by the external service system as an example. After receiving the data import request, the data import service platform sends a metadata acquisition request to the metadata service module according to the cluster number, library name, table name, and other information in the request, obtains the table structure information corresponding to the table name, and verifies the data file to be imported according to the table structure information; it also determines the target storage node according to the cluster number, library name, table name, and other information. The splitting process then begins: the data file to be imported is split according to the preconfigured number of lines per split file. When the number of split files reaches the preset number, a data import request is sent to the database cluster management module, which forwards it to the storage node management and monitoring module; upon receiving the request, the storage node management and monitoring module connects to the database node to download and import the data sub-files, finally importing them into the specified table of the database. After the download and import of the data sub-files are finished, the storage node management and monitoring module reports the download status statistics and returns the data import result to the database cluster management module, which forwards the result to the data import service platform; the data import service platform summarizes the data file import results of multiple nodes and feeds them back to the external service system.
Application scenario 2: data migration between heterogeneous databases
The data import method provided by the embodiments of this application is applicable not only to distributed database systems that need to import data frequently, but also to scenarios that use only data slicing to expand the capacity of a single storage node in a database, and is therefore general and flexible.
In modern enterprise operations, as the scale of the enterprise expands and its business upgrades, the architecture, system performance, and storage capacity of the original data center gradually fail to meet business needs, so the enterprise information system needs to be transformed and upgraded, and data migration is a very important part of this process. The data migration system between heterogeneous databases shown in FIG. 7 includes, as an example, an oracle database 31, a migration system 32, and a mysql database 33, and migrates data files from the oracle database 31 to the mysql database 33.
A data migration task is initiated through the management interface 321 of the migration system 32. The information collection module 322 collects the data information to be migrated from the oracle database 31, including the migration start and end times and the meta-table structure of the migrated data, and saves the relevant information to the storage module 323. The analysis module 324 analyzes and verifies the information collected by the information collection module 322, converts the meta-table structure of the migrated data into a syntax structure supported by the mysql database via a conversion tool, and saves it to the storage module 323. When the migration module 325 executes the data migration from the oracle database 31 to the mysql database 33, the import process follows the same principle as the data import method provided in the above embodiments: the splitting and sending of data files are executed concurrently through the producer-consumer model, and the data files are quickly imported into the mysql database 33 to complete the data migration task. Compared with importing a single file, the concurrent import of multiple small files realized by data slicing significantly improves performance.
FIG. 8 is a structural diagram of a data import apparatus provided by an embodiment of this application. The apparatus can execute the data import method provided by the above embodiments. Referring to FIG. 8, the apparatus includes the following modules.
The node determining module 41 is configured to determine a target storage node of a data file to be imported.
The first import module 42 is configured to split the data file to be imported and import each preset number of data sub-files obtained by the splitting into the target storage node, until the data file to be imported has been completely split and imported.
The data import apparatus provided by the embodiment of this application determines the target storage node of the data file to be imported, splits the data file to be imported, and imports each preset number of resulting data sub-files into the target storage node until the data file to be imported has been completely split and imported. The apparatus imports a batch of data sub-files into the target storage node whenever the number of sub-files obtained by splitting reaches the preset number, so that splitting and importing are performed concurrently, which improves the import efficiency of data files.
On the basis of the above embodiment, the apparatus further includes:
a verification module, configured to verify the data file to be imported before the data file to be imported is split.
On the basis of the above embodiment, the node determining module 41 is configured to:
obtain a data import request, where the data import request includes address information, library information, and table information;
obtain the data file to be imported according to the address information; and
determine a target database according to the library information and the table information, and record the target database as the target storage node of the data file to be imported.
On the basis of the above embodiment, the first import module 42 is configured to:
if the verification succeeds, split the data file to be imported according to the parameter information for splitting files in the configuration file, to obtain data sub-files; and
when the number of data sub-files reaches the preset number, import the preset number of data sub-files into the target storage node, until the data file to be imported has been completely split and imported.
On the basis of the above embodiment, the verification module is configured to:
determine, according to the table information, the structure information corresponding to the table information in the target storage node;
verify the data rows of the data file to be imported according to the structure information; and
if the column field information of every data row in the data file to be imported is consistent with the structure information, determine that the verification succeeds; otherwise, determine that the verification fails.
On the basis of the above embodiment, the apparatus further includes:
a second import module, configured to import the data file to be imported that failed verification into an error file cache if the verification fails.
On the basis of the above embodiment, the apparatus further includes:
a summary module, configured to summarize the data file import results after the data file to be imported has been completely split and imported, and feed them back to the corresponding user terminal.
The data import apparatus provided by the embodiment of this application can execute the data import method in the above embodiments, and has the corresponding functional modules and effects for executing the method.
FIG. 9 is a structural diagram of a data import service platform provided by an embodiment of this application.
Referring to FIG. 9, the data import service platform includes a processor 51, a memory 52, an input device 53, and an output device 54. The number of processors 51 in the data import service platform may be one or more; FIG. 9 takes one processor 51 as an example. The processor 51 may be connected to the memory 52, the input device 53, and the output device 54 by a bus or in other ways; FIG. 9 takes a bus connection as an example.
As a computer-readable storage medium, the memory 52 can be used to store software programs, computer-executable programs, and modules, such as the program instructions/modules corresponding to the data import method in the embodiments of this application. By running the software programs, instructions, and modules stored in the memory 52, the processor 51 executes the various functional applications and data processing of the data import service platform, that is, implements the data import method of the above embodiments.
The memory 52 includes a program storage area and a data storage area, where the program storage area can store an operating system and the application program required by at least one function, and the data storage area can store data created according to the use of the terminal, and the like. In addition, the memory 52 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device. In some examples, the memory 52 may include memories remotely located relative to the processor 51, and these remote memories may be connected to the data import service platform through a network. Examples of such networks include the Internet, corporate intranets, local area networks, mobile communication networks, and combinations thereof.
The input device 53 may be configured to receive input numeric or character information and to generate key signal inputs related to user settings and function control of the data import service platform. The output device 54 may include a display device such as a display screen, and audio devices such as a speaker and a buzzer.
The data import service platform provided by the embodiment of this application and the data import method provided by the above embodiments belong to the same concept; technical details not exhaustively described in this embodiment can be found in the above embodiments, and this embodiment has the same effects as executing the data import method.
An embodiment of this application further provides a storage medium storing a computer program which, when executed by a processor, implements the data import method described in the above embodiments of this application.
For the storage medium containing computer-executable instructions provided by the embodiments of this application, the computer-executable instructions are not limited to the operations of the data import method described above, and can also execute related operations in the data import method provided by any embodiment of this application, with the corresponding functions and effects.
From the above description of the implementations, this application can be implemented by means of software plus the necessary general-purpose hardware, or by hardware alone. The technical solution of this application can be embodied in the form of a software product, which can be stored in a computer-readable storage medium, such as a computer floppy disk, a read-only memory (ROM), a random access memory (RAM), a flash memory (FLASH), a hard disk, or an optical disk, and includes multiple instructions to cause a computer device (which may be a robot, a personal computer, a server, a network device, or the like) to execute the data import method described in the above embodiments of this application.

Claims (10)

  1. A data import method, comprising:
    determining a target storage node of a data file to be imported; and
    splitting the data file to be imported, and importing each preset number of data sub-files obtained by the splitting into the target storage node, until the data file to be imported has been completely split and imported.
  2. The method of claim 1, further comprising, before the splitting the data file to be imported:
    verifying the data file to be imported.
  3. The method of claim 2, wherein the determining a target storage node of a data file to be imported comprises:
    obtaining a data import request, wherein the data import request comprises address information, library information, and table information;
    obtaining the data file to be imported according to the address information; and
    determining a target database according to the library information and the table information, and recording the target database as the target storage node of the data file to be imported.
  4. The method of claim 2, wherein the splitting the data file to be imported and importing each preset number of data sub-files obtained by the splitting into the target storage node, until the data file to be imported has been completely split and imported, comprises:
    in a case where the verification of the data file to be imported succeeds, splitting the data file to be imported according to parameter information for splitting files in a configuration file, to obtain data sub-files; and
    in a case where the number of the data sub-files reaches the preset number, importing the preset number of data sub-files into the target storage node, until the data file to be imported has been completely split and imported.
  5. The method of claim 3, wherein the verifying the data file to be imported comprises:
    determining, according to the table information, structure information corresponding to the table information in the target storage node;
    verifying data rows of the data file to be imported according to the structure information; and
    in a case where the column field information of every data row in the data file to be imported is consistent with the structure information, determining that the verification of the data file to be imported succeeds; in a case where the column field information of one data row in the data file to be imported is inconsistent with the structure information, determining that the verification of the data file to be imported fails.
  6. The method of claim 4, further comprising:
    in a case where the verification of the data file to be imported fails, importing the data file to be imported that failed verification into an error file cache.
  7. The method of any one of claims 1-6, further comprising, after the data file to be imported has been completely split and imported:
    summarizing data file import results, and feeding the summarized data file import results back to a corresponding user terminal.
  8. A data import apparatus, comprising:
    a node determining module, configured to determine a target storage node of a data file to be imported; and
    a first import module, configured to split the data file to be imported and import each preset number of data sub-files obtained by the splitting into the target storage node, until the data file to be imported has been completely split and imported.
  9. A data import service platform, comprising:
    one or more processors; and
    a memory, configured to store one or more programs,
    wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the data import method of any one of claims 1-7.
  10. A storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the data import method of any one of claims 1-7.
PCT/CN2021/095730 2020-05-25 2021-05-25 Data import method and apparatus, service platform, and storage medium WO2021238902A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010448776.5 2020-05-25
CN202010448776.5A CN113722277A (zh) Data import method and apparatus, service platform, and storage medium

Publications (1)

Publication Number Publication Date
WO2021238902A1 true WO2021238902A1 (zh) 2021-12-02

Family

ID=78671522

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/095730 WO2021238902A1 (zh) Data import method and apparatus, service platform, and storage medium

Country Status (2)

Country Link
CN (1) CN113722277A (zh)
WO (1) WO2021238902A1 (zh)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114401289A (zh) * 2021-12-31 2022-04-26 深圳市麦谷科技有限公司 Method and system for uploading tasks in batches
CN114428815A (zh) * 2022-01-17 2022-05-03 多点生活(成都)科技有限公司 Data storage method and apparatus, electronic device, and computer-readable medium
CN114741231A (zh) * 2022-04-19 2022-07-12 深圳鲲云信息科技有限公司 Memory-based data read-write method, apparatus, device, and storage medium
CN116521092A (zh) * 2023-06-30 2023-08-01 昆山工业大数据创新中心有限公司 Method and apparatus for storing industrial equipment data

Citations (4)

Publication number Priority date Publication date Assignee Title
CN105912609A (zh) * 2016-04-06 2016-08-31 中国农业银行股份有限公司 Data file processing method and apparatus
CN106156209A (zh) * 2015-04-23 2016-11-23 中兴通讯股份有限公司 Data processing method and apparatus
CN108984757A (zh) * 2018-07-18 2018-12-11 上海汉得信息技术股份有限公司 Data import method and device
CN112597219A (zh) * 2020-12-15 2021-04-02 中国建设银行股份有限公司 Method and apparatus for importing text files with a large data volume into a distributed database

Family Cites Families (5)

Publication number Priority date Publication date Assignee Title
CN102169419A (zh) * 2011-04-02 2011-08-31 无锡众志和达存储技术有限公司 RAID data block splitting and assembling method based on a SATA controller
CN107357885B (zh) * 2017-06-30 2020-11-20 北京奇虎科技有限公司 Data writing method and apparatus, electronic device, and computer storage medium
CN108376171B (zh) * 2018-02-27 2020-04-03 平安科技(深圳)有限公司 Method, apparatus, terminal device, and storage medium for fast import of big data
CN109635017A (zh) * 2018-10-16 2019-04-16 深圳壹账通智能科技有限公司 Service data import method, apparatus, device, and computer-readable storage medium
CN110191182B (zh) * 2019-05-30 2023-04-21 深圳前海微众银行股份有限公司 Distributed file batch processing method, apparatus, device, and readable storage medium

Patent Citations (4)

Publication number Priority date Publication date Assignee Title
CN106156209A (zh) * 2015-04-23 2016-11-23 中兴通讯股份有限公司 Data processing method and apparatus
CN105912609A (zh) * 2016-04-06 2016-08-31 中国农业银行股份有限公司 Data file processing method and apparatus
CN108984757A (zh) * 2018-07-18 2018-12-11 上海汉得信息技术股份有限公司 Data import method and device
CN112597219A (zh) * 2020-12-15 2021-04-02 中国建设银行股份有限公司 Method and apparatus for importing text files with a large data volume into a distributed database

Cited By (5)

Publication number Priority date Publication date Assignee Title
CN114401289A (zh) * 2021-12-31 2022-04-26 深圳市麦谷科技有限公司 Method and system for uploading tasks in batches
CN114428815A (zh) * 2022-01-17 2022-05-03 多点生活(成都)科技有限公司 Data storage method and apparatus, electronic device, and computer-readable medium
CN114741231A (zh) * 2022-04-19 2022-07-12 深圳鲲云信息科技有限公司 Memory-based data read-write method, apparatus, device, and storage medium
CN116521092A (zh) * 2023-06-30 2023-08-01 昆山工业大数据创新中心有限公司 Method and apparatus for storing industrial equipment data
CN116521092B (zh) * 2023-06-30 2023-09-05 昆山工业大数据创新中心有限公司 Method and apparatus for storing industrial equipment data

Also Published As

Publication number Publication date
CN113722277A (zh) 2021-11-30

Similar Documents

Publication Publication Date Title
WO2021238902A1 (zh) Data import method and apparatus, service platform, and storage medium
CN110069572B (zh) Hive task scheduling method, apparatus, device, and storage medium based on a big data platform
US11928089B2 (en) Data processing method and device for distributed database, storage medium, and electronic device
US20160034582A1 (en) Computing device and method for executing database operation command
WO2021259217A1 (zh) Data association query method, apparatus, device, and storage medium
CN110674213A (zh) Data synchronization method and apparatus
CN111177161A (zh) Data processing method, apparatus, computing device, and storage medium
CN113157411B (zh) Celery-based reliable and configurable task system and apparatus
WO2016169237A1 (zh) Data processing method and apparatus
CN108268468B (zh) Big data analysis method and system
CN110765195A (zh) Data parsing method and apparatus, storage medium, and electronic device
WO2021031583A1 (zh) Statement execution method and apparatus, server, and storage medium
CN112148206A (zh) Data read-write method and apparatus, electronic device, and medium
WO2021109777A1 (zh) Data file import method and apparatus
US11625503B2 (en) Data integrity procedure
CN113111036A (zh) HDFS-based small file processing method and apparatus, medium, and electronic device
CN111090782A (zh) Graph data storage method, apparatus, device, and storage medium
CN114528049A (zh) Method and system for collecting API call statistics based on InfluxDB
CN108664503A (zh) Data archiving method and apparatus
CN113268483A (zh) Request processing method and apparatus, electronic device, and storage medium
CN114020446A (zh) Cross-engine routing processing method, apparatus, device, and storage medium
CN112925807A (zh) Batch processing method, apparatus, device, and storage medium for database-oriented requests
CN113032477A (zh) GTID-based long-distance data synchronization method, apparatus, and computing device
CN117827979B (zh) Data batch import method and apparatus, electronic device, and storage medium
CN117390040B (zh) Service request processing method, device, and storage medium based on a real-time wide table

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21812298

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 110423)

122 Ep: pct application non-entry in european phase

Ref document number: 21812298

Country of ref document: EP

Kind code of ref document: A1