WO2016169237A1 - Data processing method and device - Google Patents

Data processing method and device

Info

Publication number
WO2016169237A1
WO2016169237A1 PCT/CN2015/092759 CN2015092759W WO2016169237A1 WO 2016169237 A1 WO2016169237 A1 WO 2016169237A1 CN 2015092759 W CN2015092759 W CN 2015092759W WO 2016169237 A1 WO2016169237 A1 WO 2016169237A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
split
import
processing
module
Prior art date
Application number
PCT/CN2015/092759
Other languages
French (fr)
Chinese (zh)
Inventor
韩烨
Original Assignee
中兴通讯股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司
Publication of WO2016169237A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/258 Data format conversion from or to a database

Definitions

  • The present invention relates to the field of communications, and in particular to a data processing method and apparatus.
  • Database technology is the core of information systems such as management information systems, office automation systems, and decision support systems, and is an important technical means for scientific research and decision management.
  • The present invention provides a data processing method and apparatus to solve at least the problem of low data import efficiency in the related art.
  • A data processing method includes: receiving a data import instruction that instructs data to be imported into a database; splitting the data according to the data import instruction; and importing the split data, in blocks, into different storage spaces in the database.
  • Optionally, splitting the data according to the data import instruction includes: determining, according to the data import instruction, the table structure of a table and the data distribution information of the data on the table; identifying each data row field in the data according to the table structure, the data distribution information, and the descriptor information of the data carried in the data import instruction; and splitting the data according to each identified data row field.
  • Optionally, splitting the data according to the data import instruction includes: determining whether the data satisfies a splitting rule; if it does, splitting the data; if it does not, correcting the data and then splitting the corrected data.
  • Optionally, importing the split data, in blocks, into different storage spaces in the database includes: downloading the split data; and importing the downloaded split data, in blocks, into the different storage spaces in the database.
  • Optionally, after the split data is imported into the different storage spaces in the database, the method further includes: deleting the downloaded split data.
  • Optionally, after the split data is imported into the different storage spaces in the database, the method further includes: summarizing the import results of the split data; and feeding back the import results.
  • A data processing apparatus includes: a receiving module configured to receive a data import instruction that instructs data to be imported into a database; a processing module configured to split the data according to the data import instruction; and an import module configured to import the split data, in blocks, into different storage spaces in the database.
  • Optionally, the processing module includes: a determining unit configured to determine, according to the data import instruction, the table structure of a table and the data distribution information of the data on the table; an identifying unit configured to identify each data row field in the data according to the table structure, the data distribution information, and the descriptor information of the data carried in the data import instruction; and a first processing unit configured to split the data according to each identified data row field.
  • Optionally, the processing module includes: a determining unit configured to determine whether the data satisfies a splitting rule; a second processing unit configured to split the data when the determining unit's result is yes; a correcting unit configured to correct the data when the determining unit's result is no; and a third processing unit configured to split the corrected data.
  • Optionally, the import module includes: a download unit configured to download the split data; and an import unit configured to import the downloaded split data, in blocks, into different storage spaces in the database.
  • Optionally, the apparatus further includes: a deletion module configured to delete the downloaded split data.
  • Optionally, the apparatus further includes: a summary module configured to summarize the import results of the split data; and a feedback module configured to feed back the import results.
  • With the present invention, a data import instruction that instructs data to be imported into a database is received; the data is split according to the data import instruction; and the split data is imported into the database in blocks.
  • This solves the problem of low data import efficiency in the related art and improves data import efficiency.
  • FIG. 1 is a flow chart of a data processing method according to an embodiment of the present invention.
  • FIG. 2 is a block diagram showing the structure of a data processing apparatus according to an embodiment of the present invention.
  • FIG. 3 is a first structural block diagram of a processing module 24 in a data processing apparatus according to an embodiment of the present invention.
  • FIG. 4 is a block diagram showing a second structure of the processing module 24 in the data processing apparatus according to an embodiment of the present invention.
  • FIG. 5 is a structural block diagram of an import module 26 in a data processing apparatus according to an embodiment of the present invention.
  • FIG. 6 is a block diagram of a first preferred structure of a data processing apparatus according to an embodiment of the present invention.
  • FIG. 7 is a block diagram showing a second preferred structure of a data processing apparatus according to an embodiment of the present invention.
  • FIG. 8 is a block diagram showing the structure of an import system according to an embodiment of the present invention.
  • FIG. 9 is a flow chart of data import processing in accordance with an embodiment of the present invention.
  • FIG. 1 is a flowchart of a data processing method according to an embodiment of the present invention. As shown in FIG. 1, the process includes the following steps:
  • Step S102: receive a data import instruction that instructs data to be imported into the database;
  • Step S104: split the data according to the data import instruction;
  • Step S106: import the split data, in blocks, into different storage spaces in the database.
  • The database above may be a distributed database system, which frees the database from dependence on large equipment by building a highly available, highly scalable cluster from ordinary inexpensive equipment.
  • A good distributed database architecture can readily achieve high availability and can scale out.
  • Importing and exporting large volumes of data is a key technology in distributed databases.
  • The data may be split according to a table used for splitting. In that case, splitting the data according to the data import instruction includes: determining, according to the data import instruction, the table structure of the table and the data distribution information of the data on the table; identifying each data row field in the data according to the table structure, the data distribution information, and the descriptor information of the data carried in the data import instruction; and splitting the data according to each identified data row field.
  • The legality of the data import instruction may be checked first. The table structure information and distribution policy information of the destination table are then obtained, and the data file is read and split, according to the table structure information and the distribution policy, into multiple small files, one for each underlying database (that is, each storage space above) that is to store the data. The small files are transmitted to their destination underlying databases.
  • The cluster management module then instructs each underlying database to import its corresponding file.
  • Optionally, splitting the data according to the data import instruction includes: determining whether the data satisfies the splitting rule; if it does, splitting the data; if it does not, correcting the data and then splitting the corrected data.
  • Several correction methods are possible. The correction may be performed manually by an administrator; it may be performed without manual intervention, according to correction rules, by the module that performs the split processing or by another module; or manual and automatic correction may be combined.
  • This makes it possible to learn promptly whether the data to be imported into the database can be split, which further improves splitting efficiency.
  • Erroneous rows can be extracted to ensure the correctness of the imported data.
  • When the split data is imported, in blocks, into different storage spaces in the database, the split data may be downloaded first, and the downloaded split data is then imported, in blocks, into the different storage spaces in the database.
  • After the split data is imported into the different storage spaces in the database, the method may further include deleting the downloaded split data. This clears garbage data files and reduces memory usage, so the database can store more data.
  • The import result may also be fed back. After the split data is imported into the different storage spaces in the database, the method may further include summarizing the import results of the split data and feeding back the import results. This lets the user clearly determine the import result.
  • The method according to the above embodiment can be implemented by software plus a necessary general-purpose hardware platform, or by hardware, although in many cases the former is the better implementation.
  • The technical solution of the present invention, in essence or in the part that contributes over the prior art, may be embodied in the form of a software product that is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes a number of instructions for causing a terminal device (which may be a cell phone, a computer, a server, a network device, or the like) to perform the methods of the embodiments of the present invention.
  • A data processing device is also provided. It implements the above embodiments and preferred embodiments; what has already been described is not repeated.
  • As used below, the term "module" may refer to a combination of software and/or hardware that implements a predetermined function.
  • Although the apparatus described in the following embodiments is preferably implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
  • FIG. 2 is a block diagram showing the structure of a data processing apparatus according to an embodiment of the present invention. As shown in FIG. 2, the apparatus includes a receiving module 22, a processing module 24, and an import module 26, which are described below.
  • The receiving module 22 is configured to receive a data import instruction that instructs data to be imported into the database. The processing module 24 is connected to the receiving module 22 and is configured to split the data according to the data import instruction. The import module 26 is connected to the processing module 24 and is configured to import the split data, in blocks, into different storage spaces in the database.
  • FIG. 3 is a first structural block diagram of the processing module 24 in a data processing apparatus according to an embodiment of the present invention.
  • The processing module 24 includes a determining unit 32, an identifying unit 34, and a first processing unit 36, which are described below.
  • The determining unit 32 is configured to determine, according to the data import instruction, the table structure of the table and the data distribution information of the data on the table. The identifying unit 34 is connected to the determining unit 32 and is configured to identify each data row field in the data according to the table structure, the data distribution information, and the descriptor information of the data carried in the data import instruction. The first processing unit 36 is connected to the identifying unit 34 and is configured to split the data according to each identified data row field.
  • FIG. 4 is a second structural block diagram of the processing module 24 in a data processing apparatus according to an embodiment of the present invention.
  • The processing module 24 includes a determining unit 42, a second processing unit 44, a correcting unit 46, and a third processing unit 48, which are described below.
  • The determining unit 42 is configured to determine whether the data satisfies the splitting rule. The second processing unit 44 is connected to the determining unit 42 and is configured to split the data when the determining unit 42's result is yes. The correcting unit 46 is connected to the determining unit 42 and is configured to correct the data when the determining unit 42's result is no. The third processing unit 48 is connected to the correcting unit 46 and is configured to split the corrected data.
  • The import module 26 includes a download unit 52 and an import unit 54, which are described below.
  • The download unit 52 is configured to download the split data.
  • The import unit 54 is connected to the download unit 52 and is configured to import the downloaded split data, in blocks, into different storage spaces in the database.
  • FIG. 6 is a block diagram of a first preferred structure of a data processing apparatus according to an embodiment of the present invention. As shown in FIG. 6, the apparatus includes a deletion module 62 in addition to all the modules shown in FIG. The apparatus is described below.
  • The deletion module 62 is connected to the import module 26 and is configured to delete the downloaded split data.
  • FIG. 7 is a second preferred structural block diagram of a data processing apparatus according to an embodiment of the present invention. As shown in FIG. 7, the apparatus includes a summary module 72 and a feedback module 74 in addition to all the modules shown in FIG. The apparatus is described below.
  • The summary module 72 is connected to the import module 26 and is configured to summarize the import results of the split data.
  • The feedback module 74 is connected to the summary module 72 and is configured to feed back the import results.
  • Existing solutions in the related art all operate on a traditional single database and place no requirements on table efficiency or on the system architecture.
  • The solution in the embodiment of the present invention is based on a distributed database system, satisfies the atomicity, consistency, isolation, and durability (ACID) properties of the database, and can be executed concurrently.
  • The import system includes the following modules: the data import client module 82 (which may be located between an external system and the data import server module, or inside the external system, and is not shown in FIG. 8); the data import server module 84 (corresponding to the download server 84 in FIG. 8, and equivalent to the receiving module 22, the processing module 24, and the import module 26 above); the metadata center module 86 (corresponding to the metadata server 84 in FIG. 8); the cluster management center module 88 (corresponding to the cluster manager 88 in FIG. 8, and equivalent to the summary module 72 and the feedback module 74 above); the database agent module 810 (equivalent to the deletion module 62 above); and the database module 812. Each module is described below.
  • The data import client module 82 (LoadClient) is the user-facing module; the user initiates import and export commands through it.
  • The data import server module 84 (LoadServer) is configured to accept the import and export commands sent by the client, split and merge data files according to the data distribution policy, and interact with the other modules to coordinate the entire import and export process.
  • The metadata center module 86 is configured to store and manage all metadata of the entire distributed database system.
  • The cluster management center module 88 is mainly responsible for monitoring, managing, and maintaining the database clusters (DBClusters).
  • The database agent module 810 manages and monitors database nodes. It monitors the running status of the DB nodes under its jurisdiction in real time and periodically collects running statistics.
  • The database module 812 is the underlying module that holds all data.
  • The data import server module 84 queries the metadata center module 86 for the metadata of the table by cluster ID, database name, and table name, in order to obtain the table structure definition and the data distribution information;
  • The data import server module 84 uses the obtained information, together with the data file descriptor information, to identify each data row field in the data file (datafilename) and splits the data file;
  • The data import server module 84 requests the cluster management center module 88 to notify each database agent module 810 to download the split files of the DBGroup it manages;
  • After every database agent module 810 has downloaded successfully, the data import server module 84 requests the cluster management center module 88 to notify each database agent module 810 to execute the actual LOAD DATA INFILE command that loads the data file;
  • The data import server module 84 requests the cluster management center module 88 to notify each database agent module 810 to delete the garbage data files (here, the garbage data files may be the downloaded data that has already been loaded);
  • The data import server module 84 summarizes the results and notifies the data import client module 82.
  • FIG. 9 is a flowchart of data import processing according to an embodiment of the present invention. As shown in FIG. 9, the flow includes the following steps:
  • Step S902: the data import client module 82 sends an import data request to the data import server module 84;
  • Step S904: the data import server module 84 sends a database metadata query request to the metadata center module 86, keyed by cluster ID, database name, and table name, to query the metadata of the table;
  • Step S906: the metadata center module 86 returns the table structure definition and data distribution information, including the type and length of each field of the table, the distribution key, and the DBGroups across which the table is distributed;
  • Step S908: the data import server module 84 uses the information returned by the metadata center module 86, together with the data file descriptor information, to identify each data row field in the data file (datafilename) and splits the data file. If erroneous data is found during splitting, for example a value whose type does not match the table definition, the erroneous data is extracted and placed in an error file;
  • Step S910: the data import server module 84 requests the cluster management center module 88 to notify each database agent module 810 to download the split files of the DBGroup it manages;
  • Step S912: the cluster management center module 88 notifies each database agent module 810 to download the split files of the DBGroup it manages;
  • Each database agent module 810 connects to the data import server module 84 through the FTP service and downloads its corresponding split file; after downloading successfully, each database agent module 810 returns a success response to the cluster management center module 88;
  • Step S916: the cluster management center module 88 summarizes the download results;
  • Step S918: after receiving success responses from all database agent modules 810, the cluster management center module 88 returns a success response to the data import server module 84;
  • Step S920: after the download succeeds, the data import server module 84 requests the cluster management center module 88 to notify each database agent module 810 to execute the actual LOAD DATA INFILE command;
  • Step S922: the cluster management center module 88 notifies each database agent module 810 to execute the actual LOAD DATA INFILE command;
  • Each database agent module 810 connects to the database module it manages and executes the actual LOAD DATA INFILE command; after the command succeeds, each database agent module 810 returns a success response to the cluster management center module 88;
  • Step S926: after receiving success responses from all database agent modules 810, the cluster management center module 88 returns a success response to the data import server module 84. After the LOAD DATA INFILE command has executed successfully, the data import server module 84 again requests the cluster management center module 88 to notify each database agent module 810 to delete the garbage data files. The data import server module 84 then summarizes the results and notifies the data import client module 82.
  • The solution in the above embodiment is based on a distributed database system and can import all data types supported by the MySQL database; other types can of course also be supported.
  • Applying the solution in the embodiment of the present invention to a distributed database system can increase concurrency by a factor of 2 to 3, balance the load, ensure the correctness of imported and exported data, and make the system robust.
  • Each of the above modules may be implemented by software or hardware.
  • For the latter, this may be achieved in, but is not limited to, the following ways: the modules are all located in the same processor, or the modules are located in multiple processors.
  • Embodiments of the present invention also provide a storage medium.
  • The storage medium may be configured to store program code for performing the following steps:
  • The storage medium may include, but is not limited to, a USB flash drive, a Read-Only Memory (ROM), and a Random Access Memory (RAM).
  • The modules or steps of the present invention described above can be implemented by a general-purpose computing device, and can be centralized on a single computing device or distributed across a network of multiple computing devices. Alternatively, they may be implemented by program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; in some cases the steps shown or described may be performed in an order different from the one given here. They may also be fabricated separately as individual integrated circuit modules, or several of the modules or steps may be fabricated as a single integrated circuit module.
  • The invention is not limited to any specific combination of hardware and software.
  • The data processing method and apparatus provided by the embodiments of the present invention have the following beneficial effect: the problem of low data import efficiency in the related art is solved, and data import efficiency is improved.
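
The FIG. 9 flow described above has each database agent module run a LOAD DATA INFILE command against the database it manages. As an illustrative sketch only, the following Python builds one such MySQL statement per split file. The function names, file paths, and the choice of a single field delimiter are assumptions for illustration; the document does not specify the exact statement the agents issue.

```python
def sql_quote(value):
    """Quote a string literal for MySQL by escaping backslashes and single quotes."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def build_load_statement(file_path, table_name, field_delimiter=","):
    """Build a LOAD DATA INFILE statement for one split file.

    Assumption: table_name is a trusted identifier that contains no backticks.
    """
    return (
        f"LOAD DATA INFILE {sql_quote(file_path)} "
        f"INTO TABLE `{table_name}` "
        f"FIELDS TERMINATED BY {sql_quote(field_delimiter)}"
    )


def plan_loads(split_files, table_name, field_delimiter=","):
    """Map each DBGroup name (hypothetical) to the statement its agent would run."""
    return {
        group: build_load_statement(path, table_name, field_delimiter)
        for group, path in split_files.items()
    }
```

Each agent would execute only the statement for its own group, which is what allows the loads to proceed concurrently across storage spaces.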

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Provided are a data processing method and device. The method comprises: receiving a data importing instruction for instructing a data import to a database; splitting the data according to the data importing instruction; and importing the split data to different storage spaces in the database in blocks. The present invention addresses the existing problem in the related art of low data importing efficiency, thus improving data importing efficiency.

Description

Data processing method and device
Technical field
The present invention relates to the field of communications, and in particular to a data processing method and apparatus.
Background
With the development of technology, databases play an increasingly important role in people's lives. In today's information society, managing and using all kinds of resources fully and effectively is a prerequisite for scientific research and decision management. Database technology is the core of information systems such as management information systems, office automation systems, and decision support systems, and is an important technical means for scientific research and decision management.
Traditional database systems generally ensure database integrity through high-end equipment, such as minicomputers or high-end storage, or improve database processing capability by adding memory or central processing units (CPUs). However, this centralized database architecture is increasingly unsuited to processing massive databases: data import efficiency is low, and the cost is high.
No effective solution has yet been proposed for the problem of low data import efficiency in the related art.
Summary of the invention
The present invention provides a data processing method and apparatus to solve at least the problem of low data import efficiency in the related art.
According to one aspect of the present invention, a data processing method is provided, including: receiving a data import instruction that instructs data to be imported into a database; splitting the data according to the data import instruction; and importing the split data, in blocks, into different storage spaces in the database.
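As a minimal sketch of the three steps of this method, the following Python receives an instruction, splits the rows, and imports each block into a different storage space. The instruction format, the name handle_import, and the modulo distribution rule are assumptions made for illustration; the document does not specify them.

```python
def handle_import(instruction, storage_spaces):
    """Split the rows named by an import instruction and load each block separately.

    instruction: dict with "rows" (list of dicts) and "distribution_column"
                 (hypothetical format, integer-valued column assumed).
    storage_spaces: list of lists standing in for the database's storage spaces.
    """
    key = instruction["distribution_column"]

    # Split: group rows by the storage space their key maps to (assumed rule).
    blocks = {}
    for row in instruction["rows"]:
        blocks.setdefault(row[key] % len(storage_spaces), []).append(row)

    # Import: each block goes to its own storage space.
    for index, block in blocks.items():
        storage_spaces[index].extend(block)

    # Report how many rows each storage space received.
    return {index: len(block) for index, block in blocks.items()}
```

Because every block targets a different storage space, the per-block imports are independent of one another, which is what makes concurrent loading possible.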
Optionally, splitting the data according to the data import instruction includes: determining, according to the data import instruction, the table structure of a table and the data distribution information of the data on the table; identifying each data row field in the data according to the table structure, the data distribution information, and the descriptor information of the data carried in the data import instruction; and splitting the data according to each identified data row field.
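The following Python sketch illustrates how the table structure, the distribution information, and the descriptor information could work together to identify row fields and split a data file. The dictionary layouts, the delimiter-based descriptor, and the CRC32 hash distribution are assumptions for illustration, not the policy defined by this document.

```python
import zlib


def parse_rows(text, columns, descriptor):
    """Identify the fields of every data row using the column list and the descriptor."""
    rows = []
    for line in text.split(descriptor["line_delimiter"]):
        if not line:
            continue  # skip the empty piece left by a trailing line delimiter
        fields = line.split(descriptor["field_delimiter"])
        # Field-count checking is left out of this sketch; zip stops at the shorter side.
        rows.append(dict(zip(columns, fields)))
    return rows


def target_group(row, distribution_key, groups):
    """Pick a destination group from a stable hash of the distribution key (assumed policy)."""
    digest = zlib.crc32(row[distribution_key].encode("utf-8"))
    return groups[digest % len(groups)]


def split_by_distribution(text, table, descriptor, groups):
    """Return the rows of a data file grouped by destination group."""
    split = {group: [] for group in groups}
    for row in parse_rows(text, table["columns"], descriptor):
        split[target_group(row, table["distribution_key"], groups)].append(row)
    return split
```

A stable hash such as CRC32 is used instead of Python's built-in hash(), whose value for strings changes between interpreter runs and would send the same key to different groups on different imports.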
Optionally, splitting the data according to the data import instruction includes: determining whether the data satisfies a splitting rule; if it does, splitting the data; if it does not, correcting the data and then splitting the corrected data.
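The check-then-correct branch above can be sketched as follows. The splitting rule (matching field count, digits only in integer columns) and the correction rule (trimming whitespace) are hypothetical examples chosen for illustration; the document does not define specific rules. Rows that still fail after correction are set aside, in the spirit of the error file mentioned elsewhere in the document.

```python
def satisfies_split_rule(fields, column_types):
    """Hypothetical rule: field count matches and int columns contain only digits."""
    if len(fields) != len(column_types):
        return False
    return all(
        value.isdigit() for value, col_type in zip(fields, column_types) if col_type is int
    )


def correct(fields):
    """Hypothetical correction rule: strip surrounding whitespace from each field."""
    return [value.strip() for value in fields]


def check_and_correct(rows, column_types):
    """Return (rows ready for splitting, rows that could not be corrected)."""
    accepted, rejected = [], []
    for fields in rows:
        if not satisfies_split_rule(fields, column_types):
            fields = correct(fields)  # try to repair the row before giving up
        if satisfies_split_rule(fields, column_types):
            accepted.append(fields)
        else:
            rejected.append(fields)
    return accepted, rejected
```

Only the accepted rows proceed to the split step; keeping the rejected rows separate preserves the correctness of what is imported.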
Optionally, importing the split data, in blocks, into different storage spaces in the database includes: downloading the split data; and importing the downloaded split data, in blocks, into the different storage spaces in the database.
Optionally, after the split data is imported, in blocks, into the different storage spaces in the database, the method further includes: deleting the downloaded split data.
Optionally, after the split data is imported, in blocks, into the different storage spaces in the database, the method further includes: summarizing the import results of the split data; and feeding back the import results.
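The download, import, delete, and summarize steps above can be sketched together in Python. Here a local file copy stands in for the download and an in-memory list stands in for a storage space; the function names and the shape of the summary are assumptions for illustration only.

```python
import os
import shutil


def download(split_file, node_dir):
    """Copy a split file into a node's working directory (stand-in for a real download)."""
    return shutil.copy(split_file, node_dir)


def import_file(path, storage):
    """Load every non-empty line of the downloaded file into the node's storage."""
    with open(path, encoding="utf-8") as handle:
        rows = [line.rstrip("\n") for line in handle if line.strip()]
    storage.extend(rows)
    return len(rows)


def run_import(split_files, nodes):
    """Download, import, and clean up each split file, then summarize the results."""
    results = {}
    for name, split_file in split_files.items():
        node = nodes[name]
        local_path = download(split_file, node["dir"])
        try:
            results[name] = {"ok": True, "rows": import_file(local_path, node["storage"])}
        except OSError as error:
            results[name] = {"ok": False, "error": str(error)}
        finally:
            os.remove(local_path)  # delete the downloaded copy once it is no longer needed
    return {"ok": all(r["ok"] for r in results.values()), "nodes": results}
```

The returned summary is what would be fed back to the requester, so a single failed node is visible without hiding the nodes that succeeded.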
According to another aspect of the present invention, a data processing apparatus is provided, including: a receiving module, configured to receive a data import instruction indicating that data is to be imported into a database; a processing module, configured to split the data according to the data import instruction; and an import module, configured to import the split data in blocks into different storage spaces in the database.
Optionally, the processing module includes: a determining unit, configured to determine, according to the data import instruction, the table structure of a table and the data distribution information of the data on that table; an identification unit, configured to identify each data row field in the data according to the table structure, the data distribution information, and the descriptor information of the data carried in the data import instruction; and a first processing unit, configured to split the data according to each identified data row field.
Optionally, the processing module includes: a judging unit, configured to determine whether the data satisfies a splitting rule; a second processing unit, configured to split the data when the judging unit's result is yes; a correction unit, configured to correct the data when the judging unit's result is no; and a third processing unit, configured to split the corrected data.
Optionally, the import module includes: a download unit, configured to download the split data; and an import unit, configured to import the downloaded split data in blocks into different storage spaces in the database.
Optionally, the apparatus further includes: a deletion module, configured to delete the downloaded split data.
Optionally, the apparatus further includes: a summary module, configured to summarize the import results obtained from importing the split data; and a feedback module, configured to feed back the import results.
According to the present invention, a data import instruction indicating that data is to be imported into a database is received; the data is split according to the data import instruction; and the split data is imported in blocks into different storage spaces in the database. This solves the problem of low data import efficiency in the related art and thereby improves data import efficiency.
Brief Description of the Drawings
The drawings described here are provided for a further understanding of the present invention and form a part of this application. The exemplary embodiments of the present invention and their descriptions are used to explain the present invention and do not unduly limit it. In the drawings:
FIG. 1 is a flowchart of a data processing method according to an embodiment of the present invention;
FIG. 2 is a structural block diagram of a data processing apparatus according to an embodiment of the present invention;
FIG. 3 is a first structural block diagram of the processing module 24 in the data processing apparatus according to an embodiment of the present invention;
FIG. 4 is a second structural block diagram of the processing module 24 in the data processing apparatus according to an embodiment of the present invention;
FIG. 5 is a structural block diagram of the import module 26 in the data processing apparatus according to an embodiment of the present invention;
FIG. 6 is a first preferred structural block diagram of the data processing apparatus according to an embodiment of the present invention;
FIG. 7 is a second preferred structural block diagram of the data processing apparatus according to an embodiment of the present invention;
FIG. 8 is a structural block diagram of an import system according to an embodiment of the present invention;
FIG. 9 is a flowchart of data import processing according to an embodiment of the present invention.
Detailed Description
The present invention is described in detail below with reference to the drawings and in conjunction with the embodiments. It should be noted that, where no conflict arises, the embodiments in this application and the features in those embodiments may be combined with one another.
It should also be noted that the terms "first", "second", and the like in the specification, the claims, and the drawings are used to distinguish similar objects and do not necessarily describe a particular order or sequence.
This embodiment provides a data processing method. FIG. 1 is a flowchart of a data processing method according to an embodiment of the present invention. As shown in FIG. 1, the flow includes the following steps:
Step S102: receive a data import instruction indicating that data is to be imported into a database;
Step S104: split the data according to the data import instruction;
Step S106: import the split data in blocks into different storage spaces in the database.
With the above steps, when data is imported into the database, the data is first split, and the split data is then imported in blocks into different storage spaces of the database. The blocks can be imported in parallel, which improves import efficiency. This solves the problem of low data import efficiency in the related art. The database described above may be a distributed database system. Such a system builds a highly available and highly scalable cluster from ordinary, inexpensive equipment, and so removes the dependence on large equipment. A well-designed distributed database architecture makes high availability and scale-out comparatively easy to achieve. Importing and exporting large volumes of data is one of the key technologies in a distributed database.
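The three steps and the parallel import can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the round-robin split rule, and the use of in-memory lists as stand-ins for storage spaces are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def split_rows(rows, num_spaces):
    """Step S104: assign each row to a storage space.

    Round-robin is only a placeholder rule; the text splits by table
    structure and distribution information instead.
    """
    chunks = [[] for _ in range(num_spaces)]
    for i, row in enumerate(rows):
        chunks[i % num_spaces].append(row)
    return chunks


def import_chunk(storage, chunk):
    """Step S106 for one storage space; a list stands in for a database."""
    storage.extend(chunk)
    return len(chunk)


def handle_import(rows, storages):
    """Steps S102 to S106: receive the request, split, import in parallel."""
    chunks = split_rows(rows, len(storages))
    with ThreadPoolExecutor(max_workers=len(storages)) as pool:
        # Each worker writes to its own storage space, so no locking is needed.
        return list(pool.map(import_chunk, storages, chunks))
```

Because every chunk targets a different storage space, the imports do not contend with one another, which is where the claimed efficiency gain would come from.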
Data can be split in several ways. In an optional embodiment, the data is split according to the table used for the split. Here, splitting the data according to the data import instruction includes: determining, according to the data import instruction, the table structure of the table and the data distribution information of the data on that table; identifying each data row field in the data according to the table structure, the data distribution information, and the descriptor information of the data carried in the data import instruction; and splitting the data according to each identified data row field. After the data import instruction is received, its validity may first be checked. The table structure information and distribution policy information of the destination table are then obtained, and the data file is read. According to the table structure information and the distribution policy, the data file to be imported is split into multiple small files, one for each underlying database (that is, each storage space described above) that will store the data. The small files are transferred to a specified directory of each destination underlying database, and the cluster management module then issues instructions to each underlying database to import its file.
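A split driven by table metadata might look like the sketch below. The metadata layout (`TABLE_META`), the comma delimiter, and hash distribution on a single key column are assumptions for illustration; the text only says that the table structure and a distribution policy are used.

```python
import zlib
from collections import defaultdict

# Hypothetical metadata for one table: column order, distribution key,
# and the groups of underlying databases that hold the table.
TABLE_META = {
    "columns": ["id", "name", "city"],
    "dist_key": "id",
    "db_groups": ["g1", "g2", "g3"],
}


def split_lines(lines, meta, delimiter=","):
    """Group raw data lines by the database group that should store them."""
    key_index = meta["columns"].index(meta["dist_key"])
    groups = meta["db_groups"]
    out = defaultdict(list)
    for line in lines:
        fields = line.rstrip("\n").split(delimiter)
        # Hash distribution is an assumption; crc32 is stable across runs.
        bucket = zlib.crc32(fields[key_index].encode("utf-8")) % len(groups)
        out[groups[bucket]].append(line)
    return dict(out)
```

Each value in the returned mapping corresponds to one of the "small files" that would be transferred to the matching underlying database. Rows with the same distribution key always land in the same group.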
Data can be split only if it satisfies a predetermined splitting rule. When data does not satisfy the rule, it must be corrected so that it does. In an optional embodiment, splitting the data according to the data import instruction includes: determining whether the data satisfies the splitting rule; if it does, splitting the data; if it does not, correcting the data and then splitting the corrected data. Correction can be performed in several ways: manually, by an administrator; without manual intervention, by the module that performs the split or by other modules, according to certain correction rules; or by a person and the relevant modules working together. This approach makes it possible to learn promptly whether the data to be imported can be split, which further improves splitting efficiency. In addition, when the data to be imported contains errors, the erroneous rows can be extracted, which ensures the correctness of the imported data.
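The rule check and the extraction of erroneous rows can be sketched as below. The type names (`int`, `char(n)`) and the rule itself (field count and type must match the table definition) are assumptions chosen for the example; the text does not define the splitting rule in detail.

```python
def matches_definition(fields, column_types):
    """Return True when the field count and each checked type match."""
    if len(fields) != len(column_types):
        return False
    for value, col_type in zip(fields, column_types):
        if col_type == "int":
            try:
                int(value)
            except ValueError:
                return False
        elif col_type.startswith("char("):
            max_len = int(col_type[5:-1])
            if len(value) > max_len:
                return False
        # Other type names are not checked in this sketch.
    return True


def partition_rows(lines, column_types, delimiter=","):
    """Separate rows that can be split from rows that need correction."""
    good, bad = [], []
    for line in lines:
        fields = line.split(delimiter)
        target = good if matches_definition(fields, column_types) else bad
        target.append(line)
    return good, bad
```

The `bad` list plays the role of the error file: its rows can be corrected manually or by rule and then passed through the split again.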
In an optional embodiment, when the split data is imported in blocks into different storage spaces in the database, the split data may first be downloaded, and the downloaded split data is then imported in blocks into different storage spaces in the database.
Once the data has been imported into the database, the downloaded data no longer needs to be kept. In an optional embodiment, after the split data is imported in blocks into different storage spaces in the database, the method further includes: deleting the downloaded split data. This removes the garbage data files and frees the space they occupy, so that the database can store more data.
After the split data is imported in blocks into different storage spaces in the database, the import result may also be fed back. In an optional embodiment, the method further includes: summarizing the import results obtained from importing the split data; and feeding back the import results. This lets the user see the outcome of the import clearly.
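Summarizing per-storage-space results into one response could be as simple as the following sketch. The shape of the input (a mapping from a storage-space name to a success flag and a row count) and the fields of the summary are assumptions; the text does not specify what the fed-back result contains.

```python
def summarize(results):
    """Combine per-storage-space import results into one report.

    results maps a storage-space name to a tuple (ok, rows_imported).
    """
    failed = sorted(name for name, (ok, _) in results.items() if not ok)
    return {
        "success": not failed,
        "rows_imported": sum(rows for ok, rows in results.values() if ok),
        "failed": failed,
    }
```

For example, `summarize({"g1": (True, 10), "g2": (False, 0)})` reports an overall failure and names `g2` as the storage space to inspect.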
From the description of the above implementations, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, or by hardware, although in many cases the former is the better implementation. On this understanding, the technical solution of the present invention, in essence or in the part that contributes over the prior art, may be embodied as a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions that cause a terminal device (which may be a mobile phone, a computer, a server, a network device, or the like) to perform the methods of the embodiments of the present invention.
This embodiment also provides a data processing apparatus for implementing the above embodiments and preferred implementations; what has already been described is not repeated. As used below, the term "module" refers to a combination of software and/or hardware that implements a predetermined function. Although the apparatus described in the following embodiments is preferably implemented in software, an implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
FIG. 2 is a structural block diagram of a data processing apparatus according to an embodiment of the present invention. As shown in FIG. 2, the apparatus includes a receiving module 22, a processing module 24, and an import module 26, which are described below.
The receiving module 22 is configured to receive a data import instruction indicating that data is to be imported into a database. The processing module 24 is connected to the receiving module 22 and configured to split the data according to the data import instruction. The import module 26 is connected to the processing module 24 and configured to import the split data in blocks into different storage spaces in the database.
FIG. 3 is a first structural block diagram of the processing module 24 in the data processing apparatus according to an embodiment of the present invention. As shown in FIG. 3, the processing module 24 includes a determining unit 32, an identification unit 34, and a first processing unit 36, which are described below.
The determining unit 32 is configured to determine, according to the data import instruction, the table structure of the table and the data distribution information of the data on that table. The identification unit 34 is connected to the determining unit 32 and configured to identify each data row field in the data according to the table structure, the data distribution information, and the descriptor information of the data carried in the data import instruction. The first processing unit 36 is connected to the identification unit 34 and configured to split the data according to each identified data row field.
FIG. 4 is a second structural block diagram of the processing module 24 in the data processing apparatus according to an embodiment of the present invention. As shown in FIG. 4, the processing module 24 includes a judging unit 42, a second processing unit 44, a correction unit 46, and a third processing unit 48, which are described below.
The judging unit 42 is configured to determine whether the data satisfies the splitting rule. The second processing unit 44 is connected to the judging unit 42 and configured to split the data when the result of the judging unit 42 is yes. The correction unit 46 is connected to the judging unit 42 and configured to correct the data when the result of the judging unit 42 is no. The third processing unit 48 is connected to the correction unit 46 and configured to split the corrected data.
FIG. 5 is a structural block diagram of the import module 26 in the data processing apparatus according to an embodiment of the present invention. As shown in FIG. 5, the import module 26 includes a download unit 52 and an import unit 54, which are described below.
The download unit 52 is configured to download the split data. The import unit 54 is connected to the download unit 52 and configured to import the downloaded split data in blocks into different storage spaces in the database.
FIG. 6 is a first preferred structural block diagram of the data processing apparatus according to an embodiment of the present invention. As shown in FIG. 6, in addition to all the modules shown in FIG. 5, the apparatus includes a deletion module 62, which is described below.
The deletion module 62 is connected to the import module 26 and configured to delete the downloaded split data.
FIG. 7 is a second preferred structural block diagram of the data processing apparatus according to an embodiment of the present invention. As shown in FIG. 7, in addition to all the modules shown in FIG. 2, the apparatus includes a summary module 72 and a feedback module 74, which are described below.
The summary module 72 is connected to the import module 26 and configured to summarize the import results obtained from importing the split data. The feedback module 74 is connected to the summary module 72 and configured to feed back the import results.
The present invention is further described below with reference to specific embodiments.
As can be seen from the foregoing, the existing solutions in the related art all target a traditional single database. They do not need to consider the distribution structure of tables or the system architecture, and they are relatively inefficient. The solution in the embodiments of the present invention is based on a distributed database system, satisfies the database's ACID properties (Atomicity, Consistency, Isolation, Durability), and can be executed concurrently. Import and export are performed with shell scripts, which offers good real-time behavior, portability, and feasibility, as well as a good user experience, and is a major innovation over the existing technology.
FIG. 8 is a structural block diagram of an import system according to an embodiment of the present invention. As shown in FIG. 8, the system includes a data import client module 82 (which may be located between the external system and the data import server module, or inside the external system; it is not drawn in FIG. 8), a data import server module 84 (the download server 84 in FIG. 8, corresponding to the receiving module 22, processing module 24, and import module 26 described above), a metadata center module 86 (the metadata server 86 in FIG. 8), a cluster management center module 88 (the cluster manager 88 in FIG. 8, corresponding to the summary module 72 and feedback module 74 described above), a database agent module 810 (corresponding to the deletion module 62 described above), and a database module 812. Each module is described below.
The data import client module 82 (LoadClient) mainly faces the user, who initiates import and export commands through it.
The data import server module 84 (LoadServer) is configured to accept the import and export commands sent by the client, split and merge data files according to the data distribution policy, and interact with the other modules to coordinate the whole import and export flow.
The metadata center module 86 is configured to store and manage all metadata of the entire distributed database system.
The cluster management center module 88 is mainly responsible for monitoring, managing, and maintaining each database cluster (DBCluster).
The database agent module 810 is the management and monitoring module for database nodes. It monitors in real time whether the DB nodes under its control are running normally and periodically collects runtime statistics.
The database module 812 is the bottom-layer module and stores all the data.
The core algorithm that can be implemented with the import system shown in FIG. 8 is as follows.
For the import flow:
The data import server module 84 queries the metadata center module 86 for the table's metadata by cluster ID, database name, and table name, to obtain the table structure definition and the data distribution information;
The data import server module 84 uses this information (together with the data file descriptor information) to identify each data row field in the data file (datafilename) and splits the data file;
The data import server module 84 requests the cluster management center module 88 to notify each database agent module 810 to download the split files for the DBGroup it manages;
After every database agent module 810 has downloaded successfully, the data import server module 84 requests the cluster management center module 88 to notify each database agent module 810 to execute the actual LOAD DATA INFILE command, which loads the data file;
After the LOAD DATA INFILE command succeeds, the data import server module 84 requests the cluster management center module 88 to notify each database agent module 810 to delete the garbage data files (here, the downloaded data that has already been loaded);
The data import server module 84 summarizes the results and notifies the data import client module 82.
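As an illustration of the load step, the sketch below builds the statement that a database agent might run for one split file. `LOAD DATA INFILE ... INTO TABLE ... FIELDS TERMINATED BY ...` is standard MySQL syntax; the helper name, the file path, the database and table names, and the comma delimiter are hypothetical, and the validation is deliberately strict rather than complete.

```python
import re

# Plain identifiers only, so that names can be quoted safely with backticks.
_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_load_statement(path, database, table, delimiter=","):
    """Build a MySQL LOAD DATA INFILE statement for one split file."""
    if not (_IDENT.fullmatch(database) and _IDENT.fullmatch(table)):
        raise ValueError("unexpected identifier")
    if "'" in path or "\\" in path:
        raise ValueError("unsupported character in path")
    if len(delimiter) != 1 or delimiter in "'\\":
        raise ValueError("unsupported delimiter")
    return (
        f"LOAD DATA INFILE '{path}' INTO TABLE `{database}`.`{table}` "
        f"FIELDS TERMINATED BY '{delimiter}'"
    )
```

Each agent would run one such statement against the database it manages, using the split file it downloaded in the previous step.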
FIG. 9 is a flowchart of data import processing according to an embodiment of the present invention. As shown in FIG. 9, the flow includes the following steps:
Step S902: the data import client module 82 sends an import data request to the data import server module 84;
Step S904: the data import server module 84 sends a database metadata query request to the metadata center module 86, based on the cluster ID, database name, and table name, to query the table's metadata;
Step S906: the metadata center module 86 returns the table structure definition and data distribution information, including the type and length of each field in the table, the distribution key, and the DBGroups over which the table is distributed;
Step S908: the data import server module 84 uses the information returned by the metadata center module 86 (together with the data file descriptor information) to identify each data row field in the data file (datafilename) and splits the data file; if erroneous data is found during the split, for example a type that does not match the table definition, the erroneous data is picked out and placed in an error file;
Step S910: the data import server module 84 requests the cluster management center module 88 to notify each database agent module 810 to download the split files for the DBGroup it manages;
Step S912: the cluster management center module 88 notifies each database agent module 810 to download the split files for the DBGroup it manages;
Step S914: each database agent module 810 instructs the FTP service to connect to the server hosting the data import server module 84 and downloads its split file; after the download succeeds, each database agent module 810 returns a success response to the cluster management center module 88;
Step S916: the cluster management center module 88 summarizes the download results;
Step S918: after receiving success responses from all database agent modules 810, the cluster management center module 88 returns a success response to the data import server module 84;
Step S920: after the download succeeds, the data import server module 84 requests the cluster management center module 88 to notify each database agent module 810 to execute the actual LOAD DATA INFILE command;
Step S922: the cluster management center module 88 notifies each database agent module 810 to execute the actual LOAD DATA INFILE command;
Step S924: each database agent module 810 connects to the database module it manages and executes the actual LOAD DATA INFILE command; after the command succeeds, each database agent module 810 returns a success response to the cluster management center module 88;
Step S926: after receiving success responses from all database agent modules 810, the cluster management center module 88 returns a success response to the data import server module 84; after the LOAD DATA INFILE command succeeds, the data import server module 84 requests the cluster management center module 88 to notify each database agent module to delete the garbage data files; the data import server module 84 then summarizes the results and notifies the data import client module 82.
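The flow above proceeds in phases, and each phase starts only after every agent has reported success for the previous one. A minimal sketch of that coordination follows. `FakeAgent` is a hypothetical stand-in for a database agent module, the phase names are made up for the example, and stopping at the first failed phase is an assumption, since the text does not describe failure handling.

```python
class FakeAgent:
    """Stand-in for a database agent module; records the calls it receives."""

    def __init__(self, name, fail_on=None):
        self.name = name
        self.fail_on = fail_on
        self.log = []

    def run(self, action):
        self.log.append(action)
        return action != self.fail_on


def run_import(agents):
    """Run download, load, cleanup; advance only if every agent succeeded."""
    for phase in ("download", "load", "cleanup"):
        results = {agent.name: agent.run(phase) for agent in agents}
        if not all(results.values()):
            failed = sorted(n for n, ok in results.items() if not ok)
            return {"success": False, "phase": phase, "failed": failed}
    return {"success": True, "phase": "cleanup", "failed": []}
```

In the system described, the cluster management center would play the role of `run_import`, collecting the agents' responses before reporting back to the data import server.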
The solution in the above embodiments is proposed for a distributed database system. It can import all data types supported by the MySQL database and can also support other types of data. Applying the solution of the embodiments of the present invention to a distributed database system can increase concurrency by a factor of 2 to 3, balances the load, ensures the correctness of imported and exported data, and makes the system more robust.
It should be noted that each of the above modules may be implemented in software or hardware. For the latter, this may be done in the following ways, among others: all of the modules are located in the same processor, or the modules are located in multiple processors.
An embodiment of the present invention also provides a storage medium. Optionally, in this embodiment, the storage medium may be configured to store program code for performing the following steps:
S1: receive a data import instruction indicating that data is to be imported into a database;
S2: split the data according to the data import instruction;
S3: import the split data in blocks into different storage spaces in the database.
Optionally, in this embodiment, the storage medium may include, but is not limited to, any medium that can store program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.
Those skilled in the art will appreciate that the modules and steps of the present invention described above can be implemented with general-purpose computing devices. They may be concentrated on a single computing device or distributed over a network of multiple computing devices. Optionally, they may be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device. In some cases, the steps shown or described may be performed in an order different from the one given here, or the modules and steps may each be made into separate integrated circuit modules, or several of them may be made into a single integrated circuit module. The present invention is therefore not limited to any specific combination of hardware and software.
The above are only preferred embodiments of the present invention and are not intended to limit it. Those skilled in the art may make various modifications and changes to the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.
Industrial Applicability
As described above, the data processing method and apparatus provided by the embodiments of the present invention have the following beneficial effect: they solve the problem of low data import efficiency in the related art and thereby improve data import efficiency.

Claims (12)

  1. A data processing method, comprising:
    receiving a data import instruction that instructs data to be imported into a database;
    splitting the data according to the data import instruction; and
    importing the split data, in chunks, into different storage spaces in the database.
  2. The method according to claim 1, wherein splitting the data according to the data import instruction comprises:
    determining, according to the data import instruction, a table structure of a table and data distribution information of the data on the table;
    identifying each data row field in the data according to the table structure, the data distribution information, and descriptor information of the data carried in the data import instruction; and
    splitting the data according to each identified data row field.
  3. The method according to claim 1, wherein splitting the data according to the data import instruction comprises:
    determining whether the data satisfies a splitting rule;
    if the determination result is yes, splitting the data; and
    if the determination result is no, correcting the data and splitting the corrected data.
  4. The method according to claim 1, wherein importing the split data, in chunks, into different storage spaces in the database comprises:
    downloading the split data; and
    importing the downloaded split data, in chunks, into different storage spaces in the database.
  5. The method according to claim 4, further comprising, after importing the split data, in chunks, into different storage spaces in the database:
    deleting the downloaded split data.
  6. The method according to claim 1, further comprising, after importing the split data, in chunks, into different storage spaces in the database:
    aggregating the import results obtained from importing the split data; and
    feeding back the import result.
  7. A data processing device, comprising:
    a receiving module, configured to receive a data import instruction that instructs data to be imported into a database;
    a processing module, configured to split the data according to the data import instruction; and
    an import module, configured to import the split data, in chunks, into different storage spaces in the database.
  8. The device according to claim 7, wherein the processing module comprises:
    a determining unit, configured to determine, according to the data import instruction, a table structure of a table and data distribution information of the data on the table;
    an identifying unit, configured to identify each data row field in the data according to the table structure, the data distribution information, and descriptor information of the data carried in the data import instruction; and
    a first processing unit, configured to split the data according to each identified data row field.
  9. The device according to claim 7, wherein the processing module comprises:
    a judging unit, configured to determine whether the data satisfies a splitting rule;
    a second processing unit, configured to split the data if the determination result of the judging unit is yes;
    a correction unit, configured to correct the data if the determination result of the judging unit is no; and
    a third processing unit, configured to split the corrected data.
  10. The device according to claim 7, wherein the import module comprises:
    a download unit, configured to download the split data; and
    an import unit, configured to import the downloaded split data, in chunks, into different storage spaces in the database.
  11. The device according to claim 10, further comprising:
    a deletion module, configured to delete the downloaded split data.
  12. The device according to claim 7, further comprising:
    a summary module, configured to aggregate the import results obtained from importing the split data; and
    a feedback module, configured to feed back the import result.
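
The flow recited in claims 1 to 3 and 6 can be illustrated with a minimal Python sketch. It is not the applicant's implementation. The table columns, the choice of `id` as the distribution key, the CRC32-modulo placement, the pad-or-truncate correction, and the in-memory dictionary standing in for the database's storage spaces are all assumptions made for illustration; the claims only require that the data be split according to the import instruction and imported in chunks into different storage spaces.

```python
import csv
import io
import zlib
from collections import defaultdict

# Hypothetical table metadata (not from the patent): column order and key.
TABLE_COLUMNS = ["id", "name", "city"]
DISTRIBUTION_KEY = "id"
STORAGE_SPACES = 3  # assumed number of storage spaces, e.g. database nodes


def parse_rows(raw_text, delimiter=","):
    """Identify the fields of each data row using the given delimiter."""
    reader = csv.reader(io.StringIO(raw_text), delimiter=delimiter)
    return [row for row in reader if row]


def correct_row(row):
    """Assumed correction step: pad short rows with empty fields and
    truncate long rows so the row matches the table structure."""
    n = len(TABLE_COLUMNS)
    return (row + [""] * n)[:n]


def split_rows(rows):
    """Split rows into one chunk per storage space by hashing the key."""
    key_index = TABLE_COLUMNS.index(DISTRIBUTION_KEY)
    chunks = defaultdict(list)
    for row in rows:
        # Assumed split rule: the field count must match the table structure.
        if len(row) != len(TABLE_COLUMNS):
            row = correct_row(row)
        space = zlib.crc32(row[key_index].encode("utf-8")) % STORAGE_SPACES
        chunks[space].append(row)
    return dict(chunks)


def import_chunks(chunks, storage):
    """Import each chunk into its storage space; return rows per space."""
    summary = {}
    for space, rows in chunks.items():
        storage.setdefault(space, []).extend(rows)
        summary[space] = len(rows)
    return summary


def handle_import_instruction(raw_text, storage, delimiter=","):
    """Receive the data, split it, import the chunks, and report results."""
    rows = parse_rows(raw_text, delimiter)
    chunks = split_rows(rows)
    return import_chunks(chunks, storage)


if __name__ == "__main__":
    storage = {}  # stands in for the database's storage spaces
    data = "1,Ann,Paris\n2,Bob\n3,Cy,Rome,extra\n"
    print(handle_import_instruction(data, storage))
```

In this sketch the second and third sample rows violate the assumed split rule, so they are corrected before being assigned to a chunk. A real system would import the chunks concurrently, which is where the efficiency gain described above would come from.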
PCT/CN2015/092759 2015-04-23 2015-10-23 Data processing method and device WO2016169237A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510198455.3 2015-04-23
CN201510198455.3A CN106156209A (en) 2015-04-23 2015-04-23 Data processing method and device

Publications (1)

Publication Number Publication Date
WO2016169237A1 (en) 2016-10-27

Family

ID=57143721

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2015/092759 WO2016169237A1 (en) 2015-04-23 2015-10-23 Data processing method and device

Country Status (2)

Country Link
CN (1) CN106156209A (en)
WO (1) WO2016169237A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019205415A1 (en) * 2018-04-22 2019-10-31 平安科技(深圳)有限公司 Data import management method and apparatus, mobile terminal and storage medium

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108153852A (en) * 2017-12-22 2018-06-12 中国平安人寿保险股份有限公司 A kind of data processing method, device, terminal device and storage medium
CN108256087B (en) * 2018-01-22 2020-12-04 北京腾云天下科技有限公司 Data importing, inquiring and processing method based on bitmap structure
CN110110024B (en) * 2019-04-29 2021-12-17 东南大学 Method for importing high-capacity VCT file into spatial database
CN110795764A (en) * 2019-11-01 2020-02-14 中国银行股份有限公司 Data desensitization method and system
CN110990405B (en) * 2019-11-28 2024-04-12 中国银行股份有限公司 Data loading method, device, server and storage medium
CN113722277A (en) * 2020-05-25 2021-11-30 中兴通讯股份有限公司 Data import method, device, service platform and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050055351A1 (en) * 2003-09-05 2005-03-10 Oracle International Corporation Apparatus and methods for transferring database objects into and out of database systems
CN102750368A (en) * 2012-06-18 2012-10-24 天津神舟通用数据技术有限公司 High-speed importing method of cluster data in data base
CN102906751A (en) * 2012-07-25 2013-01-30 华为技术有限公司 Method and device for data storage and data query
CN103077183A (en) * 2012-12-14 2013-05-01 北京普泽天玑数据技术有限公司 Data importing method and system for distributed sequence list
CN103473334A (en) * 2013-09-18 2013-12-25 浙江中控技术股份有限公司 Data storage method, inquiry method and system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8078825B2 (en) * 2009-03-11 2011-12-13 Oracle America, Inc. Composite hash and list partitioning of database tables

Also Published As

Publication number Publication date
CN106156209A (en) 2016-11-23

Similar Documents

Publication Publication Date Title
WO2016169237A1 (en) Data processing method and device
US10108632B2 (en) Splitting and moving ranges in a distributed system
US10853242B2 (en) Deduplication and garbage collection across logical databases
CN110147407B (en) Data processing method and device and database management server
CN111258978B (en) Data storage method
US10877810B2 (en) Object storage system with metadata operation priority processing
US10860604B1 (en) Scalable tracking for database udpates according to a secondary index
CN102779185A (en) High-availability distribution type full-text index method
WO2019109854A1 (en) Data processing method and device for distributed database, storage medium, and electronic device
CN108563697B (en) Data processing method, device and storage medium
US10515228B2 (en) Commit and rollback of data streams provided by partially trusted entities
CN114077602B (en) Data migration method and device, electronic equipment and storage medium
US11216421B2 (en) Extensible streams for operations on external systems
CN109299225A (en) Log searching method, system, terminal and computer readable storage medium
CN116204575A (en) Method, device, equipment and computer storage medium for importing data into database
CN111447265A (en) File storage method, file downloading method, file processing method and related components
CN112685499A (en) Method, device and equipment for synchronizing process data of work service flow
US10185735B2 (en) Distributed database system and a non-transitory computer readable medium
JP5684671B2 (en) Condition retrieval data storage method, condition retrieval database cluster system, dispatcher, and program
CN116775712A (en) Method, device, electronic equipment, distributed system and storage medium for inquiring linked list
CN112905676A (en) Data file importing method and device
CN113448775B (en) Multi-source heterogeneous data backup method and device
CN115587119A (en) Database query method and device, electronic equipment and storage medium
CN112100208B (en) Method and device for forwarding operation request
CN111782634B (en) Data distributed storage method, device, electronic equipment and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 15889720; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 15889720; Country of ref document: EP; Kind code of ref document: A1)