CN105765569A - Data distribution method, loader and storage system - Google Patents


Info

Publication number
CN105765569A
Authority
CN
China
Prior art keywords
target data
data
database system
loader
data record
Prior art date
Legal status
Granted
Application number
CN201480029493.XA
Other languages
Chinese (zh)
Other versions
CN105765569B (en)
Inventor
王�锋
陈浩
Current Assignee
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Publication of CN105765569A
Application granted
Publication of CN105765569B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L12/00 Data switching networks
    • H04L12/54 Store-and-forward switching systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor

Abstract

A data distribution method, a loader, and a storage system. The method includes: a loader acquires data to be distributed and divides it into data records; the loader converts the data records into target data records in a format that the database system can recognize, and determines, according to a data distribution policy of the database system, the target data nodes in the database system that correspond to the target data records; and the loader sends the target data records to the target data nodes. Because the loader performs the data conversion and determines the target data nodes, the database system consumes fewer resources and its internal network is no longer used to redistribute records. The response speed and storage efficiency of the storage system can therefore be improved.

Description

Data distribution method, loader and storage system
Technical Field
The present invention relates to the field of information technology, and in particular to a data distribution method, a loader, and a storage system.
Background
With the spread of information technology across all areas of society, the big data problem has become increasingly prominent. Big data is characterized by the "4V"s, one important characteristic being "Volume", which refers to the enormous amount of data. Big data originated in the Internet field: large numbers of Internet users generate large amounts of data through activities such as social networking and online transactions, including structured data such as Internet logs and transaction records, and unstructured data such as pictures, audio, and video.
To sustain prosperity and growth, traditional industries need more detailed collection of user data, exploration of new service models, and precision marketing. Data volumes are growing exponentially across industries, and the total amount of data in the world is expected to reach the 10 EB level by 2020. A storage system that scales flexibly as data volume grows, stores data persistently and reliably, and offers high cost-effectiveness has therefore become the key to solving this problem.
To provide cost-effective storage that keeps data persistently and reliably, many distributed database systems with a Massively Parallel Processing (MPP) architecture, that is, MPP database systems, have appeared in the field of structured data storage and processing. However, structured data is also growing very quickly: in most scenarios a production system generates more than a terabyte of new data per day. How to load large amounts of data into the warehouse quickly, avoid data backlogs, and improve the timeliness of data has therefore become an important metric for evaluating a big data system.
At present, the following scheme is used to load data quickly into a structured data storage system:
1. After a preset start condition is met, the loader divides the intermediate data in the local file system into target data blocks to be distributed and places them in a sending queue;
2. the target data blocks are taken out of the sending queue in turn and sent to the data nodes of the MPP database system;
3. after receiving target data blocks, a first data node parses all of the received blocks and converts them into data records, and then performs a redistribution calculation to determine whether it is the target data node of each data record. The data records are then stored according to the result: if the first data node is the target data node, the record is stored on the first data node; otherwise, the record is sent over the internal network of the MPP database system to a second data node. Both are data nodes of the MPP database system: the first data node is the node that received the target data block, and the second data node is the target data node of the data record.
In this scheme, the data nodes of the MPP database system must perform the data conversion and the redistribution calculation, and once the data records reach the MPP database system they also consume its internal network transmission resources. This batch import scheme for intermediate data therefore occupies too many resources of the MPP database system, which easily overloads the hardware and results in slow response and low storage efficiency.
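To make the cost of the prior-art scheme concrete, the following Python sketch (not part of the patent) simulates step 3. It assumes, hypothetically, that target data blocks reach the data nodes round-robin, that the first comma-separated field is the distribution key, and that the distribution policy is a CRC32 hash; it counts how many records a receiving node must forward over the internal network. With a uniform hash over N nodes, roughly (N-1)/N of the records end up being forwarded.

```python
import zlib


def hash_node(key: str, num_nodes: int) -> int:
    """Hypothetical hash distribution policy: CRC32 of the key modulo the node count."""
    return zlib.crc32(key.encode("utf-8")) % num_nodes


def prior_art_forward_count(blocks: list[list[str]], num_nodes: int) -> int:
    """Count records that the receiving data node must forward to another node."""
    forwarded = 0
    for block_index, block in enumerate(blocks):
        first_node = block_index % num_nodes  # assumed round-robin block delivery
        for line in block:
            key = line.split(",", 1)[0]  # the data node parses the record itself
            if hash_node(key, num_nodes) != first_node:
                forwarded += 1  # record crosses the MPP internal network
    return forwarded


blocks = [[f"{k},payload" for k in range(b * 100, (b + 1) * 100)] for b in range(8)]
print(prior_art_forward_count(blocks, 4), "of 800 records forwarded")
```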
Disclosure of Invention
The embodiments of the present invention provide a data distribution method, a loader, and a storage system, which are used to improve the response speed and storage efficiency of the storage system.
A first aspect of the embodiments of the present invention provides a data distribution method, applied to a storage system that includes a loader and a database system. The method includes:
the loader acquires data to be distributed and divides the data to be distributed into data records;
the loader converts the data records into target data records, and determines, according to a data distribution policy of the database system, the target data nodes in the database system corresponding to the target data records, where the target data records are in a format that the database system can recognize;
the loader sends the target data record to the target data node.
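The three steps can be sketched in Python as a single loader-side loop. The function `load` and its hooks (`split`, `convert`, `target_node`, `send`) are hypothetical names chosen for illustration; the patent does not prescribe an interface, and the toy policy used below (record id modulo 2) is an assumption.

```python
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")  # type of a target data record


def load(
    data: str,
    split: Callable[[str], Iterable[str]],
    convert: Callable[[str], T],
    target_node: Callable[[T], int],
    send: Callable[[int, T], None],
) -> None:
    """Perform division, conversion, distribution calculation and sending on the loader."""
    for record in split(data):      # divide the data into data records
        target = convert(record)    # convert to the database's target format
        node = target_node(target)  # apply the database's distribution policy
        send(node, target)          # send straight to the target data node


# Usage with toy hooks: comma-separated records, id modulo 2 as the policy.
outbox: list = []
load(
    "1,alice\n2,bob\n3,carol",
    split=str.splitlines,
    convert=lambda r: (int(r.split(",")[0]), r.split(",")[1]),
    target_node=lambda rec: rec[0] % 2,
    send=lambda node, rec: outbox.append((node, rec)),
)
print(outbox)  # [(1, (1, 'alice')), (0, (2, 'bob')), (1, (3, 'carol'))]
```

Because every record leaves the loader already addressed to its target data node, the database system receives no records that it must re-parse or forward.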
With reference to the first aspect, in a first optional implementation, the format of the target data record and the data distribution policy of the database system are both configured locally on the loader;
the step in which the loader converts the data record into the target data record and determines the corresponding target data node in the database system according to the data distribution policy of the database system includes:
the loader converts the data record into the target data record according to the locally configured format of the target data record, and determines the target data node corresponding to the target data record in the database system according to the locally configured data distribution policy of the database system.
With reference to the first aspect, in a second optional implementation, before the loader converts the data record into the target data record, the method further includes:
the loader receives the format of the target data record and the data distribution policy of the database system;
the step in which the loader converts the data record into the target data record and determines the corresponding target data node in the database system according to the data distribution policy of the database system includes:
the loader executes locally stored logic code according to the received format of the target data record and data distribution policy of the database system, and by executing the logic code converts the data record into the target data record and determines the target data node corresponding to the target data record in the database system.
With reference to the first aspect, in a third optional implementation, before the loader converts the data record into the target data record, the method further includes:
the loader receives logic code in which the format of the target data record and the data distribution policy of the database system are specified;
the step in which the loader converts the data record into the target data record and determines the corresponding target data node in the database system according to the data distribution policy of the database system includes:
the loader executes the logic code, and by executing it converts the data record into the target data record and determines the target data node corresponding to the target data record in the database system.
With reference to the third optional implementation of the first aspect, in a fourth optional implementation, the logic code is platform-independent code.
With reference to the third optional implementation of the first aspect, in a fifth optional implementation, the receiving, by the loader, of the logic code includes: the loader receives the logic code sent by the database system.
With reference to the first aspect or any one of the first to fifth optional implementations of the first aspect, in a sixth optional implementation, the sending, by the loader, of the target data record to the target data node includes:
the loader stores the target data record in a queue corresponding to the target data node, takes the target data record out of the queue on a first-in, first-out basis, and sends it to the target data node.
A second aspect of the embodiments of the present invention provides a loader, including:
a data division module, configured to acquire data to be distributed and divide the data to be distributed into data records;
a data conversion module, configured to convert the data records obtained by the data division module into target data records, where the target data records are in a format that a database system can recognize;
a distribution calculation module, configured to determine, according to a data distribution policy of the database system, the target data nodes in the database system corresponding to the target data records converted by the data conversion module;
and a distribution module, configured to send the target data records converted by the data conversion module to the target data nodes determined by the distribution calculation module.
With reference to the second aspect, in a first optional implementation, the format of the target data record and the data distribution policy of the database system are both configured locally on the loader;
the data conversion module is specifically configured to convert the data record into the target data record according to the locally configured format of the target data record;
the distribution calculation module is specifically configured to determine the target data node corresponding to the target data record in the database system according to the locally configured data distribution policy of the database system.
With reference to the second aspect, in a second optional implementation, the loader further includes:
a parameter receiving module, configured to receive the format of the target data record and the data distribution policy of the database system before the data conversion module converts the data record into the target data record;
the data conversion module is specifically configured to execute locally stored logic code according to the format of the target data record received by the parameter receiving module, and to convert the data record into the target data record by executing the logic code;
the distribution calculation module is specifically configured to execute the locally stored logic code according to the data distribution policy of the database system received by the parameter receiving module, and to determine the target data node corresponding to the target data record in the database system by executing the logic code.
With reference to the second aspect, in a third optional implementation, the loader further includes:
a code receiving module, configured to receive logic code before the data conversion module converts the data record into the target data record, where the format of the target data record and the data distribution policy of the database system are specified in the logic code;
the data conversion module is specifically configured to execute the logic code, and convert the data record into the target data record by executing the logic code;
the distribution calculation module is specifically configured to execute the logic code, and determine a target data node corresponding to the target data record in the database system by executing the logic code.
With reference to the third optional implementation of the second aspect, in a fourth optional implementation, the logic code is platform-independent code.
With reference to the third optional implementation of the second aspect, in a fifth optional implementation, the code receiving module is specifically configured to receive the logic code sent by the database system.
With reference to the second aspect or any one of the first to fifth optional implementations of the second aspect, in a sixth optional implementation, the loader further includes:
a storage module, configured to store a queue corresponding to the target data node;
the distribution calculation module is further configured to store the target data record in the queue corresponding to the target data node;
the distribution module is specifically configured to take the target data record out of the queue stored in the storage module on a first-in, first-out basis and send it to the target data node.
A third aspect of the embodiments of the present invention provides a storage system, including:
a loader and a database system that are communicatively connected, where the loader is any loader provided by the embodiments of the present invention.
With reference to the third aspect, in a first optional implementation, if the loader is the loader in the second optional implementation of the second aspect,
the database system is configured to send the format of the target data record and the data distribution policy of the database system to the loader;
if the loader is the loader in the third or fourth optional implementation of the second aspect,
the database system is configured to send the logic code to the loader, where the format of the target data record and the data distribution policy of the database system are specified in the logic code.
With reference to the third aspect or the first optional implementation of the third aspect, in a second optional implementation, the storage system further includes:
a production system, configured to generate original data and send the original data to a preprocessing system;
and the preprocessing system, configured to preprocess the original data to obtain intermediate data and send the intermediate data to the loader as the data to be distributed.
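The patent does not specify the preprocessing rules. As an assumption for illustration only, the sketch below treats a line as illegal if it has the wrong number of fields or a key that is not made of digits, and passes the remaining lines to the loader as intermediate data. The function name `preprocess` is hypothetical.

```python
def preprocess(raw_lines: list[str], num_fields: int) -> str:
    """Keep only well-formed lines and return them as intermediate data (one record per line).

    Hypothetical rules: a line is kept only if it has exactly num_fields
    comma-separated fields and its first field (the key) consists of digits.
    """
    intermediate = []
    for line in raw_lines:
        fields = [f.strip() for f in line.split(",")]
        if len(fields) != num_fields:
            continue  # malformed: wrong number of fields
        if not fields[0].isdigit():
            continue  # illegal key value
        intermediate.append(",".join(fields))
    return "\n".join(intermediate)


print(preprocess(["1, a", "x,b", "2,c,d", "3,e"], num_fields=2))
```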
A fourth aspect of the embodiments of the present invention provides a loader, including a receiver, a transmitter, and a processor;
the receiver is configured to acquire data to be distributed;
the processor is configured to divide the data to be distributed into data records, convert the data records into target data records, and determine, according to a data distribution policy of a database system, the target data nodes in the database system corresponding to the target data records, where the target data records are in a format that the database system can recognize;
the transmitter is configured to send the target data record to the target data node.
With reference to the fourth aspect, in a first optional implementation, the loader further includes a memory, configured to store the format of the target data record and the data distribution policy of the database system;
the processor is specifically configured to convert the data record into the target data record according to the locally configured format of the target data record, and determine the target data node corresponding to the target data record in the database system according to the locally configured data distribution policy of the database system.
With reference to the fourth aspect, in a second optional implementation, the receiver is further configured to receive the format of the target data record and the data distribution policy of the database system;
the processor is specifically configured to execute locally stored logic code according to the received format of the target data record and data distribution policy of the database system, and by executing the logic code convert the data record into the target data record and determine the target data node corresponding to the target data record in the database system.
With reference to the fourth aspect, in a third optional implementation, the receiver is further configured to receive logic code, where the logic code specifies the format of the target data record and the data distribution policy of the database system;
the processor is specifically configured to execute the logic code, convert the data record into the target data record by executing the logic code, and determine a target data node corresponding to the target data record in the database system.
With reference to the third optional implementation of the fourth aspect, in a fourth optional implementation, the logic code is platform-independent code.
With reference to the third optional implementation of the fourth aspect, in a fifth optional implementation, the receiver is specifically configured to receive the logic code sent by the database system.
With reference to the fourth aspect or any one of the first to fifth optional implementations of the fourth aspect, in a sixth optional implementation, the processor is specifically configured to store the target data record in a queue corresponding to the target data node and take the target data record out of the queue on a first-in, first-out basis;
the transmitter is specifically configured to send the retrieved target data record to the target data node.
Because the loader performs the data conversion and determines the target data node corresponding to each target data record, the data nodes of the database system do not need to perform data conversion or redistribution calculation after the target data records arrive, which reduces the resources occupied in the database system, and the internal network of the database system is not used to forward records. The scheme can therefore reduce the likelihood of overloading the database system and improve the response speed and storage efficiency of the storage system.
Drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a schematic flowchart of a method according to an embodiment of the present invention;
FIG. 2 is a flowchart of a method in combination with a system structure according to an embodiment of the present invention;
FIG. 3 is a flowchart of a method in combination with a system structure according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a loader according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a loader according to an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of a loader according to an embodiment of the present invention;
FIG. 7 is a schematic structural diagram of a loader according to an embodiment of the present invention;
FIG. 8 is a schematic structural diagram of a loader according to an embodiment of the present invention;
FIG. 9 is a schematic structural diagram of a loader according to an embodiment of the present invention;
FIG. 10 is a schematic structural diagram of a storage system according to an embodiment of the present invention;
FIG. 11 is a schematic structural diagram of a storage system according to an embodiment of the present invention;
FIG. 12 is a schematic structural diagram of a storage system according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are only some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort shall fall within the protection scope of the present invention.
An embodiment of the present invention provides a data distribution method. As shown in FIG. 1, and with reference to FIG. 11, the method includes the following steps:
101: The loader acquires data to be distributed and divides the data to be distributed into data records.
In this embodiment, the data to be distributed is data that needs to be sent to the database system for storage. It may be original data or intermediate data produced by a preprocessing device. Intermediate data is preferred, because preprocessing prevents illegal data and data that violates consistency constraints from consuming data processing resources.
In this embodiment, the records may be divided using any existing division rule; for example, text data to be distributed may be divided into data records by recognizing line breaks, comma delimiters, or similar separators. The embodiments of the present invention do not limit the specific division rule.
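One possible division rule, sketched in Python with the standard `csv` module: line breaks separate records and a delimiter separates fields. The choice of `csv` and the function name `split_into_records` are assumptions for illustration. Unlike a plain split on commas, `csv.reader` keeps a quoted value such as `"smith, bob"` in one field.

```python
import csv
import io


def split_into_records(text: str, delimiter: str = ",") -> list[list[str]]:
    """Divide text into records at line breaks and into fields at the delimiter.

    csv.reader respects double-quoted fields, so a delimiter inside quotes does
    not start a new field.
    """
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=delimiter)
    return [row for row in reader if row]  # csv.reader yields [] for blank lines


print(split_into_records('1,alice\n2,"smith, bob"\n\n3,carol\n'))
```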
The start condition for acquiring and dividing the data to be distributed may be any configured condition, for example, expiry of a timer, receipt of an instruction to execute content distribution, or receipt of a data distribution request from the database system. The choice of start condition does not affect the implementation of the embodiments of the present invention, so the embodiments do not limit it.
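A minimal sketch of two of the listed start conditions, assuming a hypothetical `threading.Event` that is set when a distribution instruction or a request from the database system arrives; timer expiry is modelled by the wait timeout. The function name `wait_for_start` is illustrative only.

```python
import threading


def wait_for_start(request: threading.Event, timeout_s: float) -> str:
    """Block until a start condition holds and report which one fired.

    request is a hypothetical event, set when an instruction to distribute or a
    data distribution request from the database system arrives; timer expiry is
    modelled by the wait timeout.
    """
    if request.wait(timeout=timeout_s):
        request.clear()  # consume the request so the next cycle waits again
        return "request"
    return "timer"


event = threading.Event()
print(wait_for_start(event, 0.05))  # prints timer: nothing set the event
event.set()
print(wait_for_start(event, 0.05))  # prints request
```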
102: The loader converts the data record into a target data record, where the target data record is in a format that the database system can recognize.
In this embodiment, a database system generally cannot recognize every format, so the data records need to be converted into target data records that the database system can recognize. How the loader obtains the format recognized by the database system may be chosen according to requirements, for example, by pre-configuration or by receiving it from another device; the embodiments of the present invention do not limit this.
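As an illustration of the conversion step, the sketch below assumes that the target format is an ordered list of typed columns (the `ORDERS` schema is hypothetical) and converts each raw text field with its column's converter, mapping empty fields to `None` (SQL NULL). A real loader would follow whatever format the database system actually recognizes.

```python
from datetime import date
from typing import Any, Callable

# Hypothetical target format: an ordered list of (column name, converter) pairs.
Schema = list[tuple[str, Callable[[str], Any]]]

ORDERS: Schema = [
    ("order_id", int),
    ("amount", float),
    ("order_date", date.fromisoformat),
]


def to_target_record(fields: list[str], schema: Schema) -> tuple:
    """Convert raw text fields into typed values, in the schema's column order."""
    if len(fields) != len(schema):
        raise ValueError(f"expected {len(schema)} fields, got {len(fields)}")
    values = []
    for (_, convert), raw in zip(schema, fields):
        text = raw.strip()
        # An empty field becomes None (SQL NULL); otherwise apply the column converter.
        values.append(None if text == "" else convert(text))
    return tuple(values)


print(to_target_record(["42", " 19.5", "2014-05-30"], ORDERS))
```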
In the embodiments of the present invention, the database system may be, for example, an MPP database system; the MPP database system is a specific application example and should not be construed as the only database system to which the embodiments apply.
103: The loader determines, according to the data distribution policy of the database system, the target data node in the database system corresponding to the target data record.
In a database system, data is usually not distributed randomly. A distribution policy is adopted for two purposes: first, to distribute data approximately evenly across the physical servers of the distributed system and prevent data skew; second, to optimize the data layout for commonly used service processing algorithms and thereby improve query performance. A large database system has many data nodes that store target data records, and the distribution policy is embodied in which data node stores each record. Each target data record has a unique storage location and therefore a unique data node as its target data node. The data distribution policy of the database system may be hash distribution, range distribution, or replication distribution; the embodiments of the present invention do not limit the specific choice of policy.
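The three policies named above can be sketched as routing functions that each return the list of target node indices for a record. This is an illustrative sketch, not the database's actual algorithm: CRC32 stands in for the database's hash function, and the range split points are hypothetical. Replication is modelled as sending the record to every node, which is how replicated tables are typically stored in MPP systems.

```python
import bisect
import zlib
from typing import Sequence


def hash_distribution(key: str, num_nodes: int) -> list[int]:
    """Hash distribution: a stable hash of the distribution key selects one node."""
    return [zlib.crc32(key.encode("utf-8")) % num_nodes]


def range_distribution(key: int, split_points: Sequence[int]) -> list[int]:
    """Range distribution: sorted split points divide the key space into
    len(split_points) + 1 ranges, and range i is stored on node i."""
    return [bisect.bisect_right(split_points, key)]


def replication_distribution(num_nodes: int) -> list[int]:
    """Replication distribution: every data node stores a full copy of the record."""
    return list(range(num_nodes))


print(hash_distribution("order-42", 4))
print(range_distribution(150, [100, 200]))  # [1]
print(replication_distribution(3))          # [0, 1, 2]
```

For the loader's routing to agree with the database system, the loader must use exactly the same hash function and partition boundaries as the database. Python's built-in `hash()` would be unsuitable here because string hashing is randomized per process.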
104: The loader sends the target data record to the target data node.
In the above embodiment, the loader performs the data conversion and has already determined the target data node corresponding to each target data record. After the target data records are sent to the database system, its data nodes therefore no longer need to perform data conversion or redistribution calculation, which reduces the resources occupied in the database system, and the internal network of the database system is not used to forward records. The scheme can thus reduce the likelihood of overloading the database system and improve the response speed and storage efficiency of the storage system.
An embodiment of the present invention further provides a preferred implementation for sending the target data record in the above scheme, in which the sending, by the loader, of the target data record to the target data node includes:
the loader stores the target data record in a queue corresponding to the target data node, takes the target data record out of the queue on a first-in, first-out basis, and sends it to the target data node.
The embodiments of the present invention are applied in scenarios with very large data volumes, in which target data records are likely to become congested while being sent. To reduce such congestion and improve sending efficiency, the embodiments buffer the target data records in queues. A database system has multiple data nodes and different target data records have different target data nodes, so there are multiple queues in one-to-one correspondence with the target data nodes. Because each queue corresponds to one data node, the data nodes do not contend for resources, and sending is not rate-limited by a single shared sending queue.
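The one-queue-per-node arrangement can be sketched with the standard `queue` and `threading` modules. The class `PerNodeSender` and its `send` callback, which would wrap the network connection to a data node, are hypothetical. Giving each queue its own sender thread is an assumption about how the queues are drained in parallel; the patent specifies only one FIFO queue per target data node.

```python
import queue
import threading
from typing import Callable, Iterable

_STOP = object()  # sentinel that tells a sender thread to exit


class PerNodeSender:
    """One FIFO queue and one sender thread per target data node (a sketch)."""

    def __init__(self, node_ids: Iterable, send: Callable[[object, object], None]):
        self._queues = {node: queue.Queue() for node in node_ids}
        self._threads = [
            threading.Thread(target=self._drain, args=(node, q, send), daemon=True)
            for node, q in self._queues.items()
        ]
        for t in self._threads:
            t.start()

    @staticmethod
    def _drain(node, q: queue.Queue, send) -> None:
        # queue.Queue is FIFO, so each node receives its records in arrival order.
        # Error handling and retries are omitted from this sketch.
        while (record := q.get()) is not _STOP:
            send(node, record)

    def put(self, node, record) -> None:
        """Store a target data record in the queue of its target data node."""
        self._queues[node].put(record)

    def close(self) -> None:
        """Let every queue drain, then stop and join the sender threads."""
        for q in self._queues.values():
            q.put(_STOP)
        for t in self._threads:
            t.join()


received: dict = {0: [], 1: []}
sender = PerNodeSender([0, 1], lambda node, rec: received[node].append(rec))
for i in range(10):
    sender.put(i % 2, i)  # even ids to node 0, odd ids to node 1
sender.close()
print(received)  # {0: [0, 2, 4, 6, 8], 1: [1, 3, 5, 7, 9]}
```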
In the embodiments of the present invention, while processing the data to be distributed, the loader needs certain information that constrains the data processing. The source of this information and the way it is obtained may differ between applications, and the embodiments of the present invention provide three optional implementations, as follows:
I,
The format of the target data record and the data distribution policy of the database system are configured locally on the loader;
the step in which the loader converts the data record into the target data record and determines the corresponding target data node in the database system according to the data distribution policy of the database system includes:
the loader converts the data record into the target data record according to the locally configured format of the target data record, and determines the target data node corresponding to the target data record in the database system according to the locally configured data distribution policy of the database system.
In this embodiment, the information that constrains data processing is configured locally on the loader. This is most effective for a dedicated database system, where the constraining information for a given database system is relatively stable, and configuring it directly on the loader makes the required functions easy to implement. Not all of the constraining information needs to be configured locally: part of it may be configured on the loader and the rest obtained in other ways, without affecting the implementation of the embodiments of the present invention.
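The first mode can be sketched as a loader that reads a locally stored configuration. The JSON layout, the type names, and the function `build_loader_functions` are hypothetical, CRC32 stands in for the database's hash function, and only hash distribution is implemented in this sketch.

```python
import json
import zlib

# Hypothetical contents of a configuration file stored locally on the loader.
LOCAL_CONFIG = """
{
  "columns": [["id", "int"], ["name", "text"], ["score", "float"]],
  "distribution": {"policy": "hash", "key": "id", "num_nodes": 4}
}
"""

TYPE_CONVERTERS = {"int": int, "text": str, "float": float}


def build_loader_functions(config_text: str):
    """Turn the local configuration into convert() and target_node() functions."""
    cfg = json.loads(config_text)
    names = [name for name, _ in cfg["columns"]]
    converters = [TYPE_CONVERTERS[type_name] for _, type_name in cfg["columns"]]
    dist = cfg["distribution"]
    if dist["policy"] != "hash":
        raise NotImplementedError("only hash distribution is sketched here")
    key_pos = names.index(dist["key"])
    num_nodes = dist["num_nodes"]

    def convert(fields: list[str]) -> tuple:
        # Apply each column's converter to the matching raw field.
        return tuple(conv(value) for conv, value in zip(converters, fields))

    def target_node(record: tuple) -> int:
        # CRC32 of the key column stands in for the database's hash function.
        return zlib.crc32(str(record[key_pos]).encode("utf-8")) % num_nodes

    return convert, target_node


convert, target_node = build_loader_functions(LOCAL_CONFIG)
record = convert(["7", "ann", "0.5"])
print(record, target_node(record))
```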
II,
The embodiments of the present invention further provide another way for the loader to obtain the information that constrains data processing. Specifically, before the loader converts the data record into the target data record, the method further includes:
the loader receives the format of the target data record and the data distribution policy of the database system;
the step in which the loader converts the data record into the target data record and determines the corresponding target data node in the database system according to the data distribution policy of the database system includes:
the loader executes locally stored logic code according to the received format of the target data record and data distribution policy of the database system, and by executing the logic code converts the data record into the target data record and determines the target data node corresponding to the target data record in the database system.
In this embodiment, the logic code is stored locally on the loader, and the constraint conditions in the logic code may be assigned by an external device; that is, the logic code can be controlled by a device such as the database system, which makes the scheme easy to implement and compatible with various database systems. After receiving the format of the target data record and the data distribution policy of the database system, the loader may, for example, assign them to the corresponding variables in the logic code. The way the loader uses the received format and policy is not limited to assignment, and this example should not be construed as the only implementation of the embodiments of the present invention.
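The second mode can be sketched as logic code that ships with the loader and whose constraint variables stay unassigned until the database system sends them. The class `LocalLogic` and the layout of the parameter message are hypothetical, and CRC32 again stands in for the database's hash function.

```python
import zlib


class LocalLogic:
    """Logic code stored on the loader; its constraint variables are assigned at run time."""

    _TYPES = {"int": int, "text": str, "float": float}

    def __init__(self) -> None:
        self.column_types = None  # format of the target data record (not yet received)
        self.key_index = None     # position of the distribution key (not yet received)
        self.num_nodes = None     # number of data nodes (not yet received)

    def assign(self, params: dict) -> None:
        """Assign the received format and distribution policy to the variables."""
        self.column_types = [self._TYPES[t] for t in params["column_types"]]
        self.key_index = params["key_index"]
        self.num_nodes = params["num_nodes"]

    def process(self, fields: list[str]) -> tuple[int, tuple]:
        """Convert one data record and compute its target data node."""
        if self.column_types is None:
            raise RuntimeError("format and distribution policy not received yet")
        record = tuple(t(v) for t, v in zip(self.column_types, fields))
        key = str(record[self.key_index]).encode("utf-8")
        return zlib.crc32(key) % self.num_nodes, record


logic = LocalLogic()
# Hypothetical message received from the database system.
logic.assign({"column_types": ["text", "int"], "key_index": 1, "num_nodes": 8})
print(logic.process(["ann", "7"]))
```

The same locally stored class can serve a different database system simply by receiving a different parameter message, which is the compatibility benefit described above.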
III,
An embodiment of the invention further provides yet another way for the loader to obtain the information that constrains data processing. Specifically, before the loader converts the data record into the target data record, the method further includes:
the loader receives logic code in which the format of the target data record and the data distribution policy of the database system are specified;
and the converting, by the loader, of the data record into the target data record, and the determining, according to the data distribution policy of the database system, of the target data node corresponding to the target data record in the database system, include:
the loader executes the logic code; by executing it, the loader converts the data record into the target data record and determines the target data node corresponding to the target data record in the database system.
In this embodiment, the information that constrains data processing is carried in the form of logic code, so the loader obtains this information without necessarily parsing or identifying it. The logic code need not contain all of the information that constrains data processing: it may contain only part of it, with the rest obtained in other ways, without affecting the implementation of the embodiment of the present invention.
Because the loader in this embodiment receives the logic code, it does not need to be preconfigured with the information that constrains data processing, and can therefore easily be made compatible with various database systems. In addition, the loader does not need to implement a general, complex parsing and distribution mechanism: the logic code is generated directly on the database system side by compilation (code generation) techniques, so the loader executes it more efficiently and the load on its hardware is reduced.
Of the above schemes, the second and the third both use logic code. The difference is that in the second scheme the logic code does not specify the format of the target data record or the data distribution policy of the database system, so it is more general; in the third scheme the logic code already specifies both, so it is more specific.
In this embodiment of the present invention, the logic code may be directly executable, or it may need to be compiled before it can be executed. To make the database system and the loader easier to keep compatible, the embodiment of the invention may use platform-independent (portable) code; specifically, the above logic code is platform-independent code.
After receiving the platform-independent code, the loader may parse and execute it as it arrives, or it may wait until all of the code has been received, compile it into intermediate code, and then execute the intermediate code. The embodiments of the present invention do not limit the specific implementation.
In other words, the logic code may be executed piece by piece, each piece being parsed and then executed; it may be executed after all pieces have been parsed; or it may even be parsed in full, converted into code native to the client (the loader), and then executed.
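To illustrate the compile-then-execute option, the sketch below uses Python source as a stand-in for platform-independent logic code: `compile()` turns it into a bytecode object, which plays the role of the intermediate code, and `exec()` then runs it. The received code, its functions `convert` and `target_node`, and the modulo-3 policy are all hypothetical.

```python
import types

# Hypothetical logic code as it might arrive from the database system.
# The target format and distribution policy are fixed inside the code (scheme III).
RECEIVED_LOGIC = '''
def convert(fields):
    return (int(fields[0]), fields[1].strip())

def target_node(record):
    return record[0] % 3
'''


def load_logic(source: str) -> types.SimpleNamespace:
    """Compile received source into an intermediate code object, then execute it."""
    code_obj = compile(source, "<received-logic>", "exec")  # platform-independent bytecode
    namespace: dict = {}
    exec(code_obj, namespace)  # defines convert() and target_node() in namespace
    return types.SimpleNamespace(convert=namespace["convert"],
                                 target_node=namespace["target_node"])


logic = load_logic(RECEIVED_LOGIC)
record = logic.convert(["7", " bob "])
print(record, logic.target_node(record))  # (7, 'bob') 1
```

Because the loader runs whatever code it is given, such code should only be accepted from a trusted sender, such as the database system.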
In the embodiment of the invention, the sender of the logic code may be a third-party device other than the database system and the loader, or it may be the database system itself. In the latter case, the database system generates the logic code according to the definition of the data table and its internal format requirements, which gives higher efficiency and better compatibility. Therefore, in a preferred implementation of the embodiment of the present invention, the loader receiving the logic code includes: the loader receives the logic code sent by the database system.
Besides the three optional ways of obtaining the foregoing information, the two items may also be obtained in different ways: the data distribution policy of the database system may be received while the format of the target data record is configured locally, or the data distribution policy may be configured locally while the format of the target data record is received. The present invention is not limited in this regard.
The following embodiments describe in detail two of these alternative implementations, in conjunction with the internal structures of the production system, the preprocessing device, the loader and the database system.
First, as shown in fig. 2, the storage system includes a production system, a preprocessing device, a loader and an MPP (Massively Parallel Processing) database system. The loader includes a local file system, a data splitting module, a data conversion & distribution calculation module, first-in first-out (FIFO) queues, a distribution module and a service processing module. The MPP database system includes N data nodes and a control node. The data conversion & distribution calculation module stands for a data conversion module and a distribution calculation module, which may be implemented as one combined module or as two separate modules; the present invention does not limit this.
At a minimum, the loader needs a new data conversion & distribution calculation module and a modified distribution module. When the data import service is executed in the embodiment of the present invention, the hardware resources of all devices involved in the import are fully used and the data is partitioned in advance. The processing flow, shown in fig. 2, is as follows:
A1, the production system generates a large amount of raw data and outputs it to the preprocessing device in batches according to a preset policy.
A2, after completing data validity processing and data consistency checks, the preprocessing device produces text files in a specific format, typically TXT (text file) or CSV (comma-separated values), and then saves or mounts these files (the intermediate data, i.e. the data to be distributed) into the local file system of the loader associated with the MPP database system.
B1, after receiving an instruction from the user, the control node of the MPP database system notifies each data node to start batch data import; alternatively, when the batch import timer on the control node expires, the control node notifies each data node to start batch data import. The message announcing the start of batch import need not carry any other parameters.
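The two triggers in step B1 can be sketched with a `threading.Event` that a user instruction sets, where a wait that times out plays the role of the batch import timer. The function name and the `START_BATCH_IMPORT` message are hypothetical.

```python
import threading
from typing import Dict, Iterable, Tuple


def start_batch_import(node_ids: Iterable[int], user_instruction: threading.Event,
                       timeout_s: float) -> Tuple[str, Dict[int, str]]:
    """Wait for a user instruction or the import timer, then notify every data node."""
    # Event.wait returns True if the event was set, False if the timeout elapsed.
    triggered = user_instruction.wait(timeout=timeout_s)
    reason = "user instruction" if triggered else "timer expired"
    # The start message carries no parameters (hypothetical message name).
    return reason, {node: "START_BATCH_IMPORT" for node in node_ids}


instruction = threading.Event()
instruction.set()  # simulate a user instruction arriving
print(start_batch_import([1, 2], instruction, timeout_s=5.0))
# ('user instruction', {1: 'START_BATCH_IMPORT', 2: 'START_BATCH_IMPORT'})
```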
B2, each data node of the MPP database system sends a data request message to the service processing module of the loader.
Note: each data node may store the IP address and port number of the loader associated with the MPP database system, so that after receiving the message indicating the start of batch import it can send the request to the designated loader.
B3, the service processing module starts the processes of the data splitting module, the distribution module and the data conversion & distribution calculation module.
In step B3, the service processing module also needs to send the following information to these modules:
the data splitting module is informed of the preset data loading policy (that is, the rule for splitting the data in the local file system);
the data conversion & distribution calculation module is informed of the rules listed in Table 1 below:
TABLE 1
[Image: PCTCN2014090359-APPB-000001]
B4, the data splitting module starts the splitting task: it traverses all files in the specified directory, splits the text into records by recognizing line feeds and comma separators, and sends the resulting records (the data records), one or several at a time, to the data conversion & distribution calculation module.
The specified directory may be configured locally on the loader or passed in as a parameter by the control node of the MPP database system. The embodiments do not limit how the specified directory is determined.
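Step B4 might look like the following sketch, assuming CSV files with one record per line; Python's `csv` module handles the comma separators and quoted fields. `split_records` and the file pattern are hypothetical, and a temporary directory stands in for the specified directory.

```python
import csv
import tempfile
from pathlib import Path
from typing import Iterator, List


def split_records(directory: str, pattern: str = "*.csv") -> Iterator[List[str]]:
    """Traverse all matching files under a directory and yield one data record per line."""
    for path in sorted(Path(directory).rglob(pattern)):
        with path.open(newline="", encoding="utf-8") as f:
            # csv.reader splits on line feeds and comma separators (and honours quotes).
            for row in csv.reader(f):
                if row:  # skip blank lines
                    yield row


# Usage with a throwaway directory standing in for the specified directory.
with tempfile.TemporaryDirectory() as d:
    Path(d, "a.csv").write_text("1,alice\n2,bob\n", encoding="utf-8")
    Path(d, "b.csv").write_text('3,"carol, jr"\n', encoding="utf-8")
    records = list(split_records(d))
# records == [['1', 'alice'], ['2', 'bob'], ['3', 'carol, jr']]
```

Yielding records lazily lets the splitting module feed the conversion stage without reading every file into memory first.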
B5, the data conversion & distribution calculation module parses each received record according to the parsing rules for the external text. The parsed record is organized according to the Schema of the data table to be imported into the internal format of the MPP database system (the target data record), and other transaction information may be added (depending on the implementation, mainly a transaction number).
Based on the resulting target data record, which conforms to the data structures of the MPP database system, the data conversion & distribution calculation module determines the data node on which the record is to be stored according to the data distribution policy of the MPP database system, and then places the target data record in the FIFO queue corresponding to that data node.
In this embodiment, there is one FIFO queue for each data node.
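Step B5 and the per-node FIFO queues can be sketched as below, assuming a three-node system, hash distribution on the first column and a fixed transaction number. The schema, constants and function names are hypothetical.

```python
import zlib
from collections import deque

# Hypothetical schema of the table to be imported; column 0 is the distribution key.
SCHEMA = [("id", int), ("name", str), ("amount", float)]
DIST_COLUMN = 0
NODE_COUNT = 3
TXN_ID = 1001  # hypothetical transaction number appended to each record

# One FIFO queue per data node.
queues = {node: deque() for node in range(NODE_COUNT)}


def convert(fields):
    """Parse raw text fields into a typed target record plus transaction information."""
    if len(fields) != len(SCHEMA):
        raise ValueError("field count does not match the table schema")
    values = tuple(parse(raw) for (_, parse), raw in zip(SCHEMA, fields))
    return values + (TXN_ID,)


def target_node(record):
    # Hash distribution on the key column; crc32 is stable across processes,
    # unlike Python's built-in hash() for strings.
    return zlib.crc32(str(record[DIST_COLUMN]).encode()) % NODE_COUNT


def convert_and_enqueue(fields):
    record = convert(fields)
    node = target_node(record)
    queues[node].append(record)  # append at the tail; the distributor pops from the head
    return node


node = convert_and_enqueue(["5", "eve", "9.5"])
```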
B6-B7, the distribution module checks the FIFO queue corresponding to each data node and sends the target data records in each queue to the corresponding data node.
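A minimal sketch of the distribution module in steps B6-B7: it drains each node's queue oldest-first and hands each record to a `send` callback, which stands in for the unspecified network transfer to that data node. All names are hypothetical.

```python
from collections import deque
from typing import Callable, Dict, Hashable


def drain_queues(queues: Dict[Hashable, deque],
                 send: Callable[[Hashable, object], None],
                 batch_size: int = 100) -> int:
    """Send up to batch_size records from each node's queue, oldest first."""
    sent = 0
    for node, queue in queues.items():
        for _ in range(min(batch_size, len(queue))):
            send(node, queue.popleft())  # FIFO: take from the head
            sent += 1
    return sent


# Usage with an in-memory stand-in for the network send (hypothetical).
queues = {0: deque([("a",), ("d",)]), 1: deque([("b",)])}
log = []
count = drain_queues(queues, lambda node, rec: log.append((node, rec)))
# count == 3; log == [(0, ('a',)), (0, ('d',)), (1, ('b',))]
```

A real distribution module would presumably run this in a loop and could serve different data nodes concurrently, which is what keeping one queue per node makes possible.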
Second, as shown in fig. 3, the storage system includes a production system, a preprocessing device, a loader and an MPP database system. The loader includes a local file system, a data splitting module, a data conversion & distribution calculation module, FIFO queues, a distribution module, a compiling module and a service processing module. The MPP database system includes N data nodes, a data conversion code generation module and a control node. This embodiment uses the logic code of the foregoing embodiments, in the form of platform-independent code.
At a minimum, a data conversion & distribution calculation module and a compiling module must be added to the loader and the distribution module must be modified; a data conversion code generation module is also added to the MPP database system. When the data import service is executed in the embodiment of the present invention, the hardware resources of all devices involved in the import are fully used and the data is partitioned in advance. The processing flow, shown in fig. 3, is as follows:
A1, the production system generates a large amount of raw data and outputs it to the preprocessing device in batches according to a preset policy.
A2, after completing data validity processing and data consistency checks, the preprocessing device produces text files in a specific format (such as TXT or CSV) and then saves or mounts these files (the intermediate data, i.e. the data to be distributed) into the local file system of the loader associated with the MPP database system.
B1, after receiving an instruction from the user, the control node of the MPP database system notifies each data node to start batch data import; alternatively, when the batch import timer on the control node expires, the control node notifies each data node to start batch data import. The message announcing the start of batch import need not carry any other parameters.
B2, the data conversion code generation module of the MPP database system generates logic code for data conversion and data classification (assigning records to data nodes) according to the Schema of the data table to be imported; this may be platform-independent code.
In this step, every dependency on system configuration or internal information can be resolved to a fixed value, and compiler optimization removes unnecessary operations, such as the type checks performed when the MPP database system interprets data internally, so that the conversion and classification code is as simple as possible.
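The following sketch shows the idea of step B2 under simple assumptions: the generator emits Python source specialized to one table, with the column types, distribution key column and node count written in as constants. `generate_logic`, the type names and the hash policy are hypothetical.

```python
# Hypothetical table definition: (column name, type name); column 0 is the distribution key.
SCHEMA = [("id", "int"), ("name", "text"), ("amount", "float")]
_CASTS = {"int": "int", "float": "float", "text": "str"}


def generate_logic(schema, dist_column, node_count):
    """Emit table-specific source with column types and the policy folded in as constants."""
    # One direct cast per column: no per-record type dispatch remains at run time.
    casts = "".join(f"{_CASTS[typ]}(f[{i}]), " for i, (_, typ) in enumerate(schema))
    return (
        "import zlib\n"
        "def convert(f):\n"
        f"    return ({casts})\n"
        "def target_node(r):\n"
        f"    return zlib.crc32(str(r[{dist_column}]).encode()) % {node_count}\n"
    )


source = generate_logic(SCHEMA, dist_column=0, node_count=4)
```

The generated `convert` body is `return (int(f[0]), str(f[1]), float(f[2]), )`; the trailing comma keeps it a tuple even for a one-column table.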
B3, each data node of the MPP database system sends a data request message to the service processing module of the loader; the message carries the logic code generated by the data conversion code generation module.
B4, after receiving the data request message, the service processing module starts the processes of the data splitting module, the distribution module, the data conversion & distribution calculation module and the compiling module.
The service processing module also needs to send the following information to these modules:
the preset data loading policy to the data splitting module; the data distribution policy to the data conversion & distribution calculation module; and the logic code to the compiling module.
B5, the data splitting module starts the splitting task: it traverses all files in the specified directory, splits the text into records by recognizing line feeds and comma separators, and sends the resulting records (the data records), one or several at a time, to the data conversion & distribution calculation module.
B6, the compiling module compiles the logic code into intermediate code (the data conversion executor) and sends it to the data conversion & distribution calculation module.
Note: steps B5 and B6 have no fixed order and may be performed concurrently.
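Because B5 and B6 are independent, they can run side by side, and B7 starts once both results are ready. A sketch with `concurrent.futures`, assuming in-memory text and a hypothetical one-function logic code:

```python
from concurrent.futures import ThreadPoolExecutor

RAW_TEXT = "1,alice\n2,bob\n"  # hypothetical file contents
LOGIC_CODE = "def convert(f):\n    return (int(f[0]), f[1])\n"  # hypothetical received code


def split(text):
    # B5: split text into records on line feeds and commas.
    return [line.split(",") for line in text.splitlines() if line]


def compile_logic(source):
    # B6: compile received code into an executable "data conversion executor".
    namespace = {}
    exec(compile(source, "<logic>", "exec"), namespace)
    return namespace["convert"]


with ThreadPoolExecutor(max_workers=2) as pool:
    records_future = pool.submit(split, RAW_TEXT)
    converter_future = pool.submit(compile_logic, LOGIC_CODE)
    records, convert = records_future.result(), converter_future.result()

targets = [convert(r) for r in records]  # B7 begins once both B5 and B6 are done
```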
B7, the data conversion & distribution calculation module parses the text using the intermediate code (the data conversion executor) sent by the compiling module. The parsed data record is in the internal format of the MPP database system (the target data record), and other transaction information (depending on the implementation, mainly a transaction number) may be added.
By running the data conversion executor on the resulting target data record, the data conversion & distribution calculation module determines, according to the data distribution policy, the data node on which the record is to be stored, and then places the record in the queue corresponding to that data node.
B8-B9, the distribution module checks the FIFO queue corresponding to each data node and sends the target data records in each queue to the corresponding data node.
The solution of the embodiments of the invention has at least the following beneficial effects:
1. It eliminates the heavy use of computing resources and memory on the data nodes of the MPP database system that results from performing data conversion there during data import.
2. It eliminates the use of internal network bandwidth in the MPP database system caused by redistributing data among data nodes, so more internal bandwidth is available for services such as data queries.
3. The platform independence of the code handled by the compiling module gives the data import service cross-platform compatibility and avoids the cost of releasing the product in multiple platform versions. At the same time, the loader program is decoupled from the database: it does not need built-in logic for how the database converts and distributes data, which avoids later version mismatches.
4. The logic code generated by the data conversion code generation module is customized for each table. Compared with a general conversion mechanism, it needs fewer costly function calls and branch decisions, which reduces the CPU time and call-stack usage spent on computation and improves data conversion performance.
An embodiment of the present invention further provides a loader which, as shown in fig. 4, includes:
a data splitting module 401, configured to acquire data to be distributed and split it into data records;
a data conversion module 402, configured to parse the data records obtained by the data splitting module 401 and convert them into target data records, the format of the target data records being a format that the database system can recognize;
a distribution calculation module 403, configured to determine, according to the data distribution policy of the database system, the target data node in the database system corresponding to each target data record produced by the data conversion module 402; and
a distribution module 404, configured to send each target data record produced by the data conversion module 402 to the target data node determined by the distribution calculation module 403.
In this embodiment, the data to be distributed is data that needs to be sent to the database system for storage. It may be raw data or intermediate data already processed by the preprocessing device; the latter is preferred, because it prevents invalid data and data that fails consistency checks from consuming data processing resources.
The data records may be split using any existing splitting rule, for example dividing the text data to be distributed into records by recognizing line breaks or comma separators. The embodiments of the present invention do not limit the specific splitting rule.
Acquiring and splitting the data to be distributed may be triggered by any configured start condition, such as a timer expiring, receipt of an instruction to perform distribution, or receipt of a data distribution request from the database system. The choice does not affect the implementation of the embodiments of the invention, so the embodiments do not limit the start condition.
In this embodiment, because a database system generally cannot recognize arbitrary formats, the data records must be converted into target data records that the database system can recognize. How the loader obtains the format of the target data record may depend on requirements, for example preconfiguration or receipt from another device; the embodiments of the present invention do not limit this.
In a database system, data is usually not placed at random; a distribution policy is adopted for two purposes: first, to spread data roughly evenly over the physical servers of the distributed system and prevent data skew; second, to arrange data in a way that suits common downstream processing algorithms, thereby improving query performance. A large database system has many data nodes that store target data records, and the distribution policy determines which data node stores which record. Each target data record has a unique storage location and therefore a unique data node as its target data node. The data distribution policy of the database system may be hash distribution, range distribution or replication distribution; the embodiments of the invention do not prescribe a particular policy.
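The three policies named above can be sketched as functions that return the node or nodes for a key. These are illustrative assumptions, not the database system's actual rules: hash distribution uses a stable CRC-32, range distribution uses sorted exclusive upper bounds, and replication distribution returns every node, since under replication each record is stored on all nodes rather than one.

```python
import bisect
import zlib
from typing import List, Sequence


def hash_nodes(key, node_count: int) -> List[int]:
    """Hash distribution: exactly one node, chosen by a stable hash of the key."""
    return [zlib.crc32(str(key).encode()) % node_count]


def range_nodes(key, upper_bounds: Sequence) -> List[int]:
    """Range distribution: node i holds keys below upper_bounds[i] (sorted, exclusive)."""
    i = bisect.bisect_right(upper_bounds, key)
    if i >= len(upper_bounds):
        raise ValueError("key lies beyond the last range")
    return [i]


def replicated_nodes(node_count: int) -> List[int]:
    """Replication distribution: every node stores a full copy."""
    return list(range(node_count))
```

For example, with `upper_bounds = [100, 200, 300]`, key 50 maps to node 0 and key 100 to node 1.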
In the above embodiment, the loader performs data conversion and has already determined the target data node of each target data record. Once the target data records are sent to the database system, its data nodes no longer need to perform data conversion or redistribution calculations, which reduces resource usage in the database system and leaves its internal network free. This lowers the risk of overloading the database system and thereby improves the response speed and storage efficiency of the storage system.
An embodiment of the invention further provides a preferred way of sending the target data records in the above solution. As shown in fig. 5, the loader further includes:
a storage module 501, configured to store queues in one-to-one correspondence with the data nodes of the database system, and therefore including a queue for each target data node;
the distribution calculation module 403 is further configured to store each target data record in the queue corresponding to its target data node;
the distribution module 404 is configured to take target data records out of the queues in the storage module 501 on a first-in first-out basis and send each one to the data node corresponding to the queue it was taken from, so that each target data record reaches its target data node.
The embodiments of the invention are intended for scenarios with very large data volumes, in which target data records are likely to become congested while being sent. To reduce congestion and improve sending efficiency, the embodiment buffers target data records in queues. Because a database system has many data nodes and different target data records have different target data nodes, there are multiple queues in one-to-one correspondence with the data nodes. With one queue per data node, data nodes do not contend for a shared resource, and sending is not throttled by a single transmission queue.
In this embodiment of the present invention, while processing the data to be distributed, the loader needs certain information that constrains data processing. The sources of this information and the ways of obtaining it may differ by application; the embodiment provides three optional implementations, as follows:
I,
Optionally, the format of the target data record and the data distribution policy of the database system are configured locally on the loader;
the data conversion module 402 is specifically configured to convert the data record into the target data record according to the locally configured format of the target data record;
the distribution calculation module 403 is specifically configured to determine, according to the locally configured data distribution policy of the database system, the target data node corresponding to the target data record in the database system.
In this implementation, the information that constrains data processing is configured locally on the loader. This works well with a dedicated database system, for which the constraining information is stable, and configuring it directly on the loader makes the required functions easy to implement. The information need not all be configured locally: part of it may be configured on the loader and the rest obtained in other ways, without affecting the implementation of the embodiment of the present invention.
II,
An embodiment of the invention further provides another way for the loader to obtain the information that constrains data processing. As shown in fig. 6, the loader further includes:
a parameter receiving module 601, configured to receive the format of the target data record and the data distribution policy of the database system before the data conversion module 402 converts the data record into the target data record;
the data conversion module 402 is specifically configured to execute locally stored logic code according to the format of the target data record received by the parameter receiving module 601, converting the data record into the target data record by executing the logic code;
the distribution calculation module 403 is specifically configured to execute locally stored logic code according to the data distribution policy of the database system received by the parameter receiving module 601, determining, by executing the logic code, the target data node corresponding to the target data record in the database system.
In this implementation, the logic code is stored locally on the loader, and the constraints in it can be assigned by an external device; that is, the logic code can be controlled by another device, including the database system. This makes the solution of the embodiment of the present invention easy to implement and compatible with various database systems. After the parameter receiving module receives the format of the target data record and the data distribution policy of the database system, the data conversion module may, for example, assign the format to the corresponding variables in the logic code, and the distribution calculation module may assign the data distribution policy to the corresponding variables in the logic code. How the received format and data distribution policy are used is not limited to assignment, and this example should not be construed as the only possible implementation of the embodiments of the present invention.
III,
An embodiment of the invention further provides yet another way for the loader to obtain the information that constrains data processing. As shown in fig. 7, the loader further includes:
a code receiving module 701, configured to receive logic code before the data conversion module 402 converts the data record into the target data record, the logic code specifying the format of the target data record and the data distribution policy of the database system;
the data conversion module 402 is specifically configured to execute the logic code and, by executing it, convert the data record into the target data record;
the distribution calculation module 403 is specifically configured to execute the logic code and, by executing it, determine the target data node corresponding to the target data record in the database system.
In this implementation, the information that constrains data processing is carried in the form of logic code, so the loader obtains this information without necessarily parsing or identifying it. The logic code need not contain all of the constraining information: it may contain only part of it, with the rest obtained in other ways, without affecting the implementation of the embodiment of the present invention.
Because the loader in this implementation receives the logic code, it does not need to be preconfigured with the information that constrains data processing, and can therefore easily be made compatible with various database systems. In addition, the loader does not need to implement a general, complex parsing and distribution mechanism: the logic code is generated directly on the database system side by compilation (code generation) techniques, so the loader executes it more efficiently and the load on its hardware is reduced.
In this embodiment of the present invention, the logic code may be directly executable, or it may need to be compiled before it can be executed. To make the database system and the loader easier to keep compatible, the embodiment may use platform-independent code; specifically, the above logic code is platform-independent code. The sender of the logic code may be a third-party device other than the database system and the loader, or it may be the database system itself, which generates the logic code according to the definition of the data table and its internal format requirements, giving higher efficiency and better compatibility. Therefore, in a preferred implementation of the embodiment of the present invention, the code receiving module 701 is specifically configured to receive the logic code sent by the database system.
In addition, in the embodiment of the present invention, the loader may keep the data to be distributed in a local storage system, to suit application scenarios with large data volumes. As shown in fig. 8, the loader further includes a local storage system 801, configured to store the data to be distributed.
An embodiment of the present invention further provides another loader which, as shown in fig. 9, includes a receiver 901, a transmitter 902 and a processor 903; the memory 904 is optional in this embodiment.
the receiver 901 is configured to obtain the data to be distributed;
the processor 903 is configured to split the data to be distributed into data records, convert each data record into a target data record whose format the database system can recognize, and determine, according to the data distribution policy of the database system, the target data node corresponding to the target data record in the database system;
the transmitter 902 is configured to send the target data record to the target data node.
In this embodiment, the data to be distributed is data that needs to be sent to the database system for storage, and the format of the data may be original data or intermediate data processed by the preprocessing device, and a scheme of the intermediate data processed by the preprocessing device is preferably adopted, so that it is possible to avoid that illegal data and data that does not conform to data consistency occupy data processing resources.
In this embodiment, the rule used for dividing the data record may adopt various existing dividing rules, for example: the text data to be distributed is divided into a plurality of data records by identifying line breaks/comma delimiters or the like. Specific segmentation rules embodiments of the present invention are not intended to be limited uniquely.
The starting condition for acquiring and dividing the data to be distributed may be any set starting condition, for example: the implementation of the embodiment of the invention is not affected by the timeout of the timer, the reception of an instruction for executing content distribution, the reception of a data distribution request of a database system and the like; therefore, the embodiment of the present invention does not uniquely limit the starting condition for acquiring and dividing the data to be distributed.
In this embodiment, the database system generally cannot recognize every format, so each data record must be converted into a target data record in a format that the database system can recognize. How the loader obtains the format of the target data record depends on requirements; for example, the format may be preconfigured or received from another device. The embodiments of the present invention do not limit this.
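A minimal sketch of the conversion step, assuming the target format is an ordered list of columns, each with a converter from the raw field string to the type the database system expects. `TARGET_FORMAT` and `to_target_record` are hypothetical names; a real database system would define its own internal record format.

```python
from typing import Callable

# Hypothetical target format: ordered column names, each with a converter
# that turns the raw field string into the type the database system expects.
TARGET_FORMAT: list[tuple[str, Callable[[str], object]]] = [
    ("id", int),
    ("name", str),
    ("score", float),
]


def to_target_record(fields: list[str], fmt=TARGET_FORMAT) -> dict[str, object]:
    """Convert one data record (list of strings) into a typed target data record."""
    if len(fields) != len(fmt):
        raise ValueError(f"expected {len(fmt)} fields, got {len(fields)}")
    return {name: convert(value) for (name, convert), value in zip(fmt, fields)}


print(to_target_record(["1001", "alice", "3.5"]))
# {'id': 1001, 'name': 'alice', 'score': 3.5}
```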
In a database system, data is usually not distributed randomly; a distribution strategy is applied for two purposes: first, to spread the data approximately evenly across the physical servers of the distributed system and prevent data skew; second, to arrange the data so that common subsequent service-processing algorithms run efficiently, improving query-processing performance. A large database system has many data nodes that store target data records, and the distribution policy determines which data node stores which record. Each target data record has a unique storage location and therefore a unique data node as its target data node. The data distribution strategy of the database system may be hash distribution, range distribution, or replication (copy) distribution; the embodiments of the present invention do not limit the choice of strategy.
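The three strategies named above can be sketched in Python as functions that map a distribution key to the node indices that should store the record. Returning a list lets replication distribution, where a record goes to every node, share the same shape as the other two. The function names, the CRC32 hash, and the range-bound representation are assumptions for illustration, not the distribution functions of any particular database system.

```python
import bisect
import zlib


def hash_distribution(key: str, num_nodes: int) -> list[int]:
    """Hash distribution: a stable hash of the distribution key, modulo the node count."""
    # zlib.crc32 gives the same value in every process, unlike Python's built-in
    # hash() for str, which is randomized per process.
    return [zlib.crc32(key.encode("utf-8")) % num_nodes]


def range_distribution(key: int, upper_bounds: list[int]) -> list[int]:
    """Range distribution: node i holds keys below upper_bounds[i] (sorted ascending)
    and at or above the previous bound."""
    node = bisect.bisect_right(upper_bounds, key)
    if node >= len(upper_bounds):
        raise ValueError(f"key {key} lies outside every configured range")
    return [node]


def replication_distribution(num_nodes: int) -> list[int]:
    """Replication (copy) distribution: every node stores a full copy of the record."""
    return list(range(num_nodes))


print(range_distribution(150, [100, 200, 300]))  # [1]
print(replication_distribution(3))               # [0, 1, 2]
```

A stable hash matters here because the loader and the database system must agree on a record's node across processes and restarts.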
In the above embodiment, the loader performs the data conversion and has already determined the target data node for each target data record. After the target data record is sent to the database system, the data nodes of the database system therefore no longer need to perform data conversion or redistribution calculation, which reduces the load on database system resources and leaves the internal network of the database system unoccupied. This scheme therefore reduces the likelihood of overloading the database system and improves the response speed and storage efficiency of the storage system.
The embodiments of the present invention also provide a preferred implementation for sending the target data record, as follows:
the processor 903 is specifically configured to store the target data record in a queue corresponding to the target data node, and to take target data records out of the queue on a first-in, first-out basis;
the transmitter 902 is specifically configured to send the retrieved target data record to the target data node.
The embodiments of the present invention are intended for scenarios with very large data volumes, in which target data records are likely to become congested during sending. To reduce congestion and improve sending efficiency, the embodiments buffer target data records in queues. Because a database system has multiple data nodes and different target data records have different target data nodes, the loader maintains multiple queues in one-to-one correspondence with the target data nodes. With one queue per data node, data nodes do not contend for resources, and throughput is not limited by a single shared send queue.
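The per-node queueing described above can be sketched with one `collections.deque` per target data node. The class name `NodeQueues` and the `send` callback (standing in for the transmitter) are hypothetical. In a real loader each queue would typically be drained by its own sender so that a slow node does not hold up the others; this sketch drains one queue at a time to stay short.

```python
from collections import defaultdict, deque
from typing import Any, Callable


class NodeQueues:
    """Hypothetical helper: one FIFO queue of target data records per target data node."""

    def __init__(self) -> None:
        self._queues: defaultdict = defaultdict(deque)

    def enqueue(self, node_id: int, record: Any) -> None:
        """Append a record to the tail of its target node's queue."""
        self._queues[node_id].append(record)

    def drain(self, node_id: int, send: Callable[[int, Any], None]) -> int:
        """Take records from the head of one node's queue (first in, first out) and send them."""
        queue = self._queues[node_id]
        sent = 0
        while queue:
            send(node_id, queue.popleft())
            sent += 1
        return sent


sent_log = []
queues = NodeQueues()
queues.enqueue(0, "r1")
queues.enqueue(1, "r2")
queues.enqueue(0, "r3")
queues.drain(0, lambda node, rec: sent_log.append((node, rec)))
print(sent_log)  # [(0, 'r1'), (0, 'r3')]
```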
In the embodiments of the present invention, the loader needs certain information that constrains data processing while it processes the data to be distributed. The source of this information and the way it is obtained may differ between applications; the embodiments provide three optional implementations, as follows:
I,
As shown in fig. 9, the loader further comprises a memory 904:
the memory 904 is configured to store a format of the target data record and a data distribution policy of the database system;
the processor 903 is specifically configured to convert the data record into the target data record according to the locally configured format of the target data record, and to determine the target data node corresponding to the target data record in the database system according to the locally configured data distribution policy of the database system.
In this embodiment, the information that constrains data processing is configured locally on the loader. This is most effective with a dedicated database system, where the constraining information is stable for that database system, and configuring it directly on the loader makes the required functions easy to implement. Not all of the constraining information has to be configured locally: part of it may be configured on the loader and the rest obtained in other ways, without affecting the implementation of the embodiments of the present invention.
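For the first mode, a minimal sketch assuming the locally configured information is a small JSON document holding the target format (as type names) and a hash distribution policy. The configuration keys, `CONVERTERS`, and `process` are hypothetical; only the hash policy is handled here.

```python
import json
import zlib

# Hypothetical contents of a configuration file stored on the loader (mode I).
LOCAL_CONFIG_JSON = """
{
  "target_format": ["int", "str", "float"],
  "distribution": {"type": "hash", "key_index": 0, "num_nodes": 4}
}
"""

CONVERTERS = {"int": int, "str": str, "float": float}


def load_local_config(text: str = LOCAL_CONFIG_JSON) -> dict:
    """Read the locally configured target format and distribution policy."""
    return json.loads(text)


def process(fields: list[str], config: dict) -> tuple[int, tuple]:
    """Convert one data record and choose its target node using only the local config."""
    types = config["target_format"]
    if len(fields) != len(types):
        raise ValueError(f"expected {len(types)} fields, got {len(fields)}")
    record = tuple(CONVERTERS[t](value) for t, value in zip(types, fields))
    policy = config["distribution"]
    if policy["type"] != "hash":
        raise NotImplementedError(f"unsupported policy: {policy['type']}")
    key = str(record[policy["key_index"]]).encode("utf-8")
    return zlib.crc32(key) % policy["num_nodes"], record


node, record = process(["1001", "alice", "3.5"], load_local_config())
```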
II,
The embodiments of the present invention also provide another way for the loader to obtain the information that constrains data processing, specifically: the receiver 901 is further configured to receive the format of the target data record and the data distribution policy of the database system;
the processor 903 is specifically configured to execute locally stored logic code according to the received format of the target data record and the received data distribution policy of the database system, thereby converting the data record into the target data record and determining the target data node corresponding to the target data record in the database system.
In this embodiment, the logic code is stored locally on the loader, and the constraint values in the logic code can be assigned by an external device; that is, the logic code can be controlled by devices including the database system. The solution can therefore be implemented conveniently and remains compatible with different database systems. After the processor receives the format of the target data record and the data distribution policy of the database system, it may, for example, assign them to the corresponding variables in the logic code. The way the loader uses the received format and policy is not limited to this assignment operation, and the example should not be construed as the only implementation of the embodiments of the present invention.
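The assignment operation described above can be sketched as a class whose variables start unassigned and are filled in from the received format and policy. `LocalLogicCode`, its attribute names, and the assumption of a hash policy are all hypothetical.

```python
import zlib
from typing import Optional


class LocalLogicCode:
    """Hypothetical logic code stored on the loader; its variables start unassigned."""

    CONVERTERS = {"int": int, "str": str, "float": float}

    def __init__(self) -> None:
        self.target_format: Optional[list] = None
        self.key_index: Optional[int] = None
        self.num_nodes: Optional[int] = None

    def assign(self, received_format: list, received_policy: dict) -> None:
        """Assign the parameters received from the database system to the code's variables."""
        self.target_format = list(received_format)
        self.key_index = received_policy["key_index"]
        self.num_nodes = received_policy["num_nodes"]

    def execute(self, fields: list) -> tuple:
        """Convert a data record and compute its target node (hash policy assumed)."""
        if self.target_format is None:
            raise RuntimeError("format and distribution policy have not been received yet")
        if len(fields) != len(self.target_format):
            raise ValueError(f"expected {len(self.target_format)} fields, got {len(fields)}")
        record = tuple(self.CONVERTERS[t](v) for t, v in zip(self.target_format, fields))
        key = str(record[self.key_index]).encode("utf-8")
        return zlib.crc32(key) % self.num_nodes, record


logic = LocalLogicCode()
logic.assign(["int", "str", "float"], {"type": "hash", "key_index": 0, "num_nodes": 4})
node, record = logic.execute(["1002", "bob", "4.0"])
```

Because only the variable values arrive from outside, the same loader binary can serve a different database system after a new `assign` call.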
III,
The embodiments of the present invention also provide a further way for the loader to obtain the information that constrains data processing, specifically: the receiver 901 is further configured to receive logic code, where the logic code specifies the format of the target data record and the data distribution policy of the database system;
the processor 903 is specifically configured to execute the logic code, thereby converting the data records into target data records and determining the target data node corresponding to each target data record in the database system.
In this embodiment, the information that constrains data processing is carried in the logic code, so the loader obtains it without necessarily parsing or interpreting it. The logic code need not contain all of this information; it may contain only part of it, with the rest obtained in other ways, without affecting the implementation of the embodiments of the present invention.
Because the loader of this embodiment receives the logic code, it does not need to be configured with the information that constrains data processing, which makes it easy to keep the loader compatible with different database systems. In addition, the loader does not need to implement a general, complex parsing and distribution mechanism: the logic code is generated directly on the database system side by compilation (code generation), so the loader executes it more efficiently and hardware load is reduced.
In the embodiments of the present invention, the logic code may be directly executable, or it may need to be compiled before execution. Platform-independent code may be used to make the database system and the loader easier to keep compatible; specifically, the above logic code is platform-independent code.
In the embodiments of the present invention, the logic code may be sent by a third-party device other than the database system and the loader, or by the database system itself. When the database system generates the logic code from the data table definition and its internal format requirements, efficiency and compatibility are higher. Therefore, in a preferred implementation of the embodiments of the present invention, the receiver 901 is specifically configured to receive the logic code sent by the database system.
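The code-generation approach can be sketched in Python, with generated Python source standing in for whatever platform-independent code a real system would use. `generate_logic_code` plays the database system, specializing code to one table definition; `load_logic_code` plays the loader, running the code without interpreting the format or policy itself. The SQL type names, the hash policy, and both function names are assumptions. Executing received code with `exec` is only acceptable when the sender, here the database system, is trusted.

```python
def generate_logic_code(columns: list, key_column: str, num_nodes: int) -> str:
    """Database-side sketch: emit Python source specialized to one table definition."""
    type_to_converter = {"INTEGER": "int", "TEXT": "str", "REAL": "float"}
    converted = ", ".join(
        f"{type_to_converter[sql_type]}(fields[{i}])" for i, (_, sql_type) in enumerate(columns)
    )
    key_pos = [name for name, _ in columns].index(key_column)
    return (
        "import zlib\n"
        "def logic(fields):\n"
        f"    record = ({converted},)\n"
        f"    node = zlib.crc32(str(record[{key_pos}]).encode('utf-8')) % {num_nodes}\n"
        "    return node, record\n"
    )


def load_logic_code(source: str):
    """Loader-side sketch: compile and run the received code without interpreting its contents."""
    namespace: dict = {}
    exec(compile(source, "<logic-code>", "exec"), namespace)
    return namespace["logic"]


table = [("id", "INTEGER"), ("name", "TEXT"), ("score", "REAL")]
source = generate_logic_code(table, key_column="id", num_nodes=4)
logic = load_logic_code(source)
node, record = logic(["1001", "alice", "3.5"])
```

The generated function contains no general-purpose parsing: column types, key position, and node count are baked in as constants, which is the efficiency argument made above.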
An embodiment of the present invention further provides a storage system, as shown in fig. 10, including:
a loader 1001 and a database system 1002 that are communicatively connected, where the loader 1001 is the loader of any one of the embodiments of the present invention.
In the above storage system, the loader performs data conversion and determines the target data node for each target data record, so the data nodes of the database system no longer need to perform data conversion or redistribution calculation, and the internal network of the database system is not occupied. This reduces the likelihood of overloading the database system and improves the response speed and storage efficiency of the storage system.
Optionally, when the loader 1001 receives the information that constrains data processing from outside:
the database system 1002 is configured to send the loader 1001 a format that the database system 1002 can recognize and the data distribution policy of the database system 1002; or,
the database system 1002 is configured to send logic code to the loader 1001, where the logic code specifies a format that the database system 1002 can recognize and the data distribution policy of the database system 1002.
The loader of this embodiment may store the logic code locally, with the constraint values in the logic code assigned by an external device; that is, the logic code can be controlled by devices including the database system, so the solution can be implemented conveniently and remains compatible with different database systems. After receiving the format of the target data record and the data distribution policy of the database system, the loader may, for example, assign them to the corresponding variables in the logic code. The way the loader uses the received format and policy is not limited to this assignment operation, and the example should not be construed as the only implementation of the embodiments of the present invention.
Alternatively, the loader of this embodiment may receive logic code sent by the database system. The loader then does not need to be configured with the information that constrains data processing, which makes it easy to keep compatible with different database systems. Because the loader does not need to implement a general, complex parsing and distribution mechanism, and the logic code is generated directly on the database system side by compilation (code generation), the loader executes the logic code more efficiently and hardware load is reduced.
Further, the preprocessing system may be networked as part of the storage system. As shown in fig. 11, the storage system further includes:
a preprocessing system 1101, configured to preprocess the raw data produced by the raw data production system to obtain intermediate data, and to send the intermediate data to the loader 1001 as the data to be distributed.
Further, the production system may also be networked as part of the storage system. As shown in fig. 12, the storage system further includes:
a production system 1201, configured to generate raw data and send it to the preprocessing system 1101;
the preprocessing system 1101 is configured to preprocess the raw data to obtain intermediate data, and to send the intermediate data to the loader 1001 as the data to be distributed.
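The networked arrangement of figs. 11 and 12 (production system to preprocessing system to loader) can be sketched end to end in Python. Every function here is a hypothetical stand-in, and the preprocessing rule (keep only lines with three fields and an all-digit id) is an assumed example of filtering out illegal data; the description does not specify one. The loader stand-in buckets records per node instead of sending them over a network.

```python
import zlib
from collections import defaultdict


def produce() -> str:
    """Production system stand-in: emits raw text, including one malformed line."""
    return "1001,alice,3.5\nbad line\n1002,bob,4.0\n"


def preprocess(raw: str) -> str:
    """Preprocessing system stand-in: drop lines that are not 3 fields with an integer id."""
    kept = []
    for line in raw.splitlines():
        fields = line.split(",")
        if len(fields) == 3 and fields[0].isdigit():
            kept.append(line)
    return "\n".join(kept)


def load(intermediate: str, num_nodes: int) -> dict:
    """Loader stand-in: split, convert, and route each record to a per-node bucket."""
    per_node = defaultdict(list)
    for line in intermediate.splitlines():
        id_, name, score = line.split(",")
        record = (int(id_), name, float(score))
        node = zlib.crc32(id_.encode("utf-8")) % num_nodes
        per_node[node].append(record)
    return dict(per_node)


result = load(preprocess(produce()), num_nodes=4)
```

Because the malformed line is dropped during preprocessing, the loader never spends conversion work on it, which matches the reason given earlier for preferring intermediate data.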
Those skilled in the art will appreciate that all or some of the steps of the methods in the above embodiments may be implemented by a program instructing related hardware. The program may be stored in a computer-readable storage medium, which may include read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disc, or the like.
The data distribution method, loader, and storage system provided by the embodiments of the present invention are described in detail above. Specific examples are used herein to explain the principles and implementations of the present invention, and the description of the above embodiments is intended only to help understand the method and core idea of the present invention. Meanwhile, those skilled in the art may make changes to the specific implementations and scope of application according to the idea of the present invention. In summary, the content of this specification should not be construed as limiting the present invention.

Claims (23)

  1. A data distribution method applied to a storage system, wherein the storage system comprises a loader and a database system, the method comprising:
    the loader acquires data to be distributed and divides the data to be distributed into data records;
    the loader converts the data records into target data records, and determines corresponding target data nodes of the target data records in the database system according to a data distribution strategy of the database system, wherein the format of the target data records is a format which can be identified by the database system;
    the loader sends the target data record to the target data node.
  2. The method of claim 1,
    the format of the target data record and the data distribution strategy of the database system are configured locally on the loader;
    the loader converts the data record into the target data record, and determines a target data node corresponding to the target data record in the database system according to a data distribution strategy of the database system, including:
    the loader converts the data records into the target data records according to the locally configured format of the target data records, and determines the target data nodes corresponding to the target data records in the database system according to the locally configured data distribution strategy of the database system.
  3. The method of claim 1, wherein before the loader converts the data record into the target data record, the method further comprises:
    the loader receives the format of the target data record and the data distribution strategy of the database system;
    the loader converts the data record into the target data record, and determines a target data node corresponding to the target data record in the database system according to a data distribution strategy of the database system, including:
    the loader executes locally stored logic code according to the received format of the target data record and the received data distribution strategy of the database system, thereby converting the data record into the target data record and determining the target data node corresponding to the target data record in the database system.
  4. The method of claim 1, wherein before the loader converts the data record into the target data record, the method further comprises:
    the loader receives logic code, and the format of the target data record and the data distribution strategy of the database system are specified in the logic code;
    the loader converts the data record into the target data record, and determines a target data node corresponding to the target data record in the database system according to a data distribution strategy of the database system, including:
    the loader executes the logic code, thereby converting the data records into the target data records and determining the target data nodes corresponding to the target data records in the database system.
  5. The method of claim 4, wherein the logic code is platform-independent code.
  6. The method of claim 4, wherein the loader receiving the logic code comprises: the loader receives the logic code transmitted by the database system.
  7. The method of any of claims 1 to 6, wherein the sending, by the loader, the target data record to the target data node comprises:
    the loader stores the target data record in a queue corresponding to the target data node, takes the target data record out of the queue on a first-in, first-out basis, and sends it to the target data node.
  8. A loader, comprising:
    a data division module, configured to acquire data to be distributed and divide the data to be distributed into data records;
    a data conversion module, configured to convert the data records obtained by the data division module into target data records, wherein the format of the target data records is a format that can be recognized by a database system;
    a distribution calculation module, configured to determine, according to a data distribution strategy of the database system, the target data node in the database system corresponding to each target data record obtained by the data conversion module;
    a distribution module, configured to send the target data record converted by the data conversion module to the target data node determined by the distribution calculation module.
  9. The loader of claim 8,
    the format of the target data record and the data distribution strategy of the database system are configured locally on the loader;
    the data conversion module is specifically configured to convert the data record into the target data record according to a format of the target data record configured locally;
    the distribution calculation module is specifically configured to determine a target data node corresponding to the target data record in the database system according to a data distribution policy of the database system configured locally.
  10. The loader of claim 8, further comprising:
    the parameter receiving module is used for receiving the format of the target data record and the data distribution strategy of the database system before the data conversion module converts the data record into the target data record;
    the data conversion module is specifically configured to execute a locally stored logic code according to the format of the target data record received by the parameter receiving module, and convert the data record into the target data record by executing the logic code;
    the distribution calculation module is specifically configured to execute the locally stored logic code according to the data distribution policy of the database system received by the parameter receiving module, and determine a target data node corresponding to the target data record in the database system by executing the logic code.
  11. The loader of claim 8, further comprising:
    a code receiving module, configured to receive logic code before the data conversion module converts the data record into the target data record, where a format of the target data record and a data distribution policy of the database system are specified in the logic code;
    the data conversion module is specifically configured to execute the logic code, and convert the data record into the target data record by executing the logic code;
    the distribution calculation module is specifically configured to execute the logic code, and determine a target data node corresponding to the target data record in the database system by executing the logic code.
  12. The loader of claim 11, wherein the logic code is platform-independent code.
  13. The loader of claim 11,
    the code receiving module is specifically configured to receive the logic code sent by the database system.
  14. The loader of any one of claims 8 to 13, further comprising:
    the storage module is used for storing the queue corresponding to the target data node;
    the distribution calculation module is also used for storing the target data record into a queue corresponding to the target data node;
    the distribution module is specifically configured to take out the target data record from the queue stored in the storage module according to a first-in first-out principle, and send the target data record to the target data node.
  15. A storage system, comprising:
    a loader and a database system that are communicatively coupled, wherein the loader is the loader of any one of claims 8 to 14.
  16. The storage system according to claim 15,
    if the loader is the loader of claim 10,
    the database system is configured to send the format of the target data record and the data distribution strategy of the database system to the loader;
    if the loader is the loader of claim 11 or 12,
    the database system is configured to send the logic code to the loader, wherein the format of the target data record and the data distribution strategy of the database system are specified in the logic code.
  17. The storage system according to claim 15 or 16, further comprising:
    the production system is used for generating original data and sending the original data to the preprocessing system;
    the preprocessing system is used for preprocessing the original data to obtain intermediate data, and sending the intermediate data to the loader as the data to be distributed.
  18. A loader, comprising: a receiver, a transmitter, and a processor, wherein,
    the receiver is used for acquiring data to be distributed;
    the processor is configured to divide the data to be distributed into data records, convert each data record into a target data record, and determine, according to a data distribution strategy of a database system, the target data node corresponding to the target data record in the database system, wherein the format of the target data record is a format that can be recognized by the database system;
    the transmitter is configured to send the target data record to the target data node.
  19. The loader of claim 18, further comprising a memory:
    the memory is used for storing the format of the target data record and the data distribution strategy of the database system;
    the processor is specifically configured to convert the data record into the target data record according to the locally configured format of the target data record, and determine the target data node corresponding to the target data record in the database system according to the locally configured data distribution policy of the database system.
  20. The loader of claim 18,
    the receiver is further used for receiving the format of the target data record and the data distribution strategy of the database system;
    the processor is specifically configured to execute locally stored logic code according to the received format of the target data record and the received data distribution policy of the database system, thereby converting the data record into the target data record and determining the target data node corresponding to the target data record in the database system.
  21. The loader of claim 18,
    the receiver is further configured to receive logic code, where the logic code specifies a format of the target data record and a data distribution policy of the database system;
    the processor is specifically configured to execute the logic code, convert the data record into the target data record by executing the logic code, and determine a target data node corresponding to the target data record in the database system.
  22. The loader of claim 20,
    the receiver is specifically configured to receive the logic code sent by the database system.
  23. The loader of any one of claims 18 to 22,
    the processor is specifically configured to store the target data record into a queue corresponding to the target data node, and take out the target data record from the queue according to a first-in first-out principle;
    the transmitter is specifically configured to send the retrieved target data record to the target data node.
CN201480029493.XA 2014-11-05 2014-11-05 Data distribution method, loader and storage system Active CN105765569B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2014/090359 WO2016070364A1 (en) 2014-11-05 2014-11-05 Data distribution method, loader and storage system

Publications (2)

Publication Number Publication Date
CN105765569A true CN105765569A (en) 2016-07-13
CN105765569B CN105765569B (en) 2018-02-02

Family

ID=55908382

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201480029493.XA Active CN105765569B (en) 2014-11-05 2014-11-05 Data distribution method, loader and storage system

Country Status (2)

Country Link
CN (1) CN105765569B (en)
WO (1) WO2016070364A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102638584A (en) * 2012-04-20 2012-08-15 青岛海信传媒网络技术有限公司 Data distributing and caching method and data distributing and caching system
CN103412897A (en) * 2013-07-25 2013-11-27 中国科学院软件研究所 Parallel data processing method based on distributed structure
CN104090765A (en) * 2014-07-16 2014-10-08 福建天晴数码有限公司 Method and device for switching from mobile game to webgame

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
US7610348B2 (en) * 2003-05-07 2009-10-27 International Business Machines Distributed file serving architecture system with metadata storage virtualization and data access at the data server connection speed
CN102750350B (en) * 2012-06-08 2015-04-22 北京天地云箱科技有限公司 Monitoring system and method
US9183268B2 (en) * 2013-04-11 2015-11-10 Pivotal Software, Inc. Partition level backup and restore of a massively parallel processing database

Cited By (2)

Publication number Priority date Publication date Assignee Title
CN108804296A (en) * 2018-05-21 2018-11-13 上海星佑网络科技有限公司 Record destructing method and apparatus and computer readable storage medium
CN108804296B (en) * 2018-05-21 2021-09-24 上海星佑网络科技有限公司 Record deconstruction method and apparatus and computer readable storage medium

Also Published As

Publication number Publication date
CN105765569B (en) 2018-02-02
WO2016070364A1 (en) 2016-05-12

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant