CN110888925A - Data loading and distributing method and device and storage medium - Google Patents

Info

Publication number
CN110888925A
Authority
CN
China
Prior art keywords
loading
data
task
sorting
distribution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910963378.4A
Other languages
Chinese (zh)
Other versions
CN110888925B (en)
Inventor
易丙洪
陈威
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Climate Agricultural Science And Technology Co Ltd
Original Assignee
Guangzhou Climate Agricultural Science And Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Climate Agricultural Science And Technology Co Ltd filed Critical Guangzhou Climate Agricultural Science And Technology Co Ltd
Priority to CN201910963378.4A priority Critical patent/CN110888925B/en
Publication of CN110888925A publication Critical patent/CN110888925A/en
Application granted granted Critical
Publication of CN110888925B publication Critical patent/CN110888925B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 - Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 - Integrating or interfacing systems involving database management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The invention discloses a method, a device and a storage medium for loading and distributing data, wherein the device comprises a control unit, a loading unit and a distribution unit. Compared with the prior art, the method and the device reduce the preliminary steps of data processing, so that the personnel involved can complete data loading and distribution quickly and devote most of their effort to the subsequent data processing. The device is lightweight and can be deployed flexibly. The method, the device and the storage medium for loading and distributing data can be widely applied in the technical field of data processing.

Description

Data loading and distributing method and device and storage medium
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a method, an apparatus, and a storage medium for loading and distributing data.
Background
Generally, the first step of a data processing flow is to load the data to be processed from a data source and then distribute the loaded data to a processing unit for processing. Data loading generally takes one of the following 3 modes: a single-round loading mode, a cyclic loading mode, and a continuous incremental loading mode. Data distribution delivers the data to different destinations in different ways, for example to an HTTP server via the HTTP protocol or to a message queue via the TCP protocol.
At present, there are two ways to meet the above data loading and distributing requirements: one is to develop a data loading and distributing device in-house, and the other is to use an existing third-party data integration tool.
In practice, an approach or tool maximizes development efficiency only if it fits the time and staffing constraints of the project and the team. A medium or large team can assign a dedicated group to develop and maintain its own device, or to operate and maintain a set of third-party data integration tools; a small team with limited manpower and time encounters the following problems with either approach:
A data loading and distribution device developed in-house for the requirements of a specific project easily becomes tightly coupled to the details of that project, which hinders reuse. Using an existing third-party data integration tool, on the one hand, increases the learning cost of the team; on the other hand, the high degree of encapsulation of the third-party tool increases the risk that abnormal conditions cannot be handled in time. In addition, if the user finds that the tool cannot meet some details of the current requirement, either the requirement must be compromised to fit what the tool provides, or secondary development must be carried out on top of the tool at a high learning cost.
Therefore, for a small team that is short on time, the learning cost, ease of operation and flexibility of the method and the device must be weighed carefully.
Disclosure of Invention
In view of this, embodiments of the present invention provide a method, an apparatus, and a storage medium for loading and distributing data, which are low in learning cost, easy to operate, and high in flexibility.
In a first aspect, an embodiment of the present invention provides a method for loading and distributing data, including the following steps:
reading the configuration information of the current task from the pre-stored configuration information;
after the task is initialized through the configuration information, the state information of the task is saved;
calculating the parameters required for the current loading, and loading data from a data source according to the parameters; wherein the parameter is the value screening range of the sorting field;
distributing the loaded data to a destination according to the configuration information;
recalculating and storing the state information of the task;
determining the number of the current loading records according to the state information of the task;
and determining the execution progress of data loading and distribution according to the number of the current loading records, and finishing the final data loading and distribution operation.
Further, the step of determining the execution progress of data loading and distribution according to the number of the current loading records to complete the final data loading and distribution operation includes the following steps:
when the number of the current loading records is equal to the single maximum loading number, carrying out next loading after waiting for a loading time interval;
when the number of the current loading records is less than the single maximum loading number:
if the task adopts the single-round loading mode, updating the saved task state to 'finished' and ending the task; if the task adopts the continuous incremental loading mode, waiting for the round time interval and then carrying out the next loading; and if the task adopts the cyclic loading mode, resetting the task state and starting the next round of loading.
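The decision described above can be sketched as a small pure function. This is a minimal illustration rather than the claimed implementation; the enum names `Mode` and `Action` and the function `next_action` are hypothetical and do not appear in the original text.

```python
from enum import Enum


class Mode(Enum):
    # The three loading modes named in the text (identifiers are hypothetical).
    SINGLE_ROUND = "single_round"
    CYCLIC = "cyclic"
    CONTINUOUS_INCREMENTAL = "continuous_incremental"


class Action(Enum):
    # What the task does after one load-and-distribute step.
    WAIT_LOAD_INTERVAL_THEN_LOAD = "wait_load_interval_then_load"
    FINISH = "finish"
    WAIT_ROUND_INTERVAL_THEN_LOAD = "wait_round_interval_then_load"
    RESET_STATE_AND_START_NEXT_ROUND = "reset_state_and_start_next_round"


def next_action(loaded_count: int, max_per_load: int, mode: Mode) -> Action:
    """Choose the next step from the number of records just loaded."""
    if loaded_count >= max_per_load:
        # A full page suggests more data may remain in the screening range.
        return Action.WAIT_LOAD_INTERVAL_THEN_LOAD
    # A short page means no more data is available for now.
    if mode is Mode.SINGLE_ROUND:
        return Action.FINISH
    if mode is Mode.CONTINUOUS_INCREMENTAL:
        return Action.WAIT_ROUND_INTERVAL_THEN_LOAD
    return Action.RESET_STATE_AND_START_NEXT_ROUND
```

Keeping the decision free of side effects (no sleeping, no state writes) makes each branch easy to check in isolation; the caller would perform the waiting and state updates.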
Further, the configuration information of the task includes:
data source connection parameters for configuring parameters required for connecting to a data source;
the paging size is used for controlling the maximum record number of single loading;
the data table is used for specifying which data set in the target data source holds the data to be loaded;
the sorting field name is used for selecting a field in the data table as the basis of the loading traversal order, according to the characteristics of the data to be loaded and of the business;
the sorting order is used for specifying, for the loading traversal, an order based on the sorting field, and is either ascending or descending;
a sorting field value range minimum function used for calculating the minimum value of the sorting field value range;
the maximum function of the sorting field value range is used for calculating the maximum value of the sorting field value range;
a load time interval specifying a time interval between each load;
a round time interval for specifying a time interval between each round of loading;
a distribution mode for selecting a corresponding provider from the SPI providers to provide a distribution service;
a distribution destination for determining a destination of the distribution.
Further, the state information of the task includes:
a lock identifier serving as the credential of the thread processing a task, each task being executable by only one thread at any one time;
a lock timeout time to indicate an expiration time of the lock;
a sorting field range minimum value: if the sorting order is ascending, it represents the minimum value of the sorting field value range used in the next loading; if the sorting order is descending, it has no practical meaning;
a sorting field range maximum value: if the sorting order is descending, it represents the maximum value of the sorting field value range used in the next loading; if the sorting order is ascending, it has no practical meaning;
the value of the sorting field in the last-sorted record of the last loading result, which is one end point of the data screening range of the next loading;
the number of records in the last loading result whose sorting field value equals the sorting field value of the last-sorted record of that result;
the total number of records loaded last time;
the number of records to be skipped in the next screening result, which indicates how many of the records meeting the next screening condition must be skipped so that already-loaded records are not loaded again when several records share the same sorting field value;
and the task running state is used for representing the running state of the task.
Further, the data loading and distributing includes three modes, specifically including:
the single-round loading mode is used for carrying out one round of traversal according to the specified data screening range, and when no more data exists in the traversal, the task is immediately ended;
the cyclic loading mode is used for traversing according to the specified data screening range; when the traversal finds no more data, the current round of traversal ends, the traversal position is reset, a new round of traversal begins, and the process repeats indefinitely;
and the continuous incremental loading mode is used for traversing according to the specified data range; when the traversal finds no more data, data loading continues without resetting the traversal position.
In a second aspect, an embodiment of the present invention further provides a data loading and distributing apparatus, including:
the control unit is used for creating and controlling the starting and stopping of the data loading and distributing task;
the loading unit is used for reading the configuration information of the current task from the pre-stored configuration information; after the task is initialized through the configuration information, the state information of the task is saved; calculating required parameters for the loading, and loading data from a data source according to the parameters; providing the loaded data to the distribution unit according to the configuration information;
a distribution unit configured to distribute the loaded data to a destination according to the configuration information; and determining the execution progress of data loading and distribution according to the number of the current loading records, and finishing the final data loading and distribution operation.
Further, the control unit includes:
the configuration manager is used for managing the configuration information of the task;
the configuration information storage is used for storing the configuration information of the data loading and distributing task;
the task controller is used for creating a task according to the configuration information and controlling the starting and stopping of the task;
wherein the tasks are created by the task controller for executing a specified load and dispatch task, each task controlling a load unit and a dispatch unit for handling the loading and dispatch of data.
Further, the loading unit includes:
the loading controller is used for realizing the control logic of the data loading method, loading data from a bottom layer data source through the data loading SPI and transmitting the data to the distribution unit for distribution;
and the data loading SPI is used for providing a general data batch loading mode.
Further, the distribution unit includes:
a distribution controller for receiving the data from the loading unit and distributing the data to a designated data consumption destination through the data distribution SPI;
and the data distribution SPI is used for providing a universal data distribution mode.
In a third aspect, an embodiment of the present invention further provides a storage medium, in which processor-executable instructions are stored, and when executed by a processor, the processor-executable instructions are configured to perform the method for loading and distributing data.
One or more of the above-described embodiments of the present invention have the following advantages: data loading and distributing operations are realized through the control unit, the loading unit and the distribution unit, and a specific data loading and distributing method is encapsulated in the device; in addition, compared with the prior art, the method and the device reduce the preliminary steps of data processing, so that the personnel involved can complete data loading and distribution quickly and devote most of their effort to the subsequent data processing; the device is lightweight and can be deployed flexibly.
Drawings
FIG. 1 is a schematic structural diagram of an apparatus according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of the internal structure of the apparatus according to the embodiment of the present invention;
FIG. 3 is a flow chart of method steps in an embodiment of the present invention.
Detailed Description
The invention will be further explained and illustrated with reference to the drawings and the embodiments in the description. The step numbers in the embodiments of the present invention are set for convenience of illustration only; they do not limit the order of the steps, and the execution order of each step in the embodiments can be adjusted as understood by those skilled in the art.
The invention provides a method and a device for loading and distributing data, which aim to address the learning cost, ease of operation and flexibility of methods and devices for loading and distributing data. The following is a detailed description of specific implementations of the method and apparatus of the present invention.
As shown in fig. 1, the apparatus as a whole includes three component units: a control unit 101, a loading unit 102, and a distribution unit 103. The control unit is responsible for creating data loading and distributing tasks and controlling their starting and stopping; the loading unit is responsible for loading data from the data source 104 and handing the loaded data to the distribution unit; the distribution unit is responsible for distributing the data in a specified manner to a specified data consumption destination, which may be an external data consumption system or a data processing unit integrated in the application, such as the HTTP server 105, the message queue 106, the in-application integrated processing unit 107, and other distribution destinations 108.
The internal structure of the device is shown in fig. 2, and the device is composed of a plurality of working modules cooperating with each other and a group of Service Provider Interfaces (SPI).
The control unit is made up of a configuration manager 201, a configuration information store 202, a task controller 203, tasks 204 and a task state information store 205. Wherein:
1) the configuration manager is used for managing the configuration information of the tasks, and the configuration information storage is responsible for storing the task configuration information;
the configuration information includes a loading configuration and a distribution configuration, which specify the behavior of the loading unit and the distribution unit, respectively. Details of the configuration information are given below in the detailed description of the method.
2) And the task controller is used for creating the task according to the configuration information and is responsible for starting and stopping the task.
3) Tasks, created by the task controller, for performing a specified loading and distribution job; each task controls one loading unit and one distribution unit, which are responsible for loading and distributing data, respectively. The state information of the task, namely the data loading progress and position, is stored in the task state information storage, so that after being suspended or terminating abnormally the task can resume loading from the position reached before it exited instead of starting from the beginning; details of the task state information are given below in the detailed description of the method.
The loading unit is composed of a loading controller 206, a data loading Service Provider Interface (SPI) 207, and SPI implementations for different data sources, such as a relational database 211, a non-relational database 212, and other databases 210, wherein the relational database loading service provider 208 and the non-relational database loading service provider 209 provide the implementations for the corresponding databases. Wherein:
1) the loading controller is used as a core component of the loading unit and used for realizing the control logic of the data loading method, loading data from a bottom layer data source through the data loading SPI and transmitting the data to the distribution unit for distribution;
2) the data loading SPI abstracts a general batch data loading mode; any service provider that implements the SPI can be used by the loading unit. For example, a provider that builds an SQL query statement for a relational database from the query conditions and parameters required by the SPI supplies the loading unit with a service for loading data from the relational database.
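As a rough illustration of such a relational provider, the sketch below builds a parameterized SQL query from a sorting field range, a sorting order, a page size and a skip count, and runs it against an in-memory SQLite database. The function names, the table `events` and its columns are hypothetical. The sketch also assumes that table and field names come from trusted configuration and that the database returns rows with equal sorting values in a stable order between queries.

```python
import sqlite3
from typing import Any, List, Tuple


def build_page_query(table: str, sort_field: str, ascending: bool) -> str:
    """Build one batch query; identifiers are assumed to come from trusted config."""
    order = "ASC" if ascending else "DESC"
    return (
        f'SELECT * FROM "{table}" '
        f'WHERE "{sort_field}" >= ? AND "{sort_field}" <= ? '
        f'ORDER BY "{sort_field}" {order} LIMIT ? OFFSET ?'
    )


def load_page(conn: sqlite3.Connection, table: str, sort_field: str,
              ascending: bool, low: Any, high: Any,
              size: int, skip: int) -> List[Tuple]:
    """Load at most `size` rows in [low, high], skipping the first `skip` matches."""
    sql = build_page_query(table, sort_field, ascending)
    return conn.execute(sql, (low, high, size, skip)).fetchall()


# Hypothetical demo data: two rows share the sorting value 2.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, created INTEGER)")
conn.executemany("INSERT INTO events (created) VALUES (?)",
                 [(1,), (2,), (2,), (3,)])

# Resume from sorting value 2 after both rows with that value were loaded.
page = load_page(conn, "events", "created", True, 2, 3, 10, 2)
```

Range bounds and paging values are passed as bound parameters, so only the identifiers are interpolated into the statement text.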
The distribution unit is composed of a distribution controller 213, a data distribution Service Provider Interface (SPI) 214, and SPI implementations for different distribution modes and destinations. The distribution destinations include, for example, an HTTP server 219, a message queue 220, a data processing unit 221 integrated in an application, and other distribution destinations 218, where the HTTP distribution service provider 215 distributes data by the HTTP protocol, the message queue distribution service provider 216 distributes data by a message queue protocol, and the in-application distribution service provider 217 distributes data by an in-application message protocol. Wherein:
1) a distribution controller for receiving the data from the loading unit and distributing the data to a designated data consumption destination through the SPI;
2) the data distribution SPI abstracts a general data distribution method, and can be used by the distribution unit as long as the service provider implements the SPI. For example, the distribution unit may be provided with a service for distributing data to a remote HTTP server by means of the HTTP protocol. In addition to being able to distribute data to a remote data consumption system for processing, the SPI approach means that the device supports secondary development, i.e. integrating the data processing unit directly into the application, which is very useful in a fast development and testing scenario.
The components of the device have a clear division of labor and clear cooperation relationships, which reduces the learning cost and the difficulty of use. Meanwhile, through the SPI approach, the device allows the loading function to support various data sources and the distribution function to support various distribution modes and destinations, which improves the flexibility of data loading and distribution.
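One way to picture the distribution side of this SPI arrangement is a provider protocol plus a controller that selects a provider by distribution mode. The sketch below is an assumption-laden illustration: the class names `DistributionProvider`, `InAppProvider` and `DistributionController`, the method `distribute`, and the mode string `"in_app"` are all hypothetical and not taken from the original text.

```python
from typing import Any, Callable, Dict, List, Protocol, Sequence, Tuple


class DistributionProvider(Protocol):
    """Hypothetical shape of the data distribution SPI."""

    def distribute(self, records: Sequence[Any], destination: str) -> None:
        ...


class InAppProvider:
    """Provider that hands each record to a processing function in the same application."""

    def __init__(self, handler: Callable[[str, Any], None]) -> None:
        self._handler = handler

    def distribute(self, records: Sequence[Any], destination: str) -> None:
        for record in records:
            self._handler(destination, record)


class DistributionController:
    """Selects a registered provider by distribution mode and delegates to it."""

    def __init__(self) -> None:
        self._providers: Dict[str, DistributionProvider] = {}

    def register(self, mode: str, provider: DistributionProvider) -> None:
        self._providers[mode] = provider

    def distribute(self, mode: str, destination: str,
                   records: Sequence[Any]) -> None:
        if mode not in self._providers:
            raise LookupError(f"no provider registered for mode {mode!r}")
        self._providers[mode].distribute(records, destination)


# Hypothetical usage: collect distributed records in a list.
received: List[Tuple[str, Any]] = []
controller = DistributionController()
controller.register("in_app",
                    InAppProvider(lambda dest, rec: received.append((dest, rec))))
controller.distribute("in_app", "stats", [{"id": 1}, {"id": 2}])
```

An HTTP or message queue provider would implement the same `distribute` method, so the controller would not need to change when a new destination type is added.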
The implementation of the above device requires a reasonable method to be abstracted from the data loading and distributing process. On one hand, the flow of data loading is packaged in the device, so that the data processing task does not need to concern about the process of acquiring the data to be processed but only concerns about the processing of the data; on the other hand, the abstraction needs to be generic enough to accommodate different types of data sources and distribution. The present invention provides just one such method. Fig. 3 depicts the flow of the method, which is described in detail below.
The method has 3 modes of data loading, one of which can be adopted for each task. The 3 modes include:
1) Single-round loading mode
In the mode, a round of traversal is performed according to the specified data screening range, and when no more data exists in the traversal, the task is immediately ended. The mode is particularly suitable for the scene of loading static data.
2) Circular loading mode
In this mode, traversal is carried out according to a specified data screening range; when the traversal finds no more data, the current round of traversal ends, the traversal position is reset, a new round of traversal begins, and the process repeats indefinitely. This mode is suitable for scenarios where data needs to be consumed cyclically.
3) Continuous incremental load mode
In this mode, traversal is performed according to a specified data range; when the traversal finds no more data, data loading continues without resetting the traversal position. This mode is suitable for incremental loading when the data in the data source grows dynamically.
The method of the invention defines the configuration information needed by the data loading and distributing task, and the configuration information items comprise:
1) data source connection parameters
The parameters the device needs in order to connect to the data source; which parameters are required is determined by the data loading service provider that implements the SPI. For example, for a service that loads data from a relational database, the parameters typically required include: host, port, database name, username, password, character encoding, and time zone.
2) Size of page
Controls the maximum number of records per load; this embodiment refers to this parameter as size.
3) Data table
For specifying which data set (e.g., a table in a relational database) of the target data source the data to be loaded is located on.
4) Sort field names
In order to use the method, a field is selected in the data table as the basis of the loading traversal sequence according to the characteristics and the service characteristics of the data to be loaded. For example, a monotonically increasing/decreasing field is used as the sorting field.
5) Sort order
A sort order based on sort fields is specified for the load traversal, i.e., whether sorting is in ascending order according to the sort fields or in descending order according to the sort fields is specified.
6) Minimum function of sorting field value range
A function used to calculate the minimum value of the sorting field value range. For example, if the sorting field is the record creation time and the user wants to load only records whose creation time lies within a certain range, this function specifies the minimum of that range. Note that the parameter is a function, not a value: before each load, the load controller evaluates the function with the current state as its parameter and uses the result as a screening condition for that load. This also means that the screening range of each load can be fixed or can change according to the context (for example, a sliding window can be implemented to filter data). This embodiment refers to the function as fn_rmn(t), where t represents a context composed of the current task configuration and the current task state.
7) Maximum function of sorting field value range
This parameter is similar to parameter (6) and is a function used to calculate the maximum value of the sorting field value range. This embodiment refers to the function as fn_rmx(t), where t represents the context composed of the current task configuration and the current task state.
8) Loading time intervals
Specifying the time interval between each load can be used to control the rate of the load. This embodiment refers to this parameter by lwt.
9) Time interval of round
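Specifies the time interval between rounds of loading. In the execution flow of this embodiment, the interval is applied in the cyclic loading mode and in the continuous incremental loading mode; it has no effect in the single-round loading mode. This embodiment refers to this parameter by rwt.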
10) Distribution mode
This parameter is used by the distribution controller to select a corresponding provider from the SPI providers to provide the distribution service.
11) Distribution destination
This parameter is interpreted and used by the SPI provider selected by the distribution controller to determine the destination of the distribution.
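The configuration items above could be grouped into a single record, with the two range bounds held as functions rather than values. The following is a minimal sketch under stated assumptions: the class `TaskConfig`, its field names other than size, fn_rmn, fn_rmx, lwt and rwt, the `"now"` key in the context, and all example values are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict

# Hypothetical context type: the text only says t combines task config and task state.
Context = Dict[str, Any]


@dataclass(frozen=True)
class TaskConfig:
    source_params: Dict[str, Any]      # 1) data source connection parameters
    size: int                          # 2) page size (maximum records per load)
    table: str                         # 3) data table
    sort_field: str                    # 4) sorting field name
    ascending: bool                    # 5) sorting order
    fn_rmn: Callable[[Context], Any]   # 6) minimum function of the value range
    fn_rmx: Callable[[Context], Any]   # 7) maximum function of the value range
    lwt: float                         # 8) loading time interval, in seconds
    rwt: float                         # 9) round time interval, in seconds
    distribution_mode: str             # 10) distribution mode
    destination: str                   # 11) distribution destination


# Example: a one-hour sliding window over a creation timestamp.
# The "now" entry is an assumed convenience value placed in the context.
config = TaskConfig(
    source_params={"host": "db.example.invalid", "port": 3306},
    size=500,
    table="events",
    sort_field="created_at",
    ascending=True,
    fn_rmn=lambda t: t["now"] - 3600,
    fn_rmx=lambda t: t["now"],
    lwt=1.0,
    rwt=30.0,
    distribution_mode="http",
    destination="http://consumer.example.invalid/ingest",
)

window = (config.fn_rmn({"now": 10_000}), config.fn_rmx({"now": 10_000}))
```

Because the bounds are evaluated before each load, a fixed range is simply a function that ignores its argument, for example `lambda t: 0`.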
Further as a preferred embodiment, the method of the present invention further defines status information required for data loading, and the status information item includes:
1) Lock identifier (ID)
Serves as the credential of the thread that processes the task; a task can be executed by only one thread at a time. A task can be locked with a new lock ID only if it is not locked by any thread or its existing lock has timed out; the timeout of a lock can be extended by the thread holding it. A thread can execute a task only after successfully locking it; during execution, each time the task state is updated after a load finishes, the lock timeout is extended, which acts as a heartbeat showing that the thread is still active. Before executing a task, a thread generates a globally unique lock ID that differs every time (this method uses a UUID) and attempts a conditional update of the task's lock ID under the locking condition above; a successful update means that the current thread has obtained the right to process the task.
2) Lock timeout time
Indicating the lock expiration time.
3) Minimum value of ordered field range
If the sorting order is ascending, this item represents the minimum value of the sorting field value range used in the next loading. In the task initialization stage, it is assigned the evaluation result of the fn_rmn(t) function; in the subsequent loading process, after each load finishes it is assigned the sorting field value of the last-sorted record in the current loading result, which then serves as the minimum of the range for the next load. This embodiment refers to this item as rmn.
If the sort order is descending, then the field has no actual meaning.
4) Maximum value of ordering field range
If the sorting order is descending, this item represents the maximum value of the sorting field value range used in the next loading. In the task initialization stage, it is assigned the evaluation result of the fn_rmx(t) function; in the subsequent loading process, after each load finishes it is assigned the sorting field value of the last-sorted record in the current loading result, which then serves as the maximum of the range for the next load. This embodiment refers to this item as rmx.
If the sort order is ascending, then this field has no actual meaning.
5) Value of sorting field in last sorted record in loading result
The status entry represents the value of the sort field in the last-ranked record in the last-loaded result, which is an end point of the next-loaded data screening range, and is denoted by lfv in this embodiment.
6) The number of records with the sorting field value of lfv in the last loading result
This state item indicates the number of records in the last loading result whose sorting field value equals lfv. The method allows a field without a unique constraint to be specified as the sorting field; this item is what makes that possible, since it is used to calculate state item (8). This embodiment refers to this item as lfvc.
7) Total number of records last loaded
Represents the total number of records loaded last time, and this embodiment refers to this item by size.
8) Number of records to be skipped for next screening result
Indicating how many records need to be skipped in the records satisfying the filtering condition next time to avoid repeatedly loading the loaded records when the sorting fields of the plurality of records take the same value. This embodiment refers to this item with skip.
This item is calculated as follows: when the lfv of the last loading result is equal to the lfv of the current loading result, take the sum of the skip used by the current load and the lfvc of the current loading result; otherwise, take the lfvc of the current loading result.
9) Task running state
Represents the running state of the task. The method defines 2 running states: running and completed. This embodiment refers to this item by status.
The method adopts a relational database as the configuration information storage and the task state information storage.
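Since the state is kept in a relational database, the locking rule for the lock ID can be illustrated with a conditional UPDATE. The sketch below uses an in-memory SQLite database; the table `task_state`, its columns, and the functions `try_lock` and `extend_lock` are hypothetical names chosen for this illustration, and the explicit `now` argument exists only to make the example deterministic.

```python
import sqlite3
import time
import uuid
from typing import Optional


def try_lock(conn: sqlite3.Connection, task_id: str, ttl: float,
             now: Optional[float] = None) -> Optional[str]:
    """Try to lock a task; return the new lock ID on success, else None."""
    now = time.time() if now is None else now
    lock_id = str(uuid.uuid4())  # fresh, globally unique ID for every attempt
    cur = conn.execute(
        "UPDATE task_state SET lock_id = ?, lock_expires_at = ? "
        "WHERE task_id = ? AND (lock_id IS NULL OR lock_expires_at < ?)",
        (lock_id, now + ttl, task_id, now),
    )
    conn.commit()
    # Exactly one updated row means this caller now holds the lock.
    return lock_id if cur.rowcount == 1 else None


def extend_lock(conn: sqlite3.Connection, task_id: str, lock_id: str,
                ttl: float, now: Optional[float] = None) -> bool:
    """Heartbeat: push the timeout forward, but only for the current holder."""
    now = time.time() if now is None else now
    cur = conn.execute(
        "UPDATE task_state SET lock_expires_at = ? "
        "WHERE task_id = ? AND lock_id = ?",
        (now + ttl, task_id, lock_id),
    )
    conn.commit()
    return cur.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE task_state (task_id TEXT PRIMARY KEY, "
             "lock_id TEXT, lock_expires_at REAL)")
conn.execute("INSERT INTO task_state (task_id) VALUES ('t1')")

first = try_lock(conn, "t1", ttl=30, now=100)           # unlocked, so it succeeds
second = try_lock(conn, "t1", ttl=30, now=110)          # lock still valid until 130
renewed = extend_lock(conn, "t1", first, ttl=30, now=120)  # valid until 150
third = try_lock(conn, "t1", ttl=30, now=200)           # old lock expired at 150
```

The check and the write happen in one statement, so two threads racing for the same task cannot both see an updated row count of one.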
As shown in fig. 3, the execution flow of the method includes the following steps:
301) starting a data loading and distributing task;
303) reading task configuration information from a configuration information store 302 (the specific reading content is shown as 304 in fig. 3);
305) after initialization using the configuration information, the task state information is saved (the content of the specific task state is shown as 306 in fig. 3), and the created record is stored in the state information storage 307.
308) Calculating the required parameters for this loading and using the parameters to load a batch of data items from the target data source 309, as shown at 310 in fig. 3;
the parameter to be calculated is the screening range of the sorting field value: if ascending sorting is used, the range is [rmn, fn_rmx(t)]; if descending sorting is used, the range is [fn_rmn(t), rmx].
311) Distributing the data to the destination 312 according to a distribution configuration;
313) recalculate and save the task state information to state information store 307;
denote the record set loaded this time as items, the number of loaded records as items_size, the last record in sort order in items as li, the sort field value of li as lisv, and the number of records in items whose sort field value equals lisv as lisvc; the new task state items are then calculated as follows:
1) lock timeout time: extended by one lock period;
2) skip (used for the next load): if lfv (the value saved by the previous load) equals lisv, take the skip used by this load plus lisvc;
otherwise, take lisvc;
3) rmn: if ascending sorting is used, take lisv; if descending sorting is used, take fn_rmn(t);
4) rmx: if ascending sorting is used, take fn_rmx(t); if descending sorting is used, take lisv;
5) lfv: take lisv;
6) size: take items_size;
314) if the number of records loaded this time is greater than or equal to the single maximum load number (size), waiting for the loading interval 315 and then returning to step 308 for the next load;
if the number of records loaded this time is less than the single maximum load number (size), no more data is available for the time being. In this case, the next action depends on the selected loading mode:
317) if the task mode is the single-round loading mode 316, updating the saved task state to 'completed' and ending the task;
321) if the task mode is the continuous increment loading mode, waiting for the time rwt and returning to step 308 for the next load;
319) if the task mode is the cyclic loading mode, resetting the task state in the same manner as 305 in fig. 3, waiting for the time rwt, and returning to step 308 for the first load of the next round;
315) if the number of records loaded this time is equal to size, waiting for the time lwt and returning to step 308 for the next load.
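The loading loop of steps 308 to 314 can be sketched for ascending sorting over an in-memory list. This is a simplified illustration under stated assumptions: each record is reduced to its sort field value, rmx is held fixed instead of being recomputed by fn_rmx(t), waiting and locking are omitted, and the function names load_batch and run_single_round are hypothetical.

```python
def load_batch(rows, lo, hi, skip, limit):
    """Return up to `limit` values in the closed range [lo, hi], ascending,
    after skipping the first `skip` matching values."""
    matching = sorted(v for v in rows if lo <= v <= hi)
    return matching[skip:skip + limit]


def run_single_round(rows, page_size, rmn, rmx):
    """Traverse `rows` once in ascending order and return the values in the
    order they were loaded (single-round mode, no waiting)."""
    lfv, skip = None, 0
    delivered = []
    while True:
        items = load_batch(rows, rmn, rmx, skip, page_size)
        delivered.extend(items)  # stands in for distribution (step 311)
        if items:
            lisv = items[-1]                            # sort value of the last record
            lisvc = sum(1 for v in items if v == lisv)  # records sharing that value
            # Same boundary value as the previous load: accumulate the skip.
            skip = skip + lisvc if lfv == lisv else lisvc
            rmn = lisv   # the next range starts at the boundary value (inclusive)
            lfv = lisv
        if len(items) < page_size:
            return delivered  # fewer records than a full page: no more data


print(run_single_round([1, 2, 2, 2, 2, 2, 3], 3, 0, 10))
# prints [1, 2, 2, 2, 2, 2, 3]
```

With a page size of 3, the second load starts again at value 2 but skips the two records with that value that were already delivered, and the third load skips five, so each record is delivered exactly once even though the sort field is not unique.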
The method and device of the present invention have been described in detail above with reference to the accompanying drawings. In a variety of data processing scenarios, the device and method allow a data loading and distributing task to be started quickly by simply configuring the task parameters and starting the task, which greatly improves the development efficiency of data processing tasks. Because the device adopts an SPI design, the method is not limited to a specific data source or distribution mode, which improves the flexibility of data loading and distribution.
In summary, the data loading and distributing method, device and storage medium of the present invention load data from a database and distribute it to a data processing unit for processing. A loading unit and a distribution unit are configured for a data loading and distributing task; once configuration is complete, the task is started and the device begins to run. According to the configuration, the loading unit loads data page by page, in batches, from the source table of the specified data source using the specified range selector, sort field, sort order and paging parameters, and the distribution unit distributes the data to the specified destination in the configured manner. The loading modes comprise a single-round loading mode, a cyclic loading mode and a continuous increment loading mode, to meet the needs of different scenarios.
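The choice among the three loading modes after each load can be sketched as a small decision function. The names Mode and next_action and the returned action strings are hypothetical; lwt and rwt stand for the loading interval and the round interval used in this embodiment.

```python
from enum import Enum


class Mode(Enum):
    SINGLE_ROUND = "single_round"
    CONTINUOUS_INCREMENT = "continuous_increment"
    CYCLIC = "cyclic"


def next_action(loaded, page_size, mode):
    """Decide what to do after a load.

    Returns (action, wait), where wait names the interval to sleep before
    acting: "lwt" between loads, "rwt" between rounds, or None.
    """
    if loaded >= page_size:
        # A full page was loaded, so more data may be waiting.
        return ("load_next", "lwt")
    if mode is Mode.SINGLE_ROUND:
        # No more data: mark the task completed and stop.
        return ("finish", None)
    if mode is Mode.CONTINUOUS_INCREMENT:
        # Keep the traversal position and poll again for new records.
        return ("load_next", "rwt")
    # Cyclic mode: reset the task state and start a new round.
    return ("reset_and_load", "rwt")
```

Keeping this decision separate from the loading code means the same loader can serve all three modes; only the reaction to a short page differs.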
The device can be deployed independently, serving as the data loading and distributing unit of a larger data processing system, or it can be integrated directly with the data processing unit and deployed as a complete data processing device with loading, distribution and processing functions, which meets the need for simple and flexible development and deployment in resource-constrained or test environments.
The method and the device improve the simplicity and flexibility of the data loading and distribution stage of a data processing task, and thereby the development efficiency of the task.
The invention discloses a data loading and distributing method and device that are lightweight as a technology and as a tool. On the one hand, the invention provides a general-purpose data loading and distributing method and device that support multiple loading modes, multiple extensible data sources and multiple extensible distribution targets. This reduces the repetitive work in the step that precedes data processing, namely data loading and distribution, so that the people involved can complete that step quickly and spend most of their effort on the subsequent data processing. On the other hand, the method and device focus on data loading and distribution and are decoupled from data processing, so developers can write the data processing program with technology they already know. Unlike with other heavyweight tools, they do not have to develop against a special programming model provided by the tool, which raises development and porting costs and binds the business tightly to the tool. Through a flexible SPI design, the invention covers loading from and distribution to some common data sources and destinations and can be conveniently extended beyond them. In addition, the method and device can be deployed independently to provide data loading and distribution as a service, or integrated as a module into the same application system as the subsequent data processing logic; the latter is particularly suitable for scenarios that demand agility, such as fast-moving projects and development and debugging.
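How an SPI-style design keeps the loader independent of concrete data sources and destinations can be sketched with two small interfaces and a provider registry. Everything named here is hypothetical and chosen for illustration only: LoadProvider, DistributeProvider, ListLoader, MemoryDistributor, register_distributor and the destination name "queue-a" do not come from the source.

```python
from typing import Any, Dict, List, Protocol


class LoadProvider(Protocol):
    """Hypothetical loading SPI: one batch from a range, with skip and limit."""

    def load(self, lo: Any, hi: Any, skip: int, limit: int) -> List[Any]: ...


class DistributeProvider(Protocol):
    """Hypothetical distribution SPI: deliver a batch to a named destination."""

    def distribute(self, items: List[Any], destination: str) -> None: ...


class ListLoader:
    """Example provider that loads from an in-memory list of sort values."""

    def __init__(self, values):
        self.values = sorted(values)

    def load(self, lo, hi, skip, limit):
        matching = [v for v in self.values if lo <= v <= hi]
        return matching[skip:skip + limit]


class MemoryDistributor:
    """Example provider that collects delivered items per destination."""

    def __init__(self):
        self.delivered = {}

    def distribute(self, items, destination):
        self.delivered.setdefault(destination, []).extend(items)


# Registry keyed by distribution mode name; the task configuration would
# select a provider by this name.
DISTRIBUTORS: Dict[str, DistributeProvider] = {}


def register_distributor(name, provider):
    DISTRIBUTORS[name] = provider


register_distributor("memory", MemoryDistributor())
loader = ListLoader([5, 1, 3])
batch = loader.load(0, 10, 0, 2)
DISTRIBUTORS["memory"].distribute(batch, "queue-a")
```

Supporting a new data source or destination then means writing one more provider class and registering it, without touching the loading control logic.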
In summary, the advantages of the invention include: it provides a general-purpose data loading and distributing method and device that are friendly to small teams, lightweight, non-invasive in design, flexible, extensible, and able to be deployed independently or integrated.
An embodiment of the invention also provides a storage medium storing processor-executable instructions which, when executed by a processor, perform the data loading and distributing method described above.
In alternative embodiments, the functions/acts noted in the block diagrams may occur out of the order noted in the operational illustrations. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality/acts involved. Furthermore, the embodiments presented and described in the flow charts of the present invention are provided by way of example in order to provide a more thorough understanding of the technology. The disclosed methods are not limited to the operations and logic flows presented herein. Alternative embodiments are contemplated in which the order of various operations is changed and in which sub-operations described as part of larger operations are performed independently.
Furthermore, although the present invention is described in the context of functional modules, it should be understood that, unless otherwise stated to the contrary, one or more of the described functions and/or features may be integrated in a single physical device and/or software module, or one or more functions and/or features may be implemented in a separate physical device or software module. It will also be appreciated that a detailed discussion of the actual implementation of each module is not necessary for an understanding of the present invention. Rather, the actual implementation of the various functional modules in the apparatus disclosed herein will be understood within the ordinary skill of an engineer, given the nature, function, and internal relationship of the modules. Accordingly, those skilled in the art can, using ordinary skill, practice the invention as set forth in the claims without undue experimentation. It is also to be understood that the specific concepts disclosed are merely illustrative of and not intended to limit the scope of the invention, which is defined by the appended claims and their full scope of equivalents.
If the functions are implemented as software functional units and sold or used as a stand-alone product, they may be stored in a computer-readable storage medium. On this understanding, the technical solution of the present invention may be embodied as a software product that is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and other media capable of storing program code.
The logic and/or steps represented in the flowcharts or otherwise described herein, which may be regarded as an ordered listing of executable instructions for implementing logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device (e.g., a computer-based system, a processor-containing system, or another system that can fetch the instructions from the instruction execution system, apparatus, or device and execute them). For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by suitable instruction execution devices. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
While embodiments of the invention have been shown and described, it will be understood by those of ordinary skill in the art that: various changes, modifications, substitutions and alterations can be made to the embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the claims and their equivalents.
While the preferred embodiments of the present invention have been illustrated and described, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (10)

1. A data loading and distributing method, characterized by comprising the following steps:
reading the configuration information of the current task from pre-stored configuration information;
initializing the task with the configuration information and then saving the state information of the task;
calculating the parameters required for the current load and loading data from a data source according to the parameters, wherein the parameters are the filter range of the sort field value;
distributing the loaded data to a destination according to the configuration information;
recalculating and saving the state information of the task;
determining the number of records loaded this time according to the state information of the task;
and determining the execution progress of data loading and distribution according to the number of records loaded this time, and completing the final data loading and distribution operation.
2. The data loading and distributing method according to claim 1, characterized in that the step of determining the execution progress of data loading and distribution according to the number of records loaded this time and completing the final data loading and distribution operation comprises:
when the number of records loaded this time equals the single maximum load number, performing the next load after waiting for the loading time interval;
when the number of records loaded this time is less than the single maximum load number:
if the task uses the single-round loading mode, updating the saved task state to 'completed' and ending the task; if the task uses the continuous increment loading mode, performing the next load after waiting for the round time interval; and if the task uses the cyclic loading mode, resetting the task state and starting the next round of loading.
3. The data loading and distributing method according to claim 1, characterized in that the configuration information of the task comprises:
data source connection parameters, used for configuring the parameters required to connect to the data source;
a page size, used for controlling the maximum number of records in a single load;
a data table, used for specifying the location, within the target data source, of the data set that holds the data to be loaded;
a sort field name, used for selecting a field in the data table as the basis of the load traversal order, according to the characteristics of the data to be loaded and of the business;
a sort order, used for specifying the order, based on the sort field, in which the load traverses the data, the sort order being ascending or descending;
a sort field range minimum function, used for calculating the minimum of the sort field value range;
a sort field range maximum function, used for calculating the maximum of the sort field value range;
a loading time interval, used for specifying the time interval between loads;
a round time interval, used for specifying the time interval between rounds of loading;
a distribution mode, used for selecting the corresponding provider from the SPI providers to provide the distribution service;
a distribution destination, used for determining the destination of the distribution.
4. The data loading and distributing method according to claim 3, characterized in that the state information of the task comprises:
a lock identifier, used as the credential of the thread processing the task, each task being executable by only one thread at any given time;
a lock timeout time, used for indicating the expiration time of the lock;
a sort field range minimum: if the sort order is ascending, it represents the minimum of the sort field value range used in the next load; if the sort order is descending, it has no practical meaning;
a sort field range maximum: if the sort order is descending, it represents the maximum of the sort field value range used in the next load; if the sort order is ascending, it has no practical meaning;
the sort field value of the last-ranked record in the last load result, which is one endpoint of the filter range of the next load;
the number of records in the last load result whose sort field value equals the sort field value of the last-ranked record in that result;
the total number of records loaded last time;
the number of records to be skipped in the next filter result, used for indicating how many of the records satisfying the filter condition must be skipped next time, so that records already loaded are not loaded again when several records share the same sort field value;
and a task running state, used for representing the running state of the task.
5. The data loading and distributing method according to claim 2, characterized in that the data loading and distribution comprises three modes, specifically:
a single-round loading mode, used for performing one round of traversal over the specified data filter range and ending the task immediately when the traversal finds no more data;
a cyclic loading mode, used for traversing the specified data filter range; when the traversal finds no more data, the current round ends, the traversal position is reset, and a new round of traversal begins, and so on repeatedly;
and a continuous increment loading mode, used for traversing the specified data range; when the traversal finds no more data, loading continues without resetting the traversal position.
6. A data loading and distributing device, characterized by comprising:
a control unit, used for creating data loading and distributing tasks and controlling their starting and stopping;
a loading unit, used for reading the configuration information of the current task from pre-stored configuration information; initializing the task with the configuration information and then saving the state information of the task; calculating the parameters required for the current load and loading data from a data source according to the parameters; and providing the loaded data to the distribution unit according to the configuration information;
a distribution unit, used for distributing the loaded data to a destination according to the configuration information, and for determining the execution progress of data loading and distribution according to the number of records loaded this time and completing the final data loading and distribution operation.
7. The data loading and distributing device according to claim 6, characterized in that the control unit comprises:
a configuration manager, used for managing the configuration information of tasks;
a configuration information store, used for storing the configuration information of data loading and distributing tasks;
a task controller, used for creating tasks according to the configuration information and controlling their starting and stopping;
wherein each task is created by the task controller to execute a specified loading and distributing task, and each task controls a loading unit and a distribution unit, which handle the loading and distribution of data.
8. The data loading and distributing device according to claim 6, characterized in that the loading unit comprises:
a loading controller, used for implementing the control logic of the data loading method, loading data from the underlying data source through the data loading SPI, and passing the data to the distribution unit for distribution;
and a data loading SPI, used for providing a general way of loading data in batches.
9. The data loading and distributing device according to claim 6, characterized in that the distribution unit comprises:
a distribution controller, used for receiving the data from the loading unit and distributing the data to the specified data consumption destination through the data distribution SPI;
and a data distribution SPI, used for providing a general way of distributing data.
10. A storage medium storing processor-executable instructions, characterized in that the processor-executable instructions, when executed by a processor, perform the data loading and distributing method according to any one of claims 1 to 5.
CN201910963378.4A 2019-10-11 2019-10-11 Data loading and distributing method and device and storage medium Active CN110888925B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910963378.4A CN110888925B (en) 2019-10-11 2019-10-11 Data loading and distributing method and device and storage medium


Publications (2)

Publication Number Publication Date
CN110888925A true CN110888925A (en) 2020-03-17
CN110888925B CN110888925B (en) 2022-06-17

Family

ID=69746089

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910963378.4A Active CN110888925B (en) 2019-10-11 2019-10-11 Data loading and distributing method and device and storage medium

Country Status (1)

Country Link
CN (1) CN110888925B (en)



Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100169477A1 (en) * 2008-12-31 2010-07-01 Sap Ag Systems and methods for dynamically provisioning cloud computing resources
US20120117571A1 (en) * 2010-11-05 2012-05-10 Adam Davis Load balancer and firewall self-provisioning system
CN104798038A (en) * 2012-09-25 2015-07-22 锐闻士科技有限公司 Data distribution system
CN103853719A (en) * 2012-11-28 2014-06-11 成都勤智数码科技股份有限公司 Extensible mass data collection system
CN103399881A (en) * 2013-07-16 2013-11-20 沈阳中科博微自动化技术有限公司 Rapid collecting and distributing method of real-time data for integrated circuit production equipment
CN104572286A (en) * 2015-01-30 2015-04-29 湖南蚁坊软件有限公司 Task scheduling method based on distributed memory clusters
CN106202123A (en) * 2015-05-07 2016-12-07 阿里巴巴集团控股有限公司 The method and apparatus that gray scale is issued
CN106407002A (en) * 2016-08-22 2017-02-15 平安科技(深圳)有限公司 Data processing task execution method and device
CN108319243A (en) * 2018-02-01 2018-07-24 江西景旺精密电路有限公司 A kind of automatic management method, storage medium and the server of PCB equipment
CN109344153A (en) * 2018-08-22 2019-02-15 中国平安人寿保险股份有限公司 The processing method and terminal device of business datum

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ROBSON EDUARDO DE GRANDE 等: "Autonomous Configuration Scheme in a Distributed Load Balancing System for HLA-Based Simulations", 《DISTRIBUTED SIMULATION AND REAL TIME APPLICATIONS》 *
丁国浩 等: "面向日志结构化数据存储的高效数据加载", 《华东师范大学学报(自然科学版)》 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112052136A (en) * 2020-08-18 2020-12-08 深圳市欢太科技有限公司 Data verification method and device, equipment and storage medium
CN112035524A (en) * 2020-09-02 2020-12-04 中国银行股份有限公司 List data query method and device, computer equipment and readable storage medium
CN112035524B (en) * 2020-09-02 2024-04-19 中国银行股份有限公司 List data query method, device, computer equipment and readable storage medium

Also Published As

Publication number Publication date
CN110888925B (en) 2022-06-17


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant