CN113821556A

CN113821556A - Data loading method and device

Info

Publication number: CN113821556A
Application number: CN202111086626.5A
Authority: CN
Inventors: 许杰雄; 郑海雁; 尹飞; 李叶飞; 王松; 季聪; 陈佐; 郑飞; 郑斌; 陆嘉玮; 马吉科; 李平; 曾望志; 葛崇慧; 武梦阳; 帅率; 孙权; 王江辉; 厉文婕; 陆燕宁
Original assignee: Jiangsu Fangtian Power Technology Co Ltd
Current assignee: Jiangsu Fangtian Power Technology Co Ltd; Jiangsu Frontier Electric Power Technology Co Ltd
Priority date: 2021-09-16
Filing date: 2021-09-16
Publication date: 2021-12-21

Abstract

The invention provides a data loading method and device. The method comprises: establishing a data connection from a first database to an Open Data Processing Service (ODPS); extracting service data from the first database to the ODPS based on data fragmentation to obtain a first table data set; creating a plurality of Spark tasks; reading the first table data set through the plurality of Spark tasks, and performing data partitioning on the first table data set to obtain a second table data set; and loading the second table data set to a first in-memory database. Allocating resources, that is, adding and assigning more resources, plays an important role in improving processing speed, and the data loading method improves data processing capacity in situations where a large volume of generated data would otherwise slow processing.

Description

Data loading method and device

Technical Field

The invention relates to a data loading method and device, and belongs to the technical field of electric power system information.

Background

With the advent of the big data era, the electric power billing service generates a large amount of data every day. The data is large in scale and varied in type, and ensuring the speed and timeliness of data processing has become a problem to be solved urgently.

Disclosure of Invention

The invention aims to provide a data loading method and a data loading device so as to ensure the speed and timeliness of data processing in the electric power billing service.

To achieve this purpose, the invention adopts the following technical solution:

In one aspect, the present invention provides a data loading method, including:

establishing a data connection from a first database to an Open Data Processing Service (ODPS);

extracting service data from the first database to the ODPS based on data fragmentation to obtain a first table data set;

creating a plurality of Spark tasks;

reading the first table data set through the plurality of Spark tasks, and performing data partitioning on the first table data set to obtain a second table data set;

and loading the second table data set to a first in-memory database.

Further, before extracting the service data from the first database to the ODPS, the data loading method further comprises:

counting the total amount of the service data to be extracted from the first database by using the Alibaba Cloud tool Di;

and determining the implementation mode of the data fragmentation according to the total amount of the service data.

Further, establishing the data connection from the first database to the Open Data Processing Service (ODPS) specifically includes:

establishing a data connection from the first database to the ODPS;

determining that the data connection is successful.

Further, creating the plurality of Spark tasks specifically includes:

when the number of Spark tasks exceeds a first threshold, setting the concurrency of the Spark tasks to between 1/3 and 1/2 of the maximum number of allocated CPU cores;

and when the number of Spark tasks is smaller than a second threshold, setting the concurrency of the Spark tasks to be lower than a third threshold.

Further, the data partitioning is performed according to a relational foreign key in the first table data set.

Further, loading the second table data set to the first in-memory database includes: loading the second table data set to the first in-memory database based on the relational foreign key.

Further, the first database is an Oracle database, and the first in-memory database is a Redis database.

In another aspect, the present invention provides a data loading apparatus, including:

a data connection establishing unit configured to establish a data connection from a first database to an Open Data Processing Service (ODPS);

a first table data set obtaining unit configured to extract service data from the first database to the ODPS based on data fragmentation to obtain a first table data set;

a task creating unit configured to create a plurality of Spark tasks;

a data partitioning unit configured to read the first table data set through the plurality of Spark tasks, and perform data partitioning on the first table data set to obtain a second table data set;

and a memory loading unit configured to load the second table data set to a first in-memory database.

In another aspect, the present invention provides a computer-readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to execute the aforementioned data loading method.

In another aspect, the present invention provides a computing device including a memory and a processor, where the memory stores executable code, and the processor executes the executable code to implement the data loading method described above.

The invention achieves the following beneficial technical effects: allocating resources, that is, adding and assigning more resources, plays an important role in improving processing speed, and the data loading method improves data processing capacity in situations where a large volume of generated data would otherwise slow processing.

Drawings

FIG. 1 is a flow chart of a data loading method according to an embodiment of the present invention;

FIG. 2 is a structural diagram of a data loading apparatus according to an embodiment of the present invention.

Detailed Description

The invention is further described with reference to specific examples. The following examples are only for illustrating the technical solutions of the present invention more clearly, and the protection scope of the present invention is not limited thereby.

As described above, in the big data era, the electric power billing service generates a large amount of data every day, and the data is large in scale and varied in type. To ensure the speed and timeliness of data processing, an embodiment of the invention provides a data loading method.

Fig. 1 is a flowchart of a data loading method according to an embodiment of the present invention. As shown in fig. 1, the method comprises at least the following steps:

Step 11, establishing a data connection from the first database to the Open Data Processing Service (ODPS).

The Open Data Processing Service (ODPS) is a big data platform, i.e., a service platform, that provides big data acquisition, cleaning, management, and analysis capabilities. It supports standardization and rapid customization of service applications, which helps reduce data I/O throughput, unnecessary data redundancy, and data errors, enables reuse of computation results, and improves the efficiency of data use.

In different embodiments, the first database may be a different database. In one embodiment, the first database may use an Oracle database.

In one embodiment, a data connection from the first database to the ODPS may be established, and the data connection is then determined to be successful.

In a specific embodiment, the connection may be established through the following steps:

creating a connection to the remote database on the user's machine and recording the service name of this connection, which is used in a subsequent step to create the database link;

returning the GLOBAL_NAME of the remote database;

checking the local GLOBAL_NAME parameter and setting it to the same value as that of the remote database, namely, when the remote database parameter is true, the local parameter is set to true, and when the remote database parameter is false, the local parameter is set to false;

creating the database link, and testing whether the connection is successful.

In one example, if a remote library name is returned, the connection is successful.
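
By way of illustration only, the link creation and test described above can be sketched as a small Python helper that assembles Oracle SQL statements. The link name, user, password, and service name are hypothetical placeholders that do not appear in the original, and the parameter-alignment step is not shown. The helper only builds the statement text; executing it against a real database is left to whatever Oracle client is in use.

```python
def build_db_link_statements(link_name, user, password, service_name):
    """Return, in order, the SQL text for querying the remote name,
    creating the database link, and testing it.

    All argument values are hypothetical placeholders for illustration.
    """
    return [
        # Run on the remote database: report its global name.
        "SELECT * FROM GLOBAL_NAME",
        # Run locally: create the link using the recorded service name.
        f"CREATE DATABASE LINK {link_name} "
        f"CONNECT TO {user} IDENTIFIED BY {password} USING '{service_name}'",
        # Run locally: query the remote global name through the link.
        f"SELECT * FROM GLOBAL_NAME@{link_name}",
    ]


def link_succeeded(rows):
    """Treat the link as working if the test query returned a non-empty name."""
    return bool(rows) and bool(rows[0]) and bool(str(rows[0][0]).strip())
```

Here `link_succeeded` mirrors the example in the text: a returned remote library name is taken to mean the connection works, and an empty result is taken to mean it does not.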

Step 12, extracting service data from the first database to the ODPS based on data fragmentation to obtain a first table data set.

Data fragmentation (sharding) refers to a data storage method in which data is stored across a plurality of databases (hosts) in a distributed manner, so as to spread the load that would otherwise fall on a single device. Before the data is fragmented, a specific fragmentation mode can be determined according to the amount of data to be extracted. Therefore, in one embodiment, before the service data is extracted, the Alibaba Cloud tool Di can be used to count the total amount of service data to be extracted from the first database, and the implementation mode of the data fragmentation is determined according to that total amount.

Fragmenting the data set distributes both the storage of the data and the data tasks: each task is responsible only for processing its own fragment, which improves data processing performance. Multiple tasks may also, in a multi-threaded manner, read each fragment, perform ETL operations on the data, and write the results back. In different embodiments, map-type operations on the service tables, such as data filtering and field aliasing, may be performed according to service requirements.
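
As a minimal sketch of how a fragmentation mode might be derived from the counted total, the following Python functions choose a fragment count and split the row range into contiguous fragments. The target of one million rows per fragment and the cap of 64 fragments are assumptions made for illustration; the original does not specify any such values.

```python
def choose_shard_count(total_rows, rows_per_shard=1_000_000, max_shards=64):
    """Pick a fragment count from the total row count.

    rows_per_shard and max_shards are hypothetical tuning values.
    """
    if total_rows <= 0:
        return 1
    needed = -(-total_rows // rows_per_shard)  # ceiling division
    return max(1, min(max_shards, needed))


def shard_ranges(total_rows, shard_count):
    """Split [0, total_rows) into shard_count contiguous half-open ranges
    whose sizes differ by at most one row."""
    base, extra = divmod(total_rows, shard_count)
    ranges = []
    start = 0
    for i in range(shard_count):
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges
```

Each resulting range can then be handed to a separate extraction task, so that every task processes only its own fragment.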

Step 13, creating a plurality of Spark tasks.

Spark is a computing engine for large-scale data processing that supports running multiple data processing tasks.

When the number of tasks is large (for example, on the order of thousands or more), the concurrency may be set to 1/2 or 1/3 of the maximum number of CPU cores allocated to the queue, so that other jobs are not affected. When the number of tasks is small, the execution time of each task can be estimated and, provided the expected running time is still met, the concurrency can be reduced and the number of polling rounds increased so as to reduce resource usage. For example, in one embodiment, when the number of Spark tasks exceeds a first threshold, the concurrency of the Spark tasks may be set to between 1/3 and 1/2 of the maximum number of allocated CPU cores; when the number of Spark tasks is less than a second threshold, the concurrency of the Spark tasks may be set to be lower than a third threshold.
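
The threshold rule above can be sketched as follows. The concrete threshold values (1000, 100, and 8) and the function name are hypothetical, chosen only to make the example runnable; the original leaves the thresholds unspecified and does not say what happens between the first and second thresholds, so the sketch returns `None` in that range.

```python
def choose_concurrency(task_count, max_cores,
                       first_threshold=1000,
                       second_threshold=100,
                       third_threshold=8,
                       divisor=2):
    """Choose a Spark task concurrency from the task count.

    All threshold defaults are illustrative assumptions.
    divisor must be 2 or 3, giving 1/2 or 1/3 of the allocated cores.
    """
    if divisor not in (2, 3):
        raise ValueError("divisor must be 2 or 3")
    if task_count > first_threshold:
        # Many tasks: use a fraction of the cores so other jobs are not starved.
        return max(1, max_cores // divisor)
    if task_count < second_threshold:
        # Few tasks: stay strictly below the third threshold.
        return max(1, min(third_threshold - 1, max_cores))
    # Between the two thresholds the text gives no rule.
    return None
```

For instance, with 60 allocated cores, 5000 tasks would yield a concurrency of 30 (or 20 with `divisor=3`), while 50 tasks would yield 7 under the assumed third threshold of 8.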

Step 14, reading the first table data set through the plurality of Spark tasks, and performing data partitioning on the first table data set to obtain a second table data set.

Data partitioning is essentially a process at the level of data objects, such as the partitioning of tables and indexes, and it operates within a single database. This contrasts with data fragmentation (sharding), which can span databases and even physical machines.

In one embodiment, the data partitioning may be performed according to a relational foreign key in the first table data set.

In this step, the data of the associated tables in the ODPS can be read by the Spark tasks and partitioned by the foreign key field, so that rows with the same foreign key fall into the same partition. Each partition can then hold the billing data related to its foreign keys. This keeps the data independent: each partition only needs to handle its own data set and does not depend on data in other partitions, which simplifies the processing logic.
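
To illustrate the property that rows with the same foreign key land in the same partition, the following plain-Python sketch hashes the foreign key to a partition index. It is a stand-in for the Spark step rather than Spark code; the field name `customer_id` used in the usage note and the choice of CRC32 as the hash are assumptions for illustration.

```python
import zlib
from collections import defaultdict


def partition_index(foreign_key, num_partitions):
    """Map a foreign key to a partition index with a deterministic hash."""
    return zlib.crc32(str(foreign_key).encode("utf-8")) % num_partitions


def partition_by_foreign_key(rows, key_field, num_partitions):
    """Group rows so that every row sharing a foreign key value
    is placed in the same partition."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[partition_index(row[key_field], num_partitions)].append(row)
    return dict(partitions)
```

Calling `partition_by_foreign_key(rows, "customer_id", 4)` on a list of dictionaries returns a mapping from partition index to the rows assigned to it. In PySpark, a comparable effect is commonly obtained with `DataFrame.repartition(numPartitions, col)`, which hash-partitions by the given column.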

Step 15, loading the second table data set to a first in-memory database.

In various embodiments, the first in-memory database may be a different in-memory database. In one embodiment, the first in-memory database may be a Redis database.

In one embodiment, the second table data set may be loaded to the first in-memory database based on the relational foreign key.

For example, in one particular embodiment, the relational foreign key may be used as the storage key in the Redis database;

and the data associated with the relational foreign key is written into Redis as the value corresponding to that key. As a result, a particular piece of data may be written to Redis multiple times; however, because Redis is an in-memory database, the writes require no disk access and the write efficiency is very high.
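
The key/value layout described above can be sketched as follows. A plain Python dictionary stands in for Redis so that the example runs without a server; the `billing:` key prefix and the JSON encoding of the value are assumptions for illustration and are not stated in the original.

```python
import json
from collections import defaultdict


def build_key_value_pairs(rows, key_field, prefix="billing:"):
    """Use the foreign key as the storage key and the JSON-encoded list of
    associated rows as the value. The prefix is a hypothetical convention."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[f"{prefix}{row[key_field]}"].append(row)
    return {key: json.dumps(value, sort_keys=True) for key, value in grouped.items()}


def load_into_store(pairs, store):
    """Write each pair into a mapping-like store and return the number of keys.

    A dict is used here in place of Redis; with the redis-py client, the
    corresponding write would be client.set(key, value).
    """
    for key, value in pairs.items():
        store[key] = value
    return len(pairs)
```

Because each partition contains all rows for its foreign keys, each partition can build and write its own pairs without consulting any other partition.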

As the foregoing embodiments show, allocating resources, that is, adding and assigning more resources, plays an important role in improving processing speed, and the data loading method improves data processing capacity in situations where a large volume of generated data would otherwise slow processing.

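In another embodiment, a data loading apparatus, as shown in FIG. 2, includes:

a data connection establishing unit configured to establish a data connection from a first database to an Open Data Processing Service (ODPS);

a first table data set obtaining unit configured to extract service data from the first database to the ODPS based on data fragmentation to obtain a first table data set;

a task creating unit configured to create a plurality of Spark tasks;

a data partitioning unit configured to read the first table data set through the plurality of Spark tasks, and perform data partitioning on the first table data set to obtain a second table data set;

and a memory loading unit configured to load the second table data set to a first in-memory database.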

In another embodiment, a computer-readable storage medium has a computer program stored thereon, which, when executed in a computer, causes the computer to perform a data loading method as described above.

In another embodiment, a computing device includes a memory having executable code stored therein and a processor that, when executing the executable code, implements a data loading method as described above.

As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

The present invention has been disclosed in terms of preferred embodiments, but it is not limited to those embodiments; all technical solutions obtained by equivalent substitution or equivalent transformation fall within the protection scope of the present invention.

Claims

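1. A data loading method, comprising:

establishing a data connection from a first database to an Open Data Processing Service (ODPS);

extracting service data from the first database to the ODPS based on data fragmentation to obtain a first table data set;

creating a plurality of Spark tasks;

reading the first table data set through the plurality of Spark tasks, and performing data partitioning on the first table data set to obtain a second table data set;

and loading the second table data set to a first in-memory database.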

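2. The data loading method according to claim 1, further comprising, before extracting the service data from the first database to the ODPS:

counting the total amount of the service data to be extracted from the first database by using the Alibaba Cloud tool Di;

and determining the implementation mode of the data fragmentation according to the total amount of the service data.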

3. The data loading method according to claim 1, wherein the establishing of the data connection from the first database to the open data processing service ODPS specifically includes:

establishing a data connection from the first database to the ODPS;

determining that the data connection is successful.

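4. The data loading method according to claim 1, wherein creating the plurality of Spark tasks specifically includes:

when the number of Spark tasks exceeds a first threshold, setting the concurrency of the Spark tasks to between 1/3 and 1/2 of the maximum number of allocated CPU cores;

and when the number of Spark tasks is smaller than a second threshold, setting the concurrency of the Spark tasks to be lower than a third threshold.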

5. The data loading method according to claim 1, wherein the data partitioning is performed according to a relational foreign key in the first table data set.

6. The data loading method according to claim 5, wherein loading the second table data set to the first in-memory database comprises: loading the second table data set to the first in-memory database based on the relational foreign key.

7. The data loading method according to claim 1, wherein the first database is an Oracle database, and the first in-memory database is a Redis database.

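8. A data loading apparatus, comprising:

a data connection establishing unit configured to establish a data connection from a first database to an Open Data Processing Service (ODPS);

a first table data set obtaining unit configured to extract service data from the first database to the ODPS based on data fragmentation to obtain a first table data set;

a task creating unit configured to create a plurality of Spark tasks;

a data partitioning unit configured to read the first table data set through the plurality of Spark tasks, and perform data partitioning on the first table data set to obtain a second table data set;

and a memory loading unit configured to load the second table data set to a first in-memory database.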

9. A computer-readable storage medium, on which a computer program is stored which, when executed in a computer, causes the computer to carry out the method of any one of claims 1-7.

10. A computing device comprising a memory and a processor, wherein the memory has stored therein executable code that, when executed by the processor, implements the method of any of claims 1-7.