CN113821556A - Data loading method and device - Google Patents
- Publication number
- CN113821556A
- Authority
- CN
- China
- Prior art keywords
- data
- database
- data set
- table data
- odps
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/25—Integrating or interfacing systems involving database management systems
- G06F16/252—Integrating or interfacing systems involving database management systems between a Database Management System and a front-end application
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2282—Tablespace storage structures; Management thereof
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention provides a data loading method and device. The method comprises: establishing a data connection from a first database to the Open Data Processing Service (ODPS); extracting service data from the first database to the ODPS based on data fragmentation to obtain a first table data set; creating a plurality of Spark tasks; reading the first table data set through the plurality of Spark tasks and performing data partitioning on the first table data set to obtain a second table data set; and loading the second table data set into a first in-memory database. Because allocating resources, that is, adding and allocating more resources, has a significant effect on processing speed, the data loading method improves data processing capacity when a large volume of generated data slows processing.
Description
Technical Field
The invention relates to a data loading method and device, and belongs to the technical field of electric power system information.
Background
With the advent of the big data era, the electric power billing service generates a large amount of data every day. The data is large in scale and varied in type, and ensuring the speed and timeliness of data processing has become an urgent problem.
Disclosure of Invention
The invention aims to provide a data loading method and device that ensure the speed and timeliness of data processing in the electric power billing service.
To achieve this purpose, the invention adopts the following technical solutions:
In one aspect, the present invention provides a data loading method, including:
establishing a data connection from a first database to an Open Data Processing Service (ODPS);
extracting service data from the first database to the ODPS based on data fragmentation to obtain a first table data set;
creating a plurality of Spark tasks;
reading the first table data set through the plurality of Spark tasks, and performing data partitioning on the first table data set to obtain a second table data set;
and loading the second table data set into a first in-memory database.
Further, before extracting the service data from the first database to the ODPS, the data loading method further comprises:
counting the total amount of service data to be extracted from the first database by using the Alibaba Cloud tool DI;
and determining the implementation mode of the data fragmentation according to the total amount of the service data.
Further, the establishing of the data connection from the first database to the open data processing service ODPS specifically includes:
establishing a data connection from the first database to the ODPS;
determining that the data connection is successful.
Further, the creating of the plurality of Spark tasks specifically includes:
when the number of Spark tasks exceeds a first threshold, setting the concurrency of the Spark tasks to between 1/3 and 1/2 of the maximum number of allocated CPU cores;
and when the number of Spark tasks is smaller than a second threshold, setting the concurrency of the Spark tasks to be lower than a third threshold.
Further, the data partitioning is performed according to a relational foreign key in the first table data set.
Further, the loading of the second table data set into the first in-memory database includes: loading the second table data set into the first in-memory database based on the relational foreign key.
Further, the first database is an Oracle database, and the first in-memory database is a Redis database.
In another aspect, the present invention provides a data loading apparatus, including:
a data connection establishing unit configured to establish a data connection from a first database to an Open Data Processing Service (ODPS);
a first table data set obtaining unit configured to extract service data from the first database to the ODPS based on data fragmentation to obtain a first table data set;
a task establishing unit configured to create a plurality of Spark tasks;
a data partitioning unit configured to read the first table data set through the plurality of Spark tasks and perform data partitioning on the first table data set to obtain a second table data set;
a memory loading unit configured to load the second table data set into a first in-memory database.
In another aspect, the present invention provides a computer-readable storage medium storing a computer program which, when executed in a computer, causes the computer to execute the aforementioned data loading method.
In another aspect, the present invention provides a computing device including a memory and a processor, where the memory stores executable code and the processor, when executing the executable code, implements the data loading method described above.
The invention achieves the following beneficial technical effects: because allocating resources, that is, adding and allocating more resources, has a significant effect on processing speed, the data loading method can improve data processing capacity when a large volume of generated data slows processing.
Drawings
FIG. 1 is a flow chart of a data loading method according to an embodiment of the present invention;
FIG. 2 is a structural diagram of a data loading apparatus according to an embodiment of the present invention.
Detailed Description
The invention is further described with reference to specific examples. The following examples are only for illustrating the technical solutions of the present invention more clearly, and the protection scope of the present invention is not limited thereby.
As described above, in the big data era the electric power billing service generates a large amount of data every day, and the data is large in scale and varied in type. To ensure the speed and timeliness of data processing, an embodiment of the invention provides a data loading method.
Fig. 1 is a flowchart of a data loading method according to an embodiment of the present invention. As shown in fig. 1, the method comprises at least the following steps:
Step 11, establishing a data connection from the first database to the Open Data Processing Service (ODPS).
The Open Data Processing Service (ODPS) is a big data service platform that provides capabilities for big data acquisition, cleaning, management, and analysis. It supports the standardization and rapid customization of service applications, which helps reduce data I/O throughput, unnecessary data redundancy, and data errors, enables reuse of calculation results, and improves the efficiency of data use.
In different embodiments, the first database may be a different database. In one embodiment, the first database may be an Oracle database.
In one embodiment, a data connection from the first database to the ODPS may be established, and the data connection is then confirmed to be successful.
In a specific embodiment, a connection to the remote database may be created on the user machine, and the service name of this connection is recorded for use in a subsequent step to create the database link;
returning the GLOBAL_NAME of the remote database;
checking the local GLOBAL_NAME parameter and setting it to the same value as the remote database's GLOBAL_NAME parameter, namely, when the remote database parameter is true, the local parameter is set to true, and when the remote database parameter is false, the local parameter is set to false;
creating the database link, and testing whether the connection is successful.
In one example, if the name of the remote database is returned, the connection is successful.
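The database-link steps above can be illustrated with a minimal Python sketch that only builds the Oracle SQL text and interprets the result of the test query. The helper names, the link name, and the credentials are hypothetical and not part of the original disclosure; actually running the statements would require an Oracle client, which is outside this sketch.

```python
from typing import List, Optional


def build_db_link_statements(link_name: str, user: str, password: str,
                             service_name: str) -> List[str]:
    """Build the SQL text for creating and then testing a database link.

    All argument values are illustrative placeholders (assumptions).
    """
    create_link = (
        f"CREATE DATABASE LINK {link_name} "
        f"CONNECT TO {user} IDENTIFIED BY {password} "
        f"USING '{service_name}'"
    )
    # Querying global_name through the link returns the remote database name.
    test_link = f"SELECT * FROM global_name@{link_name}"
    return [create_link, test_link]


def connection_succeeded(returned_name: Optional[str]) -> bool:
    """Treat the link as working if the test query returned a non-empty name."""
    return bool(returned_name and returned_name.strip())
```

The second helper mirrors the rule stated in the text: the connection counts as successful only when the remote database name comes back.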
Step 12, extracting service data from the first database to the ODPS based on data fragmentation to obtain a first table data set.
Data fragmentation refers to a data storage method in which data is stored across a plurality of databases (hosts) in a distributed manner, so as to spread the load that would otherwise fall on a single device. Before data fragmentation, a specific fragmentation mode can be determined according to the amount of data to be extracted. Therefore, in one embodiment, before the service data is extracted, the Alibaba Cloud tool DI can be used to count the total amount of service data to be extracted from the first database, and the implementation mode of the data fragmentation is determined according to that total amount.
Fragmenting the data set distributes the storage of the data and, at the same time, distributes the data tasks: each task is responsible only for processing its own fragment, which improves data processing performance. In addition, multiple tasks may, in a multi-threaded manner, read each fragment, perform ETL operations on the data, and write the results back. In different embodiments, map-type operations on the service tables, such as data filtering and field aliasing, can be performed according to service requirements.
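As a rough illustration of deciding the fragmentation from the total data volume, the following Python sketch splits a row count into contiguous row ranges. The `Shard` type, the `plan_shards` function, and the default of one million rows per fragment are assumptions made for the example; the source does not specify how the fragmentation mode is derived from the total.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Shard:
    index: int
    start_row: int  # inclusive
    end_row: int    # exclusive


def plan_shards(total_rows: int, rows_per_shard: int = 1_000_000) -> List[Shard]:
    """Split total_rows into contiguous row ranges of at most rows_per_shard."""
    if total_rows < 0 or rows_per_shard <= 0:
        raise ValueError("total_rows must be >= 0 and rows_per_shard must be > 0")
    shards = []
    for index, start in enumerate(range(0, total_rows, rows_per_shard)):
        end = min(start + rows_per_shard, total_rows)
        shards.append(Shard(index, start, end))
    return shards
```

Each resulting range could then be handed to one extraction task, so that every task processes only its own fragment.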
Step 13, creating a plurality of Spark tasks.
Spark is a computing engine for large-scale data processing that supports processing data with multiple tasks.
When the number of tasks is large (for example, in the thousands or more), the concurrency can be set to 1/2 or 1/3 of the maximum number of CPU cores allocated to the queue, so that other jobs are not affected. When the number of tasks is small, the execution time of each task can be evaluated and, within the expected running time, the concurrency can be reduced and the number of polling rounds increased to reduce resource use. For example, in one embodiment, when the number of Spark tasks exceeds a first threshold, the concurrency of the Spark tasks may be set to between 1/3 and 1/2 of the maximum number of allocated CPU cores; when the number of Spark tasks is less than a second threshold, the concurrency of the Spark tasks may be set to be lower than a third threshold.
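The threshold rule above can be sketched in Python as follows. The concrete threshold values, the function name, and the behavior between the first and second thresholds are assumptions for illustration only; the source gives no numbers and does not cover the in-between case.

```python
from fractions import Fraction


def choose_concurrency(task_count: int, max_cores: int,
                       first_threshold: int = 1000,   # assumed value
                       second_threshold: int = 100,   # assumed value
                       third_threshold: int = 4,      # assumed value
                       fraction: Fraction = Fraction(1, 2)) -> int:
    """Pick a task concurrency from the task count and the allocated cores."""
    if not Fraction(1, 3) <= fraction <= Fraction(1, 2):
        raise ValueError("fraction must lie between 1/3 and 1/2")
    if task_count > first_threshold:
        # Many tasks: use 1/3 to 1/2 of the allocated cores.
        return max(1, int(max_cores * fraction))
    if task_count < second_threshold:
        # Few tasks: stay strictly below the third threshold.
        return max(1, min(third_threshold - 1, max_cores))
    # Not specified in the source; this sketch falls back to 1/3 of the cores.
    return max(1, int(max_cores * Fraction(1, 3)))
```

Using `Fraction` keeps the 1/3 and 1/2 bounds exact, and `int()` rounds the result down so the job never claims more than the stated share of cores.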
Step 14, reading the first table data set through the plurality of Spark tasks, and performing data partitioning on the first table data set to obtain a second table data set.
Data partitioning is essentially processing at the level of data objects, such as partitioning tables and indexes, and it operates within a single database. Data fragmentation, by contrast, can span databases and even physical machines.
In one embodiment, the data partitioning may be performed according to a relational foreign key in the first table data set.
In this step, the data of the associated tables in the ODPS can be read by the Spark tasks and partitioned according to the foreign key field, so that data with the same foreign key falls into the same partition. Each partition can thus store the billing data related to its foreign keys, which keeps the data independent: each partition only needs to handle its own data set and does not depend on data in other partitions, which simplifies the processing logic.
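To make the foreign-key partitioning concrete, here is a minimal plain-Python sketch that hashes the foreign key of each row to a partition index, so that rows sharing a key always land together. The function name, the `account_id` field used in the usage note, and the choice of CRC32 as a stable hash are assumptions; in Spark itself a comparable effect is commonly obtained by repartitioning a DataFrame on the key column.

```python
import zlib
from collections import defaultdict
from typing import Any, Dict, List


def partition_by_foreign_key(rows: List[Dict[str, Any]], foreign_key: str,
                             num_partitions: int) -> Dict[int, List[Dict[str, Any]]]:
    """Group rows into partitions so that equal foreign keys share a partition."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    partitions: Dict[int, List[Dict[str, Any]]] = defaultdict(list)
    for row in rows:
        # CRC32 is deterministic across runs, unlike Python's built-in str hash.
        key_bytes = str(row[foreign_key]).encode("utf-8")
        partitions[zlib.crc32(key_bytes) % num_partitions].append(row)
    return dict(partitions)
```

For instance, calling `partition_by_foreign_key(rows, "account_id", 3)` on billing rows places every row of a given account in one partition, so each partition can be processed without consulting the others.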
Step 15, loading the second table data set into a first in-memory database.
In various embodiments, the first in-memory database may be a different in-memory database. In one embodiment, the first in-memory database may be a Redis database.
In one embodiment, the second table data set may be loaded into the first in-memory database based on the relational foreign key.
For example, in one particular embodiment, the relational foreign key may be used as the storage key in the Redis database,
and the data associated with the relational foreign key is written into Redis as the value corresponding to that key. A particular piece of data may therefore be written to Redis multiple times, but because Redis is an in-memory database and the writes involve no disk access, the writing efficiency is very high.
It can be seen from the foregoing embodiments that, because allocating resources, that is, adding and allocating more resources, plays an important role in improving processing speed, the data loading method of the present invention can improve data processing capacity when a large volume of generated data slows processing.
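The key/value layout described above can be sketched in Python with an ordinary dictionary standing in for Redis, so the example runs without a Redis server. The `billing:` key prefix, the JSON serialization, and both function names are assumptions; with a real Redis client, the loop body would issue a SET command per key instead of a dictionary assignment.

```python
import json
from collections import defaultdict
from typing import Any, Dict, List, MutableMapping


def build_key_value_pairs(rows: List[Dict[str, Any]], foreign_key: str,
                          key_prefix: str = "billing:") -> Dict[str, str]:
    """Use the foreign key as the storage key and its rows as a JSON value."""
    grouped: Dict[Any, List[Dict[str, Any]]] = defaultdict(list)
    for row in rows:
        grouped[row[foreign_key]].append(row)
    return {f"{key_prefix}{key}": json.dumps(group, sort_keys=True)
            for key, group in grouped.items()}


def load_into_store(pairs: Dict[str, str],
                    store: MutableMapping[str, str]) -> int:
    """Write every pair into the store (a dict stand-in for Redis here)."""
    for key, value in pairs.items():
        store[key] = value
    return len(pairs)
```

Grouping by foreign key before writing means one lookup on the key returns all associated billing rows at once.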
In another embodiment, a data loading apparatus, as shown in FIG. 2, includes:
a data connection establishing unit configured to establish a data connection from a first database to an Open Data Processing Service (ODPS);
a first table data set obtaining unit configured to extract service data from the first database to the ODPS based on data fragmentation to obtain a first table data set;
a task establishing unit configured to create a plurality of Spark tasks;
a data partitioning unit configured to read the first table data set through the plurality of Spark tasks and perform data partitioning on the first table data set to obtain a second table data set;
a memory loading unit configured to load the second table data set into a first in-memory database.
In another embodiment, a computer-readable storage medium has a computer program stored thereon which, when executed in a computer, causes the computer to perform the data loading method described above.
In another embodiment, a computing device includes a memory having executable code stored therein and a processor that, when executing the executable code, implements the data loading method described above.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The present invention has been disclosed in terms of preferred embodiments, but it is not limited to those embodiments; all technical solutions obtained by equivalent substitution or equivalent transformation fall within the protection scope of the present invention.
Claims (10)
1. A data loading method, comprising:
establishing a data connection from a first database to an Open Data Processing Service (ODPS);
extracting service data from the first database to the ODPS based on data fragmentation to obtain a first table data set;
creating a plurality of Spark tasks;
reading the first table data set through the plurality of Spark tasks, and performing data partitioning on the first table data set to obtain a second table data set;
and loading the second table data set into a first in-memory database.
2. The data loading method according to claim 1, further comprising, before extracting the service data from the first database to the ODPS:
counting the total amount of service data to be extracted from the first database by using the Alibaba Cloud tool DI;
and determining the implementation mode of the data fragmentation according to the total amount of the service data.
3. The data loading method according to claim 1, wherein the establishing of the data connection from the first database to the open data processing service ODPS specifically includes:
establishing a data connection from the first database to the ODPS;
determining that the data connection is successful.
4. The data loading method according to claim 1, wherein the creating of the plurality of Spark tasks specifically includes:
when the number of Spark tasks exceeds a first threshold, setting the concurrency of the Spark tasks to between 1/3 and 1/2 of the maximum number of allocated CPU cores;
and when the number of Spark tasks is smaller than a second threshold, setting the concurrency of the Spark tasks to be lower than a third threshold.
5. The data loading method according to claim 1, wherein the data partitioning is performed according to a relational foreign key in the first table data set.
6. The data loading method according to claim 5, wherein the loading of the second table data set into the first in-memory database comprises: loading the second table data set into the first in-memory database based on the relational foreign key.
7. The data loading method according to claim 1, wherein the first database is an Oracle database, and the first in-memory database is a Redis database.
8. A data loading apparatus, comprising:
a data connection establishing unit configured to establish a data connection from a first database to an Open Data Processing Service (ODPS);
a first table data set obtaining unit configured to extract service data from the first database to the ODPS based on data fragmentation to obtain a first table data set;
a task establishing unit configured to create a plurality of Spark tasks;
a data partitioning unit configured to read the first table data set through the plurality of Spark tasks and perform data partitioning on the first table data set to obtain a second table data set;
a memory loading unit configured to load the second table data set into a first in-memory database.
9. A computer-readable storage medium, on which a computer program is stored which, when executed in a computer, causes the computer to carry out the method of any one of claims 1-7.
10. A computing device comprising a memory and a processor, wherein the memory has stored therein executable code that, when executed by the processor, implements the method of any of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111086626.5A CN113821556A (en) | 2021-09-16 | 2021-09-16 | Data loading method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113821556A (en) | 2021-12-21 |
Family
ID=78914742
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111086626.5A Pending CN113821556A (en) | 2021-09-16 | 2021-09-16 | Data loading method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113821556A (en) |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120124081A1 (en) * | 2010-11-17 | 2012-05-17 | Verizon Patent And Licensing Inc. | Method and system for providing data migration |
US20150074037A1 (en) * | 2013-09-12 | 2015-03-12 | Sap Ag | In Memory Database Warehouse |
CN105989163A (en) * | 2015-03-04 | 2016-10-05 | 中国移动通信集团福建有限公司 | Data real-time processing method and system |
US20170228422A1 (en) * | 2016-02-10 | 2017-08-10 | Futurewei Technologies, Inc. | Flexible task scheduler for multiple parallel processing of database data |
CN107633025A (en) * | 2017-08-30 | 2018-01-26 | 苏州朗动网络科技有限公司 | Big data business processing system and method |
CN109325615A (en) * | 2018-08-31 | 2019-02-12 | 苏宁易购集团股份有限公司 | A kind of Intelligent worker assigning method and device |
CN110737683A (en) * | 2019-10-18 | 2020-01-31 | 成都四方伟业软件股份有限公司 | Automatic partitioning method and device for extraction-based business intelligent analysis platforms |
US10657154B1 (en) * | 2017-08-01 | 2020-05-19 | Amazon Technologies, Inc. | Providing access to data within a migrating data partition |
CN112434010A (en) * | 2020-11-23 | 2021-03-02 | 国网湖南省电力有限公司 | Interaction method for master station database of electricity consumption information acquisition system |
US20210200645A1 (en) * | 2019-12-27 | 2021-07-01 | Rubrik, Inc. | Automated discovery of databases |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||