CN113821556A - Data loading method and device - Google Patents

Data loading method and device

Info

Publication number
CN113821556A
Authority
CN
China
Prior art keywords
data
database
data set
table data
odps
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111086626.5A
Other languages
Chinese (zh)
Inventor
许杰雄
郑海雁
尹飞
李叶飞
王松
季聪
陈佐
郑飞
郑斌
陆嘉玮
马吉科
李平
曾望志
葛崇慧
武梦阳
帅率
孙权
王江辉
厉文婕
陆燕宁
仲智颖
包琰琪
刘志杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu Fangtian Power Technology Co Ltd
Jiangsu Frontier Electric Power Technology Co Ltd
Original Assignee
Jiangsu Fangtian Power Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu Fangtian Power Technology Co Ltd filed Critical Jiangsu Fangtian Power Technology Co Ltd
Priority to CN202111086626.5A priority Critical patent/CN113821556A/en
Publication of CN113821556A publication Critical patent/CN113821556A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/252 Integrating or interfacing systems involving database management systems between a Database Management System and a front-end application
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a data loading method and device. The method establishes a data connection from a first database to an Open Data Processing Service (ODPS); extracts service data from the first database to the ODPS based on data fragmentation to obtain a first table data set; creates a plurality of Spark tasks; reads the first table data set through the Spark tasks and partitions it to obtain a second table data set; and loads the second table data set into a first in-memory database. Allocating resources in this way, that is, adding and allocating more resources, plays an important role in improving processing speed, and the data loading method increases data processing capacity in situations where a large amount of data would otherwise delay processing.

Description

Data loading method and device
Technical Field
The invention relates to a data loading method and device, and belongs to the technical field of electric power system information.
Background
With the advent of the big data era, the electric power billing service generates a large amount of data every day. The data are large in scale and varied in type, so ensuring the speed and timeliness of data processing has become an urgent problem to be solved.
Disclosure of Invention
The invention aims to provide a data loading method and a data loading device so as to ensure the speed and timeliness of data processing in the electric power billing service.
To achieve this purpose, the invention adopts the following technical solution:
in one aspect, the present invention provides a data loading method, including:
establishing a data connection from a first database to an Open Data Processing Service (ODPS);
extracting service data from the first database to the ODPS based on data fragmentation to obtain a first table data set;
creating a plurality of Spark tasks;
reading the first table data set through the plurality of Spark tasks, and performing data partitioning on the first table data set to obtain a second table data set;
and loading the second table data set into a first in-memory database.
Further, before extracting the service data from the first database to the ODPS, the data loading method further comprises:
counting the total amount of service data to be extracted from the first database by using the Alibaba Cloud tool DI;
and determining the implementation mode of the data fragmentation according to the total amount of the service data.
Further, the establishing of the data connection from the first database to the open data processing service ODPS specifically includes:
establishing a data connection from the first database to the ODPS;
determining that the data connection is successful.
Further, the creating of the plurality of Spark tasks specifically includes:
when the number of Spark tasks exceeds a first threshold, setting the concurrency of the Spark tasks to 1/2 to 1/3 of the maximum number of allocated CPU cores;
and when the number of Spark tasks is less than a second threshold, setting the concurrency of the Spark tasks below a third threshold.
Further, the data partitioning is performed according to a relational foreign key in the first table data set.
Further, the loading of the second table data set into the first in-memory database includes: loading the second table data set into the first in-memory database based on the relational foreign key.
Further, the first database is an Oracle database, and the first in-memory database is a Redis database.
In another aspect, the present invention provides a data loading apparatus, including:
a data connection establishing unit configured to establish a data connection from the first database to an open data processing service, ODPS;
a first table data set obtaining unit configured to extract service data from the first database to the ODPS based on data fragmentation and obtain a first table data set;
a task establishing unit configured to establish a plurality of Spark tasks;
a data partitioning unit configured to read the first table data set through the plurality of Spark tasks and perform data partitioning on the first table data set to obtain a second table data set;
and a memory loading unit configured to load the second table data set into a first in-memory database.
In another aspect, a computer-readable storage medium has a computer program stored thereon which, when executed in a computer, causes the computer to perform the data loading method described above.
In another aspect, a computing device includes a memory and a processor, where the memory stores executable code and the processor, when executing the executable code, implements the data loading method described above.
The invention achieves the following beneficial technical effects: allocating resources, that is, adding and allocating more resources, plays an important role in improving processing speed, and the data loading method can increase data processing capacity in situations where a large amount of data would otherwise delay processing.
Drawings
FIG. 1 is a flow chart of a data loading method according to an embodiment of the present invention;
fig. 2 is a structural diagram of a data loading apparatus according to an embodiment of the present invention.
Detailed Description
The invention is further described with reference to specific examples. The following examples are only for illustrating the technical solutions of the present invention more clearly, and the protection scope of the present invention is not limited thereby.
As described above, in the big data era the electric power billing service generates a large amount of data every day, and the data are large in scale and varied in type. To ensure the speed and timeliness of data processing, an embodiment of the invention provides a data loading method.
Fig. 1 is a flowchart of a data loading method according to an embodiment of the present invention. As shown in fig. 1, the method comprises at least the following steps:
Step 11, establishing a data connection from the first database to the open data processing service ODPS.
The Open Data Processing Service (ODPS) is a big data service platform that provides big data acquisition, cleaning, management, and analysis capabilities and supports both standardized and rapidly customized service applications. It helps reduce data I/O throughput, unnecessary data redundancy, and data errors, enables reuse of computation results, and improves data use efficiency.
In different embodiments, the first database may be a different database. In one embodiment, the first database may use an Oracle database.
In one embodiment, a data connection to the ODPS may be established, and it may then be determined that the data connection is successful.
In a specific embodiment, a connection to the remote database may be created on the user machine, and the service name of this connection is recorded for use in the subsequent steps when creating the database link;
the GLOBAL_NAME of the remote database is returned;
the local GLOBAL_NAMES parameter is checked and set to the same value as on the remote database, that is, when the remote parameter is TRUE the local parameter is set to TRUE, and when the remote parameter is FALSE the local parameter is set to FALSE;
the database link is then created and the connection is tested.
In one example, if the remote library name is returned, the connection is successful.
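For illustration only, the following minimal Python sketch shows how such a database link could be created and tested, assuming the python-oracledb driver; the credentials, TNS service name (REMOTE_SVC), and link name (odps_src_link) are hypothetical placeholders rather than values from the embodiment.
    import oracledb  # python-oracledb driver (assumed available)

    # Connect to the local database that will own the database link.
    local = oracledb.connect(user="loader", password="secret", dsn="localhost/localpdb")
    cur = local.cursor()

    # Create the database link using the remembered TNS service name of the
    # remote connection created beforehand on the user machine.
    cur.execute("""
        CREATE DATABASE LINK odps_src_link
        CONNECT TO src_user IDENTIFIED BY src_password
        USING 'REMOTE_SVC'
    """)

    # Test the link: if the remote GLOBAL_NAME comes back, the connection works.
    cur.execute("SELECT global_name FROM global_name@odps_src_link")
    print(cur.fetchone())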
Step 12, extracting service data from the first database to the ODPS based on data fragmentation, and acquiring a first table data set.
Data fragmentation (also called data sharding) refers to a data storage method in which data are stored in a distributed manner across a plurality of databases (hosts) so as to spread the load of a single device. Before fragmenting the data, a specific fragmentation scheme can be determined according to the amount of data to be extracted. Therefore, in one embodiment, before extracting the service data, the Alibaba Cloud tool DI can be used to count the total amount of service data to be extracted from the first database, and the implementation mode of the data fragmentation is determined according to this total amount.
Fragmenting the data set provides distributed storage of the data and also distributes the data tasks, so that each task is responsible only for processing its own fragment, which improves data processing performance. In addition, multiple tasks can read and write back each fragmented set and perform ETL operations on the data in a multi-threaded manner. In different embodiments, map-type operations such as data filtering and aliasing of data fields in the service table can be performed according to service requirements.
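As an illustrative sketch only, the following Python function shows one way the counted total could be turned into a fragmentation plan; the key column (charge_id), table name, and fragment size are assumptions, not values from the embodiment.
    # Hypothetical helper: derive extraction predicates from the counted total,
    # so each fragment can be extracted by a separate task.
    def plan_fragments(total_rows, rows_per_fragment=5_000_000):
        if total_rows <= rows_per_fragment:
            return ["1 = 1"]  # small table: extract in a single pass
        num_fragments = -(-total_rows // rows_per_fragment)  # ceiling division
        return [f"MOD(charge_id, {num_fragments}) = {i}" for i in range(num_fragments)]

    predicates = plan_fragments(total_rows=23_000_000)
    # Each predicate drives one extraction task, e.g.
    # SELECT ... FROM billing_detail WHERE MOD(charge_id, 5) = 2
    print(len(predicates))  # -> 5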
Step 13, creating a plurality of Spark tasks.
Spark is a computing engine for large-scale data processing that supports running multiple data processing tasks in parallel.
When the number of tasks is large (for example, on the order of thousands), the concurrency can be set to 1/2 or 1/3 of the maximum number of CPU cores allocated to the queue, so that other jobs are not affected; when the number of tasks is small, the execution time of each task can be estimated, and within the expected running time the concurrency can be reduced and the number of polling rounds increased to save resources. For example, in one embodiment, when the number of Spark tasks exceeds a first threshold, the concurrency of the Spark tasks may be set to 1/2 to 1/3 of the maximum number of allocated CPU cores; when the number of Spark tasks is less than a second threshold, the concurrency of the Spark tasks may be set below a third threshold.
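For illustration, a minimal Python sketch of this concurrency rule follows; the threshold values, queue size, and function name are assumptions introduced for the example only.
    # Illustrative only: choose a Spark concurrency level from the task count.
    def choose_concurrency(num_tasks, max_queue_cores,
                           first_threshold=1000, second_threshold=100, third_threshold=8):
        if num_tasks > first_threshold:
            # Large job: use about a third of the queue's cores so other jobs are not starved.
            return max(1, max_queue_cores // 3)
        if num_tasks < second_threshold:
            # Small job: stay below the third threshold and rely on more polling rounds.
            return max(1, min(third_threshold - 1, num_tasks))
        return max(1, max_queue_cores // 2)

    print(choose_concurrency(num_tasks=2500, max_queue_cores=96))  # -> 32
    print(choose_concurrency(num_tasks=40, max_queue_cores=96))    # -> 7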
Step 14, reading the first table data set through the plurality of Spark tasks, and performing data partitioning on the first table data set to obtain a second table data set.
Data partitioning is essentially a process at the level of individual data objects, such as tables and indexes, and takes place within a single database. This is in contrast to data fragmentation (sharding), which can span databases and even physical machines.
In one embodiment, the data partitioning may be performed according to a relational foreign key in the first table data set.
In this step, the data of the associated tables in the ODPS can be read by the Spark tasks and partitioned according to the foreign key field, so that rows with the same foreign key fall into the same partition. Each partition can then be used to store the billing data related to that foreign key, which further ensures the independence of the data: each partition only needs to handle its own data set and does not depend on the data of other partitions, simplifying the processing logic.
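A hedged PySpark sketch of this step follows; the input path, partition count, and foreign key column name (cons_id) are hypothetical, and a production job would read the ODPS tables through a MaxCompute/ODPS data source rather than the local parquet placeholder used here.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("load-billing-data").getOrCreate()

    # Placeholder for reading the first table data set extracted into ODPS.
    first_table_df = spark.read.parquet("/tmp/odps_export/billing_detail")

    # Repartition by the relational foreign key so that rows sharing the same
    # cons_id land in the same partition (the second table data set).
    second_table_df = first_table_df.repartition(200, "cons_id")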
Step 15, loading the second table data set into a first in-memory database.
In various embodiments, the first in-memory database may be a different in-memory database. In one embodiment, the first in-memory database may be a Redis database.
In one embodiment, the second set of table data may be loaded to the first in-memory database based on the relational foreign key.
For example, in one particular embodiment, the relational foreign key may be used as the storage key in the Redis database, and the data associated with that foreign key are written to Redis as the value corresponding to the key. A given piece of data may therefore be written to Redis multiple times, but because Redis is an in-memory database no disk access is performed, so the writing efficiency is very high.
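Continuing the earlier PySpark sketch, the following hedged example writes each partition to Redis with the foreign key as the storage key, using the redis-py client; the host, port, key prefix, and JSON value layout are assumptions rather than details from the embodiment.
    import json
    import redis  # redis-py client (assumed available on the executors)

    def write_partition_to_redis(rows):
        client = redis.Redis(host="redis-host", port=6379, db=0)
        pipe = client.pipeline()
        for row in rows:
            # The relational foreign key (cons_id) becomes the Redis key; the
            # associated row is appended as JSON under that key.
            pipe.rpush(f"billing:{row['cons_id']}", json.dumps(row.asDict(), default=str))
        pipe.execute()

    # second_table_df comes from the partitioning sketch in the previous step.
    second_table_df.foreachPartition(write_partition_to_redis)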
It can be seen from the foregoing embodiments that, in the data loading method of the present invention, allocating resources, that is, adding and allocating more resources, plays an important role in improving processing speed, and the data loading method can improve data processing capacity in situations where a large amount of data would otherwise slow processing.
In another embodiment, a data loading apparatus, as shown in fig. 2, includes:
a data connection establishing unit configured to establish a data connection from the first database to an open data processing service, ODPS;
a first table data set obtaining unit configured to extract service data from the first database to the ODPS based on data fragmentation and obtain a first table data set;
a task establishing unit configured to establish a plurality of Spark tasks;
a data partitioning unit configured to read the first table data set through the plurality of Spark tasks and perform data partitioning on the first table data set to obtain a second table data set;
and a memory loading unit configured to load the second table data set into a first in-memory database.
In another embodiment, a computer-readable storage medium has a computer program stored thereon, which, when executed in a computer, causes the computer to perform a data loading method as described above.
In another embodiment, a computing device includes a memory having executable code stored therein and a processor that, when executing the executable code, implements a data loading method as described above.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The present invention has been disclosed in terms of preferred embodiments, but it is not limited to these embodiments; all technical solutions obtained by equivalent substitution or equivalent transformation fall within the scope of protection of the present invention.

Claims (10)

1. A data loading method, comprising:
establishing a data connection from a first database to an Open Data Processing Service (ODPS);
extracting service data from the first database to the ODPS based on data fragmentation to obtain a first table data set;
creating a plurality of Spark tasks;
reading the first table data set through the plurality of Spark tasks, and performing data partitioning on the first table data set to obtain a second table data set;
and loading the second table data set into a first in-memory database.
2. The data loading method according to claim 1, further comprising, before extracting the service data from the first database to the ODPS:
counting the total amount of service data to be extracted from the first database by using the Alibaba Cloud tool DI;
and determining the implementation mode of the data fragmentation according to the total amount of the service data.
3. The data loading method according to claim 1, wherein the establishing of the data connection from the first database to the open data processing service ODPS specifically includes:
establishing a data connection from the first database to the ODPS;
determining that the data connection is successful.
4. The data loading method according to claim 1, wherein the creating of the plurality of Spark tasks specifically includes:
when the number of Spark tasks exceeds a first threshold, setting the concurrency of the Spark tasks to 1/2 to 1/3 of the maximum number of allocated CPU cores;
and when the number of Spark tasks is less than a second threshold, setting the concurrency of the Spark tasks below a third threshold.
5. The data loading method according to claim 1, wherein the data partitioning is performed according to a relational foreign key in the first table data set.
6. The data loading method according to claim 5, wherein the loading of the second table data set into the first in-memory database comprises: loading the second table data set into the first in-memory database based on the relational foreign key.
7. The data loading method according to claim 1, wherein the first database is an Oracle database and the first in-memory database is a Redis database.
8. A data loading apparatus, comprising:
a data connection establishing unit configured to establish a data connection from the first database to an open data processing service, ODPS;
a first table data set obtaining unit configured to extract service data from the first database to the ODPS based on data fragmentation and obtain a first table data set;
a task establishing unit configured to establish a plurality of Spark tasks;
a data partitioning unit configured to read the first table data set through the plurality of Spark tasks and perform data partitioning on the first table data set to obtain a second table data set;
and a memory loading unit configured to load the second table data set into a first in-memory database.
9. A computer-readable storage medium, on which a computer program is stored which, when executed in a computer, causes the computer to carry out the method of any one of claims 1-7.
10. A computing device comprising a memory and a processor, wherein the memory has stored therein executable code that, when executed by the processor, implements the method of any of claims 1-7.
CN202111086626.5A 2021-09-16 2021-09-16 Data loading method and device Pending CN113821556A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111086626.5A CN113821556A (en) 2021-09-16 2021-09-16 Data loading method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111086626.5A CN113821556A (en) 2021-09-16 2021-09-16 Data loading method and device

Publications (1)

Publication Number Publication Date
CN113821556A true CN113821556A (en) 2021-12-21

Family

ID=78914742

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111086626.5A Pending CN113821556A (en) 2021-09-16 2021-09-16 Data loading method and device

Country Status (1)

Country Link
CN (1) CN113821556A (en)

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120124081A1 (en) * 2010-11-17 2012-05-17 Verizon Patent And Licensing Inc. Method and system for providing data migration
US20150074037A1 (en) * 2013-09-12 2015-03-12 Sap Ag In Memory Database Warehouse
CN105989163A (en) * 2015-03-04 2016-10-05 中国移动通信集团福建有限公司 Data real-time processing method and system
US20170228422A1 (en) * 2016-02-10 2017-08-10 Futurewei Technologies, Inc. Flexible task scheduler for multiple parallel processing of database data
US10657154B1 (en) * 2017-08-01 2020-05-19 Amazon Technologies, Inc. Providing access to data within a migrating data partition
CN107633025A (en) * 2017-08-30 2018-01-26 苏州朗动网络科技有限公司 Big data business processing system and method
CN109325615A (en) * 2018-08-31 2019-02-12 苏宁易购集团股份有限公司 A kind of Intelligent worker assigning method and device
CN110737683A (en) * 2019-10-18 2020-01-31 成都四方伟业软件股份有限公司 Automatic partitioning method and device for extraction-based business intelligent analysis platforms
US20210200645A1 (en) * 2019-12-27 2021-07-01 Rubrik, Inc. Automated discovery of databases
CN112434010A (en) * 2020-11-23 2021-03-02 国网湖南省电力有限公司 Interaction method for master station database of electricity consumption information acquisition system

Similar Documents

Publication Publication Date Title
US9619430B2 (en) Active non-volatile memory post-processing
US9575984B2 (en) Similarity analysis method, apparatus, and system
US7818743B2 (en) Logging lock data
US20080313502A1 (en) Systems, methods and computer products for trace capability per work unit
CN111813805A (en) Data processing method and device
KR101656360B1 (en) Cloud System for supporting auto-scaled Hadoop Distributed Parallel Processing System
CN111125769B (en) Mass data desensitization method based on ORACLE database
CN112699098A (en) Index data migration method, device and equipment
KR101640231B1 (en) Cloud Driving Method for supporting auto-scaled Hadoop Distributed Parallel Processing System
Premchaiswadi et al. Optimizing and tuning MapReduce jobs to improve the large‐scale data analysis process
CN113821556A (en) Data loading method and device
CN111125070A (en) Data exchange method and platform
CN110851437A (en) Storage method, device and equipment
CN113626194A (en) Report file generation method, device, equipment and readable storage medium
CN108733484B (en) Method and device for managing application program
CN113760950A (en) Index data query method and device, electronic equipment and storage medium
CN113868267A (en) Method for injecting time sequence data, method for inquiring time sequence data and database system
CN112115118B (en) Database pressure measurement optimization method and device, storage medium and electronic equipment
Jian-feng et al. A High Performance Data Storage Method for Embedded Linux Real-time Database in Power Systems
CN117331511B (en) Storage device, data transmission method, device and system thereof and storage medium
US20240086386A1 (en) Multihost database host removal shortcut
CN115952005B (en) Metadata load balancing method, device, equipment and readable storage medium
Zhang et al. Riffle: optimized shuffle service for large-scale data analytics
WO2023232127A1 (en) Task scheduling method, apparatus and system, and related device
CN117785057A (en) Data storage method and device, computer readable storage medium and electronic device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination