CN111414362A

CN111414362A - Data reading method, device, equipment and storage medium

Info

Publication number: CN111414362A
Application number: CN202010128291.8A
Authority: CN
Inventors: 帅宇
Original assignee: Ping An Technology Shenzhen Co Ltd
Current assignee: Ping An Technology Shenzhen Co Ltd
Priority date: 2020-02-28
Filing date: 2020-02-28
Publication date: 2020-07-14
Anticipated expiration: 2040-02-28
Also published as: WO2021169496A1; CN111414362B

Abstract

The invention discloses a data reading method, a device, equipment and a storage medium, wherein the method comprises the following steps: acquiring a target data table, extracting physical address information of the target data table, and generating a task table to be processed according to the physical address information; determining single data processing capacity according to preset service requirements, and dividing the task table to be processed into one or more data blocks according to the single data processing capacity; and generating a pseudo column rowid of one or more data blocks, and reading the target data table according to the rowid. According to the invention, based on the big data, the target data table is divided into a plurality of data blocks and then data reading is carried out, so that the data processing efficiency is improved.

Description

Data reading method, device, equipment and storage medium

Technical Field

The present invention relates to the field of big data technologies, and in particular, to a data reading method, apparatus, device, and storage medium.

Background

The Oracle database stores a large number of data tables. Because the data volume is large, a large-scale modification makes the database lock entire tables, and the resulting long-term occupation of space can trigger Oracle's "snapshot too old" exception. The existing cursor-based segmented processing scheme occupies a large amount of space, which causes runtime errors; it cannot be executed concurrently, occupies excessive system resources, and easily blocks system services, so data processing efficiency is greatly reduced. Therefore, how to improve data processing efficiency is a technical problem that urgently needs to be solved.

Disclosure of Invention

The invention provides a data reading method, a data reading device, data reading equipment and a storage medium, and aims to improve data processing efficiency.

To achieve the above object, the present invention provides a data reading method, including:

acquiring a target data table, extracting physical address information of the target data table, and generating a task table to be processed according to the physical address information;

determining single data processing capacity according to preset service requirements, and dividing the task table to be processed into one or more data blocks according to the single data processing capacity;

and generating a pseudo column rowid of one or more data blocks, and reading the target data table according to the rowid.

Preferably, before the step of acquiring the target data table, extracting physical address information of the target data table, and generating the to-be-processed task table according to the physical address information, the method further includes:

judging whether the data quantity in the target data table exceeds a first threshold value or not;

if the data volume is larger than the first threshold value, setting the process number of the concurrent process according to the data volume;

and dividing the target data table into sub target data tables with corresponding quantity according to the process number.

Preferably, after the step of generating a pseudo-column rowid of one or more data blocks and reading the target data table according to the rowid, the method further includes:

acquiring a status log, and identifying an abnormal data block through the status log;

and acquiring the rowid of the abnormal data block, marking the rowid as the abnormal rowid, and re-reading the abnormal rowid and one or more data blocks behind the abnormal rowid.

Preferably, the step of acquiring the target data table, extracting physical address information of the target data table, and generating the to-be-processed task table according to the physical address information includes:

acquiring the target data table from a system database, and extracting physical address information of the target data table from the system, wherein the physical address information comprises the extents and attribute information of the target data table;

and taking each extent in the physical address information as an independent task, and generating a task table to be processed according to the attribute information corresponding to each independent task.

Preferably, the step of generating a pseudo-column rowid of one or more data blocks and reading the target data table according to the rowid comprises:

generating a rowid of each data block according to the data table number, the file number, the block number and the row number, wherein the rowid comprises a starting rowid and a terminating rowid;

and positioning to a target data block according to the starting rowid and the stopping rowid, and sequentially reading one or more corresponding target data blocks until the target data table is read.

Preferably, the step of locating to the target data block according to the starting rowid and the terminating rowid further comprises:

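performing a locking operation on the target data in the target data block, and releasing the lock after the target data has been read.

Preferably, after the step of generating a pseudo-column rowid of one or more data blocks and reading the target data table according to the rowid, the method further includes:

performing a data editing operation according to the data located by the rowid in the target data table.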

In addition, to achieve the above object, an embodiment of the present invention further provides a data reading apparatus, including:

the acquisition module is used for acquiring a target data table, extracting physical address information of the target data table and generating a task table to be processed according to the physical address information;

the segmentation module is used for determining single data processing capacity according to preset service requirements and segmenting the task table to be processed into one or more data blocks according to the single data processing capacity;

and the reading module is used for generating one or more pseudo columns rowid of the data blocks and reading the target data table according to the rowid.

In addition, in order to achieve the above object, an embodiment of the present invention further provides a data reading device, where the data reading device includes a processor, a memory, and a data reading program stored in the memory, and when the data reading program is executed by the processor, the data reading device implements the steps of the data reading method described above.

In addition, to achieve the above object, an embodiment of the present invention further provides a computer storage medium, where a data reading program is stored, and the data reading program, when executed by a processor, implements the steps of the data reading method as described above.

Compared with the prior art, the invention discloses a data reading method, a device, equipment and a storage medium, wherein the method comprises the following steps: acquiring a target data table, extracting physical address information of the target data table, and generating a task table to be processed according to the physical address information; determining single data processing capacity according to preset service requirements, and dividing the task table to be processed into one or more data blocks according to the single data processing capacity; and generating a pseudo column rowid of one or more data blocks, and reading the target data table according to the rowid. According to the invention, based on the big data, the target data table is divided into a plurality of data blocks and then data reading is carried out, so that the data processing efficiency is improved.

Drawings

FIG. 1 is a schematic hardware configuration diagram of a data reading apparatus according to embodiments of the present invention;

FIG. 2 is a flow chart illustrating a first embodiment of a data reading method according to the present invention;

FIG. 3 is a flow chart illustrating a second embodiment of a data reading method according to the present invention;

FIG. 4 is a flow chart illustrating a third embodiment of a data reading method according to the present invention;

FIG. 5 is a functional block diagram of a data reading apparatus according to a first embodiment of the present invention.

The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.

Detailed Description

It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

The data reading device mainly related to the embodiment of the invention is a network connection device capable of realizing network connection, and the data reading device can be a server, a cloud platform and the like. In addition, the mobile terminal related to the embodiment of the invention can be mobile network equipment such as a mobile phone, a tablet personal computer and the like.

Referring to FIG. 1, FIG. 1 is a schematic diagram of a hardware structure of a data reading apparatus according to embodiments of the present invention. In this embodiment of the present invention, the data reading device may include a processor 1001 (e.g., a central processing unit, CPU), a communication bus 1002, an input port 1003, an output port 1004, and a memory 1005. The communication bus 1002 is used for connection and communication among the components; the input port 1003 is used for data input; the output port 1004 is used for data output. The memory 1005 may be a high-speed RAM memory or a non-volatile memory, such as a magnetic disk memory, and may optionally be a storage device independent of the processor 1001. Those skilled in the art will appreciate that the hardware configuration depicted in FIG. 1 does not limit the present invention, which may include more or fewer components than those shown, a combination of some components, or a different arrangement of components.

With continued reference to fig. 1, the memory 1005 of fig. 1, which is one type of readable storage medium, may include an operating system, a network communication module, an application program module, and a data reading program. In fig. 1, the network communication module is mainly used for connecting to a server and performing data communication with the server; the processor 1001 may call a data reading program stored in the memory 1005 and execute the data reading method provided by the embodiment of the present invention.

The embodiment of the invention provides a data reading method.

Oracle Database, also known as Oracle RDBMS or simply Oracle, is a relational database management system from Oracle Corporation. It has long held a leading position in the database field. The Oracle database system is currently a popular relational database management system worldwide; it has good portability, is convenient to use and powerful, and is suitable for large, medium, small and microcomputer environments. As a database solution it is efficient, reliable, and suited to high throughput.

Massive amounts of data are often stored in an Oracle database, so a large-scale modification makes the database lock entire tables, and the resulting long-term occupation of space can trigger Oracle's "snapshot too old" exception. The existing cursor-based segmented processing scheme occupies a large amount of space, which causes runtime errors; it cannot be executed concurrently, occupies excessive system resources, and easily blocks system services, so data processing efficiency is greatly reduced. Therefore, how to improve data processing efficiency is a technical problem that urgently needs to be solved.

Currently, data in an Oracle database is mainly fetched in batches through a cursor and committed in segments. However, a cursor has to construct a consistent read, and if processing takes several hours, a large amount of undo space is consumed, which easily causes runtime errors and forces the task to terminate. In addition, the cursor method cannot be executed concurrently, and resuming from a breakpoint requires the data to be read again, so the data reading speed is low.

Referring to fig. 2, fig. 2 is a flowchart illustrating a data reading method according to a first embodiment of the present invention.

In this embodiment, the data reading method is applied to a data reading device, and the method includes:

step S101, a target data table is obtained, physical address information of the target data table is extracted, and a task table to be processed is generated according to the physical address information;

the technical scheme of the embodiment is mainly applied to the Oracle database.

Specifically, the step of acquiring the target data table, extracting physical address information of the target data table, and generating the to-be-processed task table according to the physical address information includes:

Step S101a: acquiring the target data table from a system database, and extracting physical address information of the target data table from the system, wherein the physical address information comprises the extents and attribute information of the target data table, and the attribute information includes the physical name, path, and size of the data tablespace.

Step S101b: taking each extent in the physical address information as an independent task, and generating a task table to be processed according to the attribute information corresponding to each independent task. All the extents are collected into the task table to be processed, and the attribute information corresponding to each extent is written into that table.
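Steps S101a and S101b can be illustrated with a minimal Python sketch. It assumes the extent metadata has already been fetched from the database; the `Extent` class, its field names, the `build_task_table` helper and the sample values are all hypothetical and only show how each extent (extension) could become one row of the task table to be processed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Extent:
    """One extent of the target table (hypothetical field names)."""
    tablespace: str   # physical name of the tablespace
    path: str         # path of the data file holding the extent
    file_no: int      # data file number
    first_block: int  # first block of the extent
    n_blocks: int     # number of blocks in the extent
    n_bytes: int      # size of the extent in bytes


def build_task_table(extents):
    """Turn every extent into one independent task row."""
    tasks = []
    for task_id, ext in enumerate(extents, start=1):
        tasks.append({
            "task_id": task_id,
            "tablespace": ext.tablespace,
            "path": ext.path,
            "file_no": ext.file_no,
            "first_block": ext.first_block,
            "last_block": ext.first_block + ext.n_blocks - 1,
            "n_bytes": ext.n_bytes,
            "status": "PENDING",
        })
    return tasks


# Illustrative values only; real ones would come from the database's
# extent metadata.
extents = [
    Extent("USERS", "/data/users01.dbf", 4, 128, 8, 65536),
    Extent("USERS", "/data/users01.dbf", 4, 136, 8, 65536),
    Extent("USERS", "/data/users02.dbf", 5, 256, 128, 1048576),
]
task_table = build_task_table(extents)
```

Keeping one row per extent means each task already carries the block range it covers, which is the information the later rowid step needs.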

Step S102, determining single data processing capacity according to preset service requirements, and dividing the task table to be processed into one or more data blocks according to the single data processing capacity;

The preset service requirement is the actual adjustment requirement for the target data table. If the data volume that the preset service requirement needs to process at one time is, for example, 30M, and the hardware configuration can support that volume, the preset service requirement is converted into the single data processing capacity. The task table to be processed is then divided into one or more data blocks according to the single data processing capacity, and the number of data blocks is recorded. For example, if the single data processing capacity is 30M and the data corresponding to the task table to be processed amounts to 30M × n, where n is greater than or equal to 1, the data corresponding to the task table to be processed can be divided into n data blocks.
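The division in step S102 can be sketched as follows. This is one possible reading, not the method as claimed: consecutive tasks are packed greedily into data blocks whose total size does not exceed the single data processing capacity. The function name `split_into_chunks` and the rule that an oversized task gets a data block of its own are assumptions.

```python
MB = 1024 * 1024


def split_into_chunks(task_sizes, capacity_bytes):
    """Pack consecutive (task_id, size) pairs into chunks of at most
    capacity_bytes. A task larger than the capacity is given a chunk
    of its own (assumption)."""
    chunks, current, current_size = [], [], 0
    for task_id, size in task_sizes:
        if current and current_size + size > capacity_bytes:
            chunks.append(current)
            current, current_size = [], 0
        current.append(task_id)
        current_size += size
    if current:
        chunks.append(current)
    return chunks


# Six tasks of 10M each with a single processing capacity of 30M:
# 60M / 30M = 2 data blocks.
tasks = [(i, 10 * MB) for i in range(1, 7)]
chunks = split_into_chunks(tasks, 30 * MB)
print(chunks)  # [[1, 2, 3], [4, 5, 6]]
```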

Step S103, generating a pseudo-column rowid of one or more data blocks, and reading the target data table according to the rowid.

rowid is a pseudo-column that uniquely identifies a row in a table. It is the internal address of the row data in the physical table and combines two addresses: one locates, within the data file, the data block that contains the row, and the other locates the row inside that data block, so the row itself can be addressed directly. Generally, the rowid includes a data table number, a file number, a block number, and a row number.
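To make the four components concrete, the sketch below encodes and decodes a rowid in Python. It assumes the commonly documented display format of Oracle extended rowids: 18 base-64 characters, 6 for the data object number, 3 for the file number, 6 for the block number and 3 for the row number. The function names are hypothetical, and inside the database one would normally rely on Oracle's own `DBMS_ROWID` package rather than hand-written encoding.

```python
# Base-64 alphabet assumed for the rowid display format.
B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def _encode(value, width):
    """Encode a non-negative integer as `width` base-64 digits."""
    digits = []
    for _ in range(width):
        digits.append(B64[value % 64])
        value //= 64
    if value:
        raise ValueError("value does not fit in the field")
    return "".join(reversed(digits))


def _decode(text):
    value = 0
    for ch in text:
        value = value * 64 + B64.index(ch)
    return value


def encode_rowid(object_no, file_no, block_no, row_no):
    """Build an 18-character rowid string from its four components."""
    return (_encode(object_no, 6) + _encode(file_no, 3)
            + _encode(block_no, 6) + _encode(row_no, 3))


def decode_rowid(rowid):
    """Split an 18-character rowid string into its four components."""
    if len(rowid) != 18:
        raise ValueError("expected an 18-character rowid")
    return (_decode(rowid[0:6]), _decode(rowid[6:9]),
            _decode(rowid[9:15]), _decode(rowid[15:18]))


print(encode_rowid(73316, 4, 151, 0))      # AAAR5kAAEAAAACXAAA
print(decode_rowid("AAAR5kAAEAAAACXAAA"))  # (73316, 4, 151, 0)
```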

The step of generating a pseudo-column rowid of one or more of the data blocks and reading the target data table according to the rowid comprises:

Step S103a: generating a rowid of each data block according to the data table number, the file number, the block number and the row number, wherein the rowid comprises a starting rowid and a terminating rowid.

It can be understood that, for a task table to be processed that has multiple data blocks, the terminating rowid of a given data block is the starting rowid of the next data block, and likewise the starting rowid of a given data block is the terminating rowid of the previous data block.

Step S103b: locating a target data block according to the starting rowid and the terminating rowid, and sequentially reading the one or more corresponding target data blocks until the whole target data table has been read. Specifically, if there is only one target data block, it is located according to its starting rowid and terminating rowid, and the corresponding data is read. If there are multiple target data blocks, they are read one after another according to their serial numbers and/or order until all the data blocks have been read successfully.
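The range-by-range reading of steps S103a and S103b can be sketched in Python. As a loud assumption, the example uses Python's built-in `sqlite3` module as a stand-in for Oracle: SQLite's integer `rowid` is not a physical address, so this only illustrates the control flow. The table `target` and the helpers `make_ranges` and `read_range` are hypothetical. Ranges are half-open, so the end of one range is the start of the next, mirroring the relationship between terminating and starting rowids described above.

```python
import sqlite3

# In-memory stand-in table; an INTEGER PRIMARY KEY is an alias for rowid.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany(
    "INSERT INTO target (id, payload) VALUES (?, ?)",
    [(i, f"row-{i}") for i in range(1, 11)],
)
conn.commit()


def make_ranges(first, last, step):
    """Half-open [start, end) ranges; each end is the next range's start."""
    return [(s, min(s + step, last + 1)) for s in range(first, last + 1, step)]


def read_range(conn, start, end):
    """Read only the rows whose rowid falls inside one range."""
    cur = conn.execute(
        "SELECT rowid, payload FROM target "
        "WHERE rowid >= ? AND rowid < ? ORDER BY rowid",
        (start, end),
    )
    return cur.fetchall()


ranges = make_ranges(1, 10, 4)
print(ranges)  # [(1, 5), (5, 9), (9, 11)]

rows_read = []
for start, end in ranges:  # read the data blocks one after another
    rows_read.extend(read_range(conn, start, end))
```

Because every query is bounded by its own range, no single statement has to scan or hold the whole table.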

Further, the step of locating the target data block according to the starting rowid and the terminating rowid further comprises:

When reading the target data block, locking is required to block other operations during reading. Generally, the shorter the locking time, the less impact on the overall service. The smaller the data block, the shorter the processing time, and the lock release is performed immediately after the processing is completed, and thus the corresponding locking time is also shorter.

A row lock can be used to prevent two transactions from modifying the same row of data. When a transaction modifies a row, the database always places an exclusive lock on the modified row so that other transactions cannot modify it, and the database releases the lock only after the transaction performs a Commit or Rollback operation. The row lock is a fine-grained lock, which gives applications the greatest possible room to access data in parallel.

Further, after the step of generating a pseudo-column rowid of one or more data blocks and reading the target data table according to the rowid, the method further includes:

and editing operation is carried out according to the data of the rowid in the target data table.

After the rowid is used for positioning, data editing operations can be performed on the located data. The editing operations include Insert, Update, Delete, Merge and the like. For example, an Insert command of the form insert into website values (1, null) inserts the corresponding data. As another example, data may be deleted by creating a temporary table.
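The combination of short-lived locks and rowid-bounded edits can be sketched as one transaction per data block. Again this uses Python's `sqlite3` as an assumed stand-in for Oracle, with a hypothetical table `target` and helper `update_chunk`; the point is only that each range is updated and committed on its own, so locks taken by the update are released as soon as that range is finished.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany(
    "INSERT INTO target (id, status) VALUES (?, 'old')",
    [(i,) for i in range(1, 10)],
)
conn.commit()


def update_chunk(conn, start, end):
    """Edit one half-open rowid range inside its own transaction."""
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            "UPDATE target SET status = 'new' WHERE rowid >= ? AND rowid < ?",
            (start, end),
        )
        return cur.rowcount


# Three ranges of three rows each; every range is committed separately.
updated = [update_chunk(conn, s, e) for s, e in [(1, 4), (4, 7), (7, 10)]]
remaining = conn.execute(
    "SELECT COUNT(*) FROM target WHERE status = 'old'"
).fetchone()[0]
print(updated, remaining)  # [3, 3, 3] 0
```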

According to the scheme, the target data table is obtained, the physical address information of the target data table is extracted, and the task table to be processed is generated according to the physical address information; determining single data processing capacity according to preset service requirements, and dividing the task table to be processed into one or more data blocks according to the single data processing capacity; and generating a pseudo column rowid of one or more data blocks, and reading the target data table according to the rowid. According to the invention, based on the big data, the target data table is divided into a plurality of data blocks and then data reading is carried out, so that the data processing efficiency is improved.

As shown in FIG. 3, a second embodiment of the present invention provides a data reading method. Based on the first embodiment shown in FIG. 2, before the step of obtaining a target data table, extracting physical address information of the target data table, and generating a to-be-processed task table according to the physical address information, the method further includes:

step S1001, judging whether the data volume in the target data table exceeds a first threshold value;

The attributes of the data corresponding to the target data table are checked to obtain its data volume. The data volume refers to the storage space that the data occupies.

It is understood that when the amount of data is too large, a single read takes a longer time, so a concurrency mechanism may be provided to save time and improve efficiency. The first threshold may be specifically set according to hardware devices, preset time requirements, system concurrency performance, and the like, for example, the first threshold is set to 300M.

Step S1002, if the data volume is larger than the first threshold, setting the process number of concurrent processes according to the data volume;

If the data volume exceeds the first threshold, the concurrency mechanism needs to be activated. Under the concurrency mechanism, the larger the data volume, the more concurrent processes are used. For example, when the data volume is greater than the first threshold and less than a second threshold, the number of concurrent processes is set to a first process number; when the data volume is greater than or equal to the second threshold and less than a third threshold, the number of concurrent processes is set to a second process number. Here the first threshold is less than the second threshold, the second threshold is less than the third threshold, and the first process number is less than the second process number.

It is understood that if the data volume is less than or equal to the first threshold, the data can be read by a single process and no concurrent processes need to be set.

Step S1003, dividing the target data table into a corresponding number of sub target data tables according to the process number.

Specifically, the target data table is split into as many sub target data tables as there are processes, so that the data in the target data table can be read simultaneously by the concurrent processes.

For example, suppose the first threshold is set to 100M. If the data volume is 1000M, concurrent processes need to be set. The number of concurrent processes is then set according to the concurrency mechanism; for example, if the system's concurrent processing capacity gives 10 concurrent processes, the target data table is split into 10 sub target data tables. The 1000M of data is thus read by 10 processes simultaneously, which greatly shortens the data reading time and improves data processing efficiency.
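The threshold logic and the split into sub target data tables can be sketched in Python. The tier values in `TIERS`, the helper names, and the use of a thread pool in place of database processes are all assumptions made for illustration; only the 100M threshold, the 1000M data volume and the 10 workers come from the example above.

```python
from concurrent.futures import ThreadPoolExecutor


def choose_process_count(data_mb, tiers):
    """tiers: ascending list of (lower_bound_mb, process_count).

    At or below the first threshold a single process is used; above it,
    the count of the highest tier whose bound is reached is returned.
    """
    if data_mb <= tiers[0][0]:
        return 1
    count = 1
    for lower_bound_mb, n in tiers:
        if data_mb >= lower_bound_mb:
            count = n
    return count


def split_evenly(items, n_parts):
    """Split a list into n_parts contiguous parts of near-equal length."""
    size, extra = divmod(len(items), n_parts)
    parts, start = [], 0
    for i in range(n_parts):
        end = start + size + (1 if i < extra else 0)
        parts.append(items[start:end])
        start = end
    return parts


TIERS = [(100, 4), (500, 10)]     # hypothetical thresholds and counts
data_blocks = list(range(100))    # stand-in for the data blocks to read

n = choose_process_count(1000, TIERS)
sub_tables = split_evenly(data_blocks, n)

# Threads stand in for concurrent processes; each worker would read the
# data blocks of its own sub target data table.
with ThreadPoolExecutor(max_workers=n) as pool:
    counts = list(pool.map(len, sub_tables))

print(n, counts)  # 10 [10, 10, 10, 10, 10, 10, 10, 10, 10, 10]
```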

According to the scheme, whether the data volume in the target data table exceeds a first threshold value is judged; if the data volume is larger than the first threshold value, setting the process number of the concurrent process according to the data volume; and dividing the target data table into sub target data tables with corresponding quantity according to the process number. The method and the device split the target data table based on the big data and then read the data, and improve the data processing efficiency based on a concurrent processing mechanism.

As shown in FIG. 4, a third embodiment of the present invention provides a data reading method. Based on the first embodiment shown in FIG. 2, after the step of generating a pseudo-column rowid of one or more data blocks and reading the target data table according to the rowid, the method further includes:

Step S104, acquiring a status log, and identifying an abnormal data block through the status log;

When the system reads data, it generates a status log that records information such as the object being read, the database, the read time, and the read completion progress. After the status log is obtained, any data block that has not been completely read is identified from the read completion progress and marked as an abnormal data block.

It is understood that the abnormal data block also includes data blocks that are unreadable due to data corruption.

Step S105, acquiring the rowid of the abnormal data block, marking it as the abnormal rowid, and re-reading the data block at the abnormal rowid and the one or more data blocks after it.

The rowid of the abnormal data block is acquired and marked as the abnormal rowid, and the abnormal data block is read again with the starting rowid of the abnormal rowid as the starting point.

Generally, if a data block is not read successfully, the system automatically skips it and ends the corresponding reading task, so the data blocks after the abnormal data block are not read, which causes data omission. If the data is read with a cursor, the whole database must be scanned again after an exception occurs, so resuming from a breakpoint is costly. In this embodiment, after the abnormal data block is successfully read, the remaining unread data blocks after it continue to be read. Moreover, if another exception terminates data reading during processing, only the data block currently being read and the data blocks not yet read are affected; data that has been read successfully has already been committed and is not affected.

If the abnormal data block has been read multiple times and its data still cannot be read completely, an alarm prompt is output so that the abnormal data block can be checked.
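The resume-from-breakpoint behaviour of steps S104 and S105 can be sketched in Python. The status log is modelled as a list of dictionaries with hypothetical `start`, `end` and `status` keys, `read_chunk` is any callable that reads one data block, and the retry limit of 3 is an assumption. Continuing with later data blocks after an alarm is also an assumption, since the text does not say what happens next.

```python
def find_resume_index(status_log):
    """Index of the first data block not marked DONE, or None."""
    for i, entry in enumerate(status_log):
        if entry["status"] != "DONE":
            return i
    return None


def resume(status_log, read_chunk, max_retries=3):
    """Re-read from the first abnormal data block onward.

    Blocks already marked DONE before that point are never read again.
    Returns the (start, end) ranges that still failed after max_retries
    attempts, so that the caller can raise an alarm for them.
    """
    alerts = []
    start = find_resume_index(status_log)
    if start is None:
        return alerts
    for entry in status_log[start:]:
        for _ in range(max_retries):
            try:
                read_chunk(entry["start"], entry["end"])
                entry["status"] = "DONE"
                break
            except OSError:
                entry["status"] = "FAILED"
        else:
            # Every attempt failed: report the block for manual checking.
            alerts.append((entry["start"], entry["end"]))
    return alerts


calls = []


def fake_read(start, end):
    """Stand-in reader: the block starting at 30 is always unreadable."""
    calls.append((start, end))
    if start == 30:
        raise OSError("unreadable block")


log = [
    {"start": 0, "end": 10, "status": "DONE"},
    {"start": 10, "end": 20, "status": "FAILED"},
    {"start": 20, "end": 30, "status": "PENDING"},
    {"start": 30, "end": 40, "status": "PENDING"},
]
alerts = resume(log, fake_read)
print(alerts)  # [(30, 40)]
```

In this run the first data block is never touched again, the second and third are read once each, and the fourth is attempted three times before it is reported.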

According to the scheme, the status log is obtained, and the abnormal data block is identified through the status log; the rowid of the abnormal data block is acquired and marked as the abnormal rowid, and the data block at the abnormal rowid and the one or more data blocks after it are re-read. As a result, when data reading is abnormal, data that has already been read does not need to be read again, and data processing efficiency is improved.

In addition, the embodiment also provides a data reading device. Referring to fig. 5, fig. 5 is a functional block diagram of a data reading apparatus according to a first embodiment of the present invention.

In this embodiment, the data reading device is a virtual device, and is stored in the memory 1005 of the data reading apparatus shown in FIG. 1, so as to implement all functions of the data reading program: acquiring a target data table, extracting physical address information of the target data table, and generating a task table to be processed according to the physical address information; determining single data processing capacity according to preset service requirements, and dividing the task table to be processed into one or more data blocks according to the single data processing capacity; and generating a pseudo-column rowid of one or more data blocks, and reading the target data table according to the rowid.

Specifically, the data reading apparatus includes:

the acquisition module 10 is configured to acquire a target data table, extract physical address information of the target data table, and generate a to-be-processed task table according to the physical address information;

the segmentation module 20 is configured to determine a single data throughput according to a preset service requirement, and segment the to-be-processed task table into one or more data blocks according to the single data throughput;

and the reading module 30 is configured to generate a pseudo-column rowid of one or more data blocks, and read the target data table according to the rowid.

Further, the obtaining module is further configured to:

Further, the reading module is further configured to:

Further, the obtaining module is further configured to:

Further, the reading module is further configured to:

In addition, an embodiment of the present invention further provides a computer storage medium, where a data reading program is stored on the computer storage medium, and when the data reading program is executed by a processor, the steps of the data reading method are implemented, which are not described herein again.

Compared with the prior art, the data reading method, the data reading device, the data reading equipment and the data reading storage medium provided by the invention comprise the following steps: acquiring a target data table, extracting physical address information of the target data table, and generating a task table to be processed according to the physical address information; determining single data processing capacity according to preset service requirements, and dividing the task table to be processed into one or more data blocks according to the single data processing capacity; and generating a pseudo column rowid of one or more data blocks, and reading the target data table according to the rowid. According to the invention, based on the big data, the target data table is divided into a plurality of data blocks and then data reading is carried out, so that the data processing efficiency is improved.

It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.

The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.

Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) as described above and includes instructions for causing a terminal device to execute the method according to the embodiments of the present invention.

The above description is only for the preferred embodiment of the present invention and is not intended to limit the scope of the present invention, and all equivalent structures or flow transformations made by the present specification and drawings, or applied directly or indirectly to other related arts, are included in the scope of the present invention.

Claims

1. A method of reading data, the method comprising:

2. The method according to claim 1, wherein the step of obtaining the target data table, extracting physical address information of the target data table, and generating the to-be-processed task table according to the physical address information further comprises:

3. The method of claim 1, wherein the step of generating a rowid of one or more of the pseudo-columns of the data blocks and reading the target data table according to the rowid is further followed by:

4. The method according to claim 1, wherein the step of obtaining a target data table, extracting physical address information of the target data table, and generating a to-be-processed task table according to the physical address information comprises:

5. The method of claim 1, wherein the step of generating a rowid of one or more of the pseudo-columns of the data blocks and reading the target data table according to the rowid comprises:

6. The method of claim 5, wherein the step of locating a target data block based on the starting rowid and the terminating rowid is further followed by:

7. The method of claim 1, wherein the step of generating a rowid of one or more of the pseudo-columns of the data blocks and reading the target data table according to the rowid is further followed by:

8. A data reading apparatus, characterized in that the data reading apparatus comprises:

9. A data reading device, characterized in that the data reading device comprises a processor, a memory and a data reading program stored in the memory, which data reading program, when executed by the processor, carries out the steps of the data reading method according to any one of claims 1 to 7.

10. A computer storage medium, having a data reading program stored thereon, the data reading program, when executed by a processor, implementing the steps of the data reading method according to any one of claims 1-7.