CN111414362B - Data reading method, device, equipment and storage medium

Info

Publication number: CN111414362B
Application number: CN202010128291.8A
Authority: CN
Inventors: 帅宇
Original assignee: Ping An Technology Shenzhen Co Ltd
Current assignee: Ping An Technology Shenzhen Co Ltd
Priority date: 2020-02-28
Filing date: 2020-02-28
Publication date: 2023-11-10
Anticipated expiration: 2040-02-28
Also published as: CN111414362A; WO2021169496A1

Abstract

The invention discloses a data reading method, a device, equipment and a storage medium, wherein the method comprises the following steps: acquiring a target data table, extracting physical address information of the target data table, and generating a task table to be processed according to the physical address information; determining single data processing amount according to preset service requirements, and dividing the task table to be processed into one or more data blocks according to the single data processing amount; and generating one or more pseudo columns rowid of the data blocks, and reading the target data table according to the rowid. According to the invention, based on big data, the target data table is divided into a plurality of data blocks and then data reading is performed, so that the data processing efficiency is improved.

Description

Data reading method, device, equipment and storage medium

Technical Field

The present invention relates to the field of big data technologies, and in particular, to a data reading method, device, apparatus, and storage medium.

Background

The Oracle database stores a large number of data tables. Because the data volume is huge, a large-batch modification of the database may lock the entire table, and the prolonged occupation of undo space can trigger Oracle's "snapshot too old" exception. The existing cursor-based segmented processing scheme occupies a large amount of space and causes abnormal operation, cannot be executed concurrently, occupies excessive system resources, and easily blocks system business, so that data processing efficiency is greatly reduced. Therefore, how to improve data processing efficiency is a technical problem to be solved.

Disclosure of Invention

The invention provides a data reading method, a device, equipment and a storage medium, aiming at improving the data processing efficiency.

To achieve the above object, the present invention provides a data reading method, the method comprising:

acquiring a target data table, extracting physical address information of the target data table, and generating a task table to be processed according to the physical address information;

determining single data processing amount according to preset service requirements, and dividing the task table to be processed into one or more data blocks according to the single data processing amount;

and generating one or more pseudo columns rowid of the data blocks, and reading the target data table according to the rowid.

Preferably, the step of obtaining the target data table, extracting physical address information of the target data table, and generating the task table to be processed according to the physical address information further includes:

judging whether the data amount in the target data table exceeds a first threshold value;

if the data volume is larger than the first threshold value, setting the number of processes of the concurrent process according to the data volume;

dividing the target data table into a corresponding number of sub-target data tables according to the process number.

Preferably, the step of generating the pseudo-column rowid of one or more of the data blocks and reading the target data table according to the rowid further includes:

acquiring a state log, and acquiring abnormal data blocks of data through the state log;

and acquiring the rowid of the abnormal data block, marking the rowid as the abnormal rowid, and re-reading the abnormal rowid and one or more data blocks after the abnormal rowid.

Preferably, the step of obtaining the target data table, extracting physical address information of the target data table, and generating the task table to be processed according to the physical address information includes:

acquiring the target data table from a system database, and extracting, by the system, physical address information of the target data table, wherein the physical address information comprises the extents and attribute information of the target data table;

and taking each extent in the physical address information as an independent task, and generating a task table to be processed according to the attribute information corresponding to each independent task.

Preferably, the step of generating the pseudo-column rowid of one or more of the data blocks and reading the target data table according to the rowid includes:

generating the rowid of each data block according to the data table number, the file number, the block number and the row number, wherein the rowid comprises a start rowid and an end rowid;

and locating the target data block according to the start rowid and the end rowid, and sequentially reading the one or more corresponding target data blocks until the target data table has been read.

Preferably, the step of locating the target data block according to the start rowid and the end rowid further comprises:

and executing locking operation on the target data in the target data block, and releasing the lock after reading the target data.

and performing data editing operation according to the data of the rowid in the target data table.

In addition, to achieve the above object, an embodiment of the present invention further provides a data reading apparatus, including:

the acquisition module is used for acquiring a target data table, extracting physical address information of the target data table and generating a task table to be processed according to the physical address information;

the segmentation module is used for determining single data processing amount according to preset service requirements and segmenting the task table to be processed into one or more data blocks according to the single data processing amount;

and the reading module is used for generating one or more pseudo columns rowid of the data blocks and reading the target data table according to the rowid.

In addition, to achieve the above object, an embodiment of the present invention further provides a data reading apparatus including a processor, a memory, and a data reading program stored in the memory, which when executed by the processor, implements the steps of the data reading method as described above.

In addition, in order to achieve the above object, an embodiment of the present invention also provides a computer storage medium having a data reading program stored thereon, which, when executed by a processor, implements the steps of the data reading method as described above.

Compared with the prior art, the invention discloses a data reading method, a device, equipment and a storage medium, wherein the method comprises the following steps: acquiring a target data table, extracting physical address information of the target data table, and generating a task table to be processed according to the physical address information; determining single data processing amount according to preset service requirements, and dividing the task table to be processed into one or more data blocks according to the single data processing amount; and generating one or more pseudo columns rowid of the data blocks, and reading the target data table according to the rowid. According to the invention, based on big data, the target data table is divided into a plurality of data blocks and then data reading is performed, so that the data processing efficiency is improved.

Drawings

FIG. 1 is a schematic hardware configuration of a data reading apparatus according to embodiments of the present invention;

FIG. 2 is a flowchart of a first embodiment of a data reading method according to the present invention;

FIG. 3 is a flowchart of a second embodiment of the data reading method of the present invention;

FIG. 4 is a flowchart of a third embodiment of a data reading method according to the present invention;

FIG. 5 is a schematic functional block diagram of a first embodiment of the data reading apparatus of the present invention.

The achievement of the objects, functional features and advantages of the present invention will be further described with reference to the accompanying drawings, in conjunction with the embodiments.

Detailed Description

It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.

The data reading device mainly related to the embodiment of the invention refers to a network connection device capable of realizing network connection, and the data reading device can be a server, a cloud platform and the like. In addition, the mobile terminal related to the embodiment of the invention can be mobile network equipment such as a mobile phone, a tablet personal computer and the like.

Referring to FIG. 1, FIG. 1 is a schematic diagram of a hardware configuration of a data reading apparatus according to various embodiments of the present invention. In an embodiment of the present invention, the data reading device may include a processor 1001 (e.g., a central processing unit, CPU), a communication bus 1002, an input port 1003, an output port 1004, and a memory 1005. The communication bus 1002 enables communication between these components; the input port 1003 is used for data input; the output port 1004 is used for data output; the memory 1005 may be a high-speed RAM or a non-volatile memory, such as a disk memory, and may optionally be a storage device independent of the processor 1001. Those skilled in the art will appreciate that the hardware configuration shown in FIG. 1 does not limit the invention, which may include more or fewer components than shown, combine certain components, or arrange the components differently.

With continued reference to fig. 1, the memory 1005 of fig. 1, which is a readable storage medium, may include an operating system, a network communication module, an application program module, and a data reading program. In fig. 1, the network communication module is mainly used for connecting with a server and performing data communication with the server; and the processor 1001 may call a data reading program stored in the memory 1005 and execute the data reading method provided by the embodiment of the present invention.

The embodiment of the invention provides a data reading method.

Oracle Database, also known as Oracle RDBMS or Oracle for short, is a relational database management system from Oracle Corporation. It has long been a leading product in the database field. The Oracle database system is currently one of the most popular relational database management systems in the world; it is portable, convenient to use, and powerful, and is suitable for large, medium, small, and microcomputer environments. It is an efficient, reliable database solution suited to high throughput.

Massive data are often stored in the Oracle database, so that a large-batch modification of the database may lock the entire table, and the prolonged occupation of undo space can trigger Oracle's "snapshot too old" exception. The existing cursor-based segmented processing scheme occupies a large amount of space and causes abnormal operation, cannot be executed concurrently, occupies excessive system resources, and easily blocks system business, so that data processing efficiency is greatly reduced. Therefore, how to improve data processing efficiency is a technical problem to be solved.

At present, data in an Oracle database is mainly fetched in batches through a cursor and committed in segments. However, a cursor must construct a read-consistent view of the data; if processing takes several hours, a large amount of undo space is consumed, which easily causes runtime errors and forces the task to terminate. In addition, the cursor approach cannot be executed concurrently, and resuming from a breakpoint requires data to be read again, so the data reading speed is low.

Referring to fig. 2, fig. 2 is a flowchart of a first embodiment of a data reading method according to the present invention.

In this embodiment, the data reading method is applied to a data reading device, and the method includes:

step S101, a target data table is obtained, physical address information of the target data table is extracted, and a task table to be processed is generated according to the physical address information;

the technical scheme of the embodiment is mainly applied to an Oracle database.

Specifically, the step of obtaining the target data table, extracting physical address information of the target data table, and generating the task table to be processed according to the physical address information includes:

Step S101a: acquiring the target data table from a system database, and extracting, by the system, physical address information of the target data table, wherein the physical address information comprises the extents and attribute information of the target data table, and the attribute information includes the tablespace physical name, the path, and the size.

Step S101b: taking each extent in the physical address information as an independent task, and generating a task table to be processed according to the attribute information corresponding to each independent task. All extents are consolidated into the task table to be processed, and the attribute information corresponding to each extent is written into that table.
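By way of illustration only, steps S101a and S101b can be sketched as follows. The `Extent` fields, the `build_task_table` helper, the `PENDING` status, and the sample values are all hypothetical names chosen for this sketch; the embodiment does not prescribe a schema for the task table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Extent:
    """One extent of the target table (all field names are illustrative)."""
    file_id: int      # data file that holds the extent
    first_block: int  # first block of the extent
    block_count: int  # number of blocks in the extent
    tablespace: str   # attribute information: tablespace physical name
    path: str         # attribute information: data file path
    size_mb: int      # attribute information: size


def build_task_table(extents):
    """Turn every extent into one independent pending task."""
    ordered = sorted(extents, key=lambda e: (e.file_id, e.first_block))
    tasks = []
    for task_id, ext in enumerate(ordered, start=1):
        tasks.append({
            "task_id": task_id,
            "file_id": ext.file_id,
            "first_block": ext.first_block,
            "last_block": ext.first_block + ext.block_count - 1,
            "tablespace": ext.tablespace,
            "path": ext.path,
            "size_mb": ext.size_mb,
            "status": "PENDING",  # hypothetical status column
        })
    return tasks


# Made-up sample extents, deliberately out of order.
extents = [
    Extent(4, 136, 8, "USERS", "/u01/users01.dbf", 1),
    Extent(4, 128, 8, "USERS", "/u01/users01.dbf", 1),
]
task_table = build_task_table(extents)
```

Sorting by file and block is a choice made in this sketch so that later block ranges follow physical order; it is not required by the embodiment.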

Step S102, determining single data processing amount according to preset service requirements, and dividing the task table to be processed into one or more data blocks according to the single data processing amount;

The preset service requirement is taken as the adjustment requirement for the target data table. If the preset service requirement calls for 30M of data to be processed at a time and the hardware configuration can support it, the preset service requirement can be converted into the single data processing amount. The task table to be processed can then be divided into one or more data blocks according to the single data processing amount, and the number of data blocks is recorded. For example, if the single data processing amount is 30M and the data corresponding to the task table to be processed amounts to 30M × n, where n is greater than or equal to 1, the data can be divided into n data blocks.
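The division in step S102 can be sketched as the following arithmetic. The function name `split_into_blocks` is hypothetical, and rounding up when the total is not an exact multiple of the single data processing amount is an assumption of this sketch, since the example above only covers exact multiples.

```python
import math


def split_into_blocks(total_mb, single_mb):
    """Return (start_mb, end_mb) offsets of each data block, end exclusive."""
    if single_mb <= 0:
        raise ValueError("single_mb must be positive")
    # Assumption: a trailing partial block is kept as its own, smaller block.
    n = math.ceil(total_mb / single_mb)
    return [(i * single_mb, min((i + 1) * single_mb, total_mb)) for i in range(n)]


# 30M x 3 of data with a single data processing amount of 30M gives n = 3 blocks.
blocks = split_into_blocks(total_mb=90, single_mb=30)
```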

Step S103, generating the pseudo-column rowid of one or more of the data blocks, and reading the target data table according to the rowid.

The rowid is a pseudo-column that uniquely identifies a row in a table. It is the internal address of the row data in the physical table and comprises two addresses: one is the address of the data file block that contains the row, and the other is the address of the row within that data block, which locates the row itself directly. Typically, the rowid includes a data table number, a file number, a block number, and a row number.
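As a non-limiting sketch, the four components named above can be packed into and unpacked from an 18-character string. The layout used here (6 characters for the data table number, 3 for the file number, 6 for the block number, 3 for the row number, in a base-64 alphabet of A-Z, a-z, 0-9, + and /) is an assumption modelled on the commonly described textual form of Oracle's extended rowid; the function names are hypothetical.

```python
import string

# Base-64 alphabet assumed for the textual form of the rowid.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def _encode(value, width):
    """Encode a non-negative integer as exactly `width` base-64 characters."""
    chars = []
    for _ in range(width):
        chars.append(ALPHABET[value % 64])
        value //= 64
    if value:
        raise ValueError("value does not fit in the field")
    return "".join(reversed(chars))


def _decode(text):
    """Decode base-64 characters back into an integer."""
    value = 0
    for ch in text:
        value = value * 64 + ALPHABET.index(ch)
    return value


def make_rowid(table_no, file_no, block_no, row_no):
    """Concatenate the four fields in the assumed 6 + 3 + 6 + 3 layout."""
    return (_encode(table_no, 6) + _encode(file_no, 3)
            + _encode(block_no, 6) + _encode(row_no, 3))


def parse_rowid(rowid):
    """Split an 18-character rowid into (table, file, block, row) numbers."""
    if len(rowid) != 18:
        raise ValueError("expected an 18-character rowid")
    return (_decode(rowid[0:6]), _decode(rowid[6:9]),
            _decode(rowid[9:15]), _decode(rowid[15:18]))
```

For instance, under these assumptions `make_rowid(1, 1, 1, 1)` yields `AAAAABAABAAAAABAAB`, and `parse_rowid` recovers `(1, 1, 1, 1)` from it.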

The step of generating one or more pseudo columns rowid of the data blocks and reading the target data table according to the rowid includes:

Step S103a, generating the rowid of each data block according to the data table number, the file number, the block number and the row number, wherein the rowid comprises a start rowid and an end rowid;

it will be appreciated that, for a task table to be processed having a plurality of data blocks, the ending rowid of a certain data block is the starting rowid of the next data block, and similarly, the starting rowid of a certain data block is the ending rowid of the previous data block.

Step S103b, locating the target data block according to the start rowid and the end rowid, and sequentially reading the one or more corresponding target data blocks until the target data table has been read. Specifically, if there is only one target data block, it is located according to the start rowid and the end rowid and the corresponding data is read. If there are multiple target data blocks, they are read sequentially according to their number and/or order until all data blocks have been read successfully.
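Steps S103a and S103b can be sketched with an in-memory stand-in for the table. Rowids are modelled here as (table, file, block, row) tuples. Because the end rowid of one block is also the start rowid of the next, this sketch assumes the end rowid is exclusive so that no row is read twice. The names `rowid_ranges` and `read_table` and the sample rows are hypothetical.

```python
def rowid_ranges(boundaries):
    """Pair consecutive boundary rowids; each end rowid is the next start rowid."""
    return list(zip(boundaries, boundaries[1:]))


def read_table(rows, ranges):
    """Read the blocks in order; a row belongs to a block if start <= rowid < end."""
    result = []
    for start, end in ranges:
        block = [data for rowid, data in rows if start <= rowid < end]
        result.extend(block)
    return result


# Made-up table: three blocks (128, 129, 130) holding two rows each.
rows = [((1, 4, b, r), f"row-{b}-{r}") for b in (128, 129, 130) for r in (0, 1)]
boundaries = [(1, 4, 128, 0), (1, 4, 129, 0), (1, 4, 130, 0), (1, 4, 131, 0)]
ranges = rowid_ranges(boundaries)
data = read_table(rows, ranges)
```

In a database, each `(start, end)` pair would bound a range predicate on the rowid pseudo-column rather than a list comprehension.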

Further, the step of locating the target data block according to the start rowid and the end rowid further comprises:

When reading the target data block, locking is required to block other operations during reading. In general, the shorter the lock time, the less impact on the overall traffic. The smaller the data block is, the shorter the processing time is, and the lock release is performed immediately after the processing is completed, so that the corresponding locking time is also shorter.

In this embodiment, the lock may be a DML (data manipulation language) lock. The DML lock comprises a row lock, a table lock, and the like. A row lock, also called a TX lock, locks one row of a table. When a transaction performs an INSERT, UPDATE, DELETE, MERGE, or SELECT ... FOR UPDATE operation on a row, the system places a row lock on that row, and the row lock is not released until the transaction performs a COMMIT or ROLLBACK. It will be appreciated that row locks prevent two transactions from modifying the same row: when one transaction modifies a row, the database always places an exclusive lock on that row so that other transactions cannot modify it, and the database releases the lock only after the transaction commits or rolls back. A row lock is a fine-grained lock that gives applications the greatest ability to modify data in parallel. When a transaction acquires a row lock, it also acquires a table lock on the table containing the row, which prevents conflicting DDL (data definition language) operations; that is, the database automatically places an exclusive lock on the updated row and a sub-exclusive lock on the table containing the row.
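The lock-then-release pattern described above can be illustrated with an in-process analogue. This is not Oracle's DML locking; it merely shows a lock being held only for the duration of one block read and released immediately afterwards. The `BlockLocks` class and the block identifier are hypothetical.

```python
import threading
from contextlib import contextmanager


class BlockLocks:
    """Illustrative per-block locks held only while a block is being read."""

    def __init__(self):
        self._guard = threading.Lock()  # protects the dictionary of locks
        self._locks = {}

    @contextmanager
    def locked(self, block_id):
        with self._guard:
            lock = self._locks.setdefault(block_id, threading.Lock())
        lock.acquire()          # blocks other readers of the same block
        try:
            yield
        finally:
            lock.release()      # released as soon as the read completes

    def is_locked(self, block_id):
        with self._guard:
            lock = self._locks.get(block_id)
        return lock is not None and lock.locked()


locks = BlockLocks()
observed = []
with locks.locked("block-1"):
    observed.append(locks.is_locked("block-1"))  # held while reading
observed.append(locks.is_locked("block-1"))      # released afterwards
```

The smaller the block processed inside the `with` body, the shorter the lock is held, which mirrors the reasoning given for small data blocks.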

Further, the step of generating the pseudo-column rowid of one or more of the data blocks and reading the target data table according to the rowid further includes:

performing a data editing operation according to the data at the rowid in the target data table.

After the rowid has been located, a data editing operation can be performed at that location. The editing operations include INSERT, UPDATE, DELETE, MERGE, and the like. For example, the command insert into test rowid (1, null) can insert the relevant data. For another example, data may be deleted by creating a temporary table.

According to the scheme, the target data table is obtained, the physical address information of the target data table is extracted, and a task table to be processed is generated according to the physical address information; determining single data processing amount according to preset service requirements, and dividing the task table to be processed into one or more data blocks according to the single data processing amount; and generating one or more pseudo columns rowid of the data blocks, and reading the target data table according to the rowid. According to the invention, based on big data, the target data table is divided into a plurality of data blocks and then data reading is performed, so that the data processing efficiency is improved.

As shown in FIG. 3, a second embodiment of the present invention provides a data reading method. Based on the first embodiment shown in FIG. 2, the step of obtaining a target data table, extracting physical address information of the target data table, and generating a task table to be processed according to the physical address information further includes:

step S1001, judging whether the data amount in the target data table exceeds a first threshold;

and checking the attribute of the data corresponding to the target data table to obtain the data quantity of the data. The data amount refers to the occupied space of the data.

It will be appreciated that when the amount of data is excessive, a longer time is required for a single pass read, so a concurrency mechanism may be provided to save time and improve efficiency. The first threshold may be specifically set according to hardware devices, preset time requirements, system concurrency performance, and the like, for example, the first threshold is set to 300M.

Step S1002, if the data amount is greater than the first threshold, setting the number of concurrent processes according to the data amount;

If the data amount exceeds the first threshold, the concurrency mechanism needs to be activated. Under the concurrency mechanism, the larger the data amount, the more concurrent processes are used. For example, when the data amount is greater than the first threshold and less than a second threshold, the number of concurrent processes is set to a first process number; when the data amount is greater than or equal to the second threshold and less than a third threshold, the number of concurrent processes is set to a second process number, wherein the first threshold is less than the second threshold, the second threshold is less than the third threshold, and the first process number is less than the second process number.

It will be appreciated that if the data amount is less than or equal to the first threshold, the data may be read by a single process without setting up concurrent processes.
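The tiered rule of steps S1001 and S1002 can be sketched as a small function. The threshold values, the process numbers, and the behaviour at or above the third threshold (`max_count`) are assumptions of this sketch; the embodiment only fixes the ordering of the thresholds and of the first two process numbers.

```python
def process_count(data_mb, first=300, second=1000, third=5000,
                  first_count=2, second_count=4, max_count=8):
    """Choose the number of concurrent processes from the data amount."""
    if data_mb <= first:
        return 1                 # at or below the first threshold: single process
    if data_mb < second:
        return first_count       # first threshold < amount < second threshold
    if data_mb < third:
        return second_count      # second threshold <= amount < third threshold
    return max_count             # assumption: cap for amounts at or above the third
```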

Step S1003, dividing the target data table into a corresponding number of sub-target data tables according to the number of processes.

Specifically, the target data table is split into a number of sub-target data tables corresponding to the number of processes. Thus, the concurrent process can read the data in the target data table at the same time.

For example, if the first threshold is set to 100M and the data amount is 1000M, concurrent processes need to be set. The number of concurrent processes is set according to the concurrency mechanism; for example, the target data table is divided into 10 sub-target data tables according to the concurrent processing capacity of the system. The 10 processes then read the 1000M of data simultaneously, which greatly shortens the data reading time and improves data processing efficiency.
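Step S1003 and the concurrent read can be sketched as follows. Worker threads from Python's standard `concurrent.futures` module stand in for the concurrent processes of the embodiment, a list of integers stands in for the target data table, and `split_table`, `read_sub_table`, and `read_concurrently` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def split_table(rows, parts):
    """Split rows into `parts` contiguous sub-tables of near-equal size."""
    size, extra = divmod(len(rows), parts)
    subs, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        subs.append(rows[start:end])
        start = end
    return subs


def read_sub_table(sub):
    """Stand-in for reading one sub-target data table."""
    return list(sub)


def read_concurrently(rows, workers):
    """Read all sub-tables at the same time and reassemble them in order."""
    subs = split_table(rows, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(read_sub_table, subs))  # map preserves input order
    return [row for part in parts for row in part]


rows = list(range(1000))          # stand-in for 1000M of data
result = read_concurrently(rows, workers=10)
```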

According to the embodiment, through the scheme, whether the data amount in the target data table exceeds a first threshold value is judged; if the data volume is larger than the first threshold value, setting the number of processes of the concurrent process according to the data volume; dividing the target data table into a corresponding number of sub-target data tables according to the process number. According to the method, the target data table is split based on big data and then data is read, and based on a concurrent processing mechanism, the data processing efficiency is improved.

As shown in FIG. 4, a third embodiment of the present invention provides a data reading method. Based on the first embodiment shown in FIG. 2, the step of generating the pseudo-column rowid of one or more of the data blocks and reading the target data table according to the rowid further includes:

step S104, a state log is obtained, and an abnormal data block is obtained through the state log;

When the system reads data, a state log is generated. The state log records information such as the object being read, the database, the reading time, and the reading completion progress. After the state log is acquired, any data block that has not been completely read is identified from the reading completion progress and marked as an abnormal data block.

It will be appreciated that abnormal data blocks also include data blocks that cannot be read because of data corruption.

Step S105, acquiring the rowid of the abnormal data block and marking the rowid as an abnormal rowid, and re-reading the abnormal rowid and one or more data blocks after the abnormal rowid.

The rowid of the abnormal data block is acquired and marked as the abnormal rowid, and the abnormal data block is read with the start rowid of the abnormal rowid as the starting point.

Generally, if a data block is not read successfully, the system automatically skips it and ends the corresponding reading task, so the data blocks after the abnormal data block are not read and data is omitted. If the data is read by means of a cursor, the whole table must be scanned again after an abnormality occurs, so resuming from a breakpoint is costly. In this embodiment, after the abnormal data block has been read successfully, the remaining unread data blocks after it continue to be read. If another abnormality terminates the data reading during processing, only the data block currently being read and the unread data blocks are affected; data that has been read successfully has already been committed and is not affected.

If the abnormal data block has been read multiple times and its data still cannot be read completely, an alarm prompt is output so that the abnormal data block can be checked.
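Steps S104 and S105, together with the alarm on repeated failure, can be sketched as follows. The layout of the state log (one entry per block, in block order, with a `progress` value between 0 and 1), the retry limit of three, the alert text, and the function names are all assumptions of this sketch.

```python
def first_abnormal(state_log):
    """Return the index of the first block not completely read, or None."""
    for index, entry in enumerate(state_log):
        if entry["progress"] < 1.0:
            return index
    return None


def resume(blocks, state_log, read_block, max_attempts=3):
    """Re-read from the abnormal block onward; blocks before it are left alone."""
    start = first_abnormal(state_log)
    if start is None:
        return [], None
    reread = []
    for block in blocks[start:]:
        for _ in range(max_attempts):
            if read_block(block):
                reread.append(block["id"])
                break
        else:
            # Every attempt failed: stop and raise an alarm for this block.
            return reread, f"ALERT: check data block {block['id']}"
    return reread, None


blocks = [{"id": i} for i in (1, 2, 3, 4)]
state_log = [
    {"block_id": 1, "progress": 1.0},
    {"block_id": 2, "progress": 0.4},   # interrupted part-way: abnormal block
    {"block_id": 3, "progress": 0.0},
    {"block_id": 4, "progress": 0.0},
]
ok_result = resume(blocks, state_log, read_block=lambda b: True)
bad_result = resume(blocks, state_log, read_block=lambda b: b["id"] != 4)
```

In this example block 1 is never re-read, which is the saving over a cursor-based approach that would rescan from the beginning.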

According to this embodiment, the state log is acquired, and the abnormal data block is obtained through the state log; the rowid of the abnormal data block is acquired and marked as the abnormal rowid, and the abnormal rowid and the one or more data blocks after it are re-read, so that when a data reading abnormality occurs, data already read need not be read again, which improves data processing efficiency.

In addition, the embodiment also provides a data reading device. Referring to FIG. 5, FIG. 5 is a schematic functional block diagram of a first embodiment of the data reading apparatus of the present invention.

In this embodiment, the data reading device is a virtual device stored in the memory 1005 of the data reading apparatus shown in FIG. 1, and implements all functions of the data reading program: obtaining a target data table, extracting physical address information of the target data table, and generating a task table to be processed according to the physical address information; determining a single data processing amount according to preset service requirements, and dividing the task table to be processed into one or more data blocks according to the single data processing amount; and generating the pseudo-column rowid of one or more of the data blocks, and reading the target data table according to the rowid.

Specifically, the data reading apparatus includes:

the acquisition module 10 is used for acquiring a target data table, extracting physical address information of the target data table, and generating a task table to be processed according to the physical address information;

the segmentation module 20 is configured to determine a single data processing amount according to a preset service requirement, and segment the task table to be processed into one or more data blocks according to the single data processing amount;

and a reading module 30, configured to generate one or more pseudo columns rowid of the data blocks, and read the target data table according to the rowid.

Further, the acquisition module is further configured to:

Further, the reading module is further configured to:

Further, the acquisition module is further configured to:

Further, the reading module is further configured to:

In addition, the embodiment of the present invention further provides a computer storage medium, where a data reading program is stored, and when the data reading program is executed by a processor, the steps of the data reading method described above are implemented, which is not described herein again.

Compared with the prior art, the data reading method, the device, the equipment and the storage medium provided by the invention comprise the following steps: acquiring a target data table, extracting physical address information of the target data table, and generating a task table to be processed according to the physical address information; determining single data processing amount according to preset service requirements, and dividing the task table to be processed into one or more data blocks according to the single data processing amount; and generating one or more pseudo columns rowid of the data blocks, and reading the target data table according to the rowid. According to the invention, based on big data, the target data table is divided into a plurality of data blocks and then data reading is performed, so that the data processing efficiency is improved.

It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.

The foregoing embodiment numbers of the present invention are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.

From the above description of the embodiments, it will be clear to those skilled in the art that the above-described embodiment method may be implemented by means of software plus a necessary general hardware platform, but of course may also be implemented by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) as described above, comprising several instructions for causing a terminal device to perform the method according to the embodiments of the present invention.

The foregoing description is only of the preferred embodiments of the present invention and is not intended to limit the scope of the invention, and all equivalent structures or modifications in the structures or processes described in the specification and drawings, or the direct or indirect application of the present invention to other related technical fields, are included in the scope of the present invention.

Claims

1. A method of reading data, the method comprising:

generating one or more pseudo columns rowid of the data blocks, and reading the target data table according to the rowid;

the step of generating one or more pseudo columns rowid of the data blocks and reading the target data table according to the rowid further includes:

acquiring a state log, and acquiring a data block which is not completely read according to the reading completion progress recorded in the state log;

determining the data block that has not been completely read as an abnormal data block;

acquiring the rowid of the abnormal data block and marking the rowid as an abnormal rowid, and re-reading the abnormal rowid and one or more data blocks after the abnormal rowid by taking the abnormal starting rowid of the abnormal rowid as a starting point;

2. The method of claim 1, wherein the steps of obtaining a target data table, extracting physical address information of the target data table, and generating a task table to be processed according to the physical address information further comprise:

3. The method of claim 1, wherein the steps of obtaining a target data table, extracting physical address information of the target data table, and generating a task table to be processed according to the physical address information comprise:

4. The method of claim 1, wherein the step of locating the target data block according to the start rowid and the end rowid further comprises:

5. The method of claim 1, wherein the step of generating the pseudo-column rowid of one or more of the data blocks and reading the target data table according to the rowid further comprises:

6. A data reading apparatus, characterized in that the data reading apparatus comprises:

the reading module is used for generating one or more pseudo columns rowid of the data blocks and reading the target data table according to the rowid;

the reading module is further used for obtaining a state log and obtaining a data block which is not completely read according to the reading completion progress recorded in the state log; determining the unread complete data block as a data abnormal data block; acquiring the rowid of the abnormal data block and marking the rowid as an abnormal rowid, and re-reading the abnormal rowid and one or more data blocks after the abnormal rowid by taking the abnormal starting rowid of the abnormal rowid as a starting point;

the reading module is further configured to generate rowid of each data block according to a data table number, a file number, a block number and a line number, where the rowid includes a start rowid and a stop rowid; and positioning the target data blocks according to the initial rowid and the final rowid, and sequentially reading one or more corresponding target data blocks until the target data table is read.

7. A data reading device comprising a processor, a memory and a data reading program stored in the memory, which, when executed by the processor, implements the steps of the data reading method according to any of claims 1-5.

8. A computer storage medium, characterized in that the computer storage medium has stored thereon a data reading program which, when executed by a processor, implements the steps of the data reading method according to any of claims 1-5.