CN110866063A

CN110866063A - Data tracking processing method and device

Info

Publication number: CN110866063A
Application number: CN201810982558.2A
Authority: CN
Inventors: 周正中
Original assignee: Alibaba Group Holding Ltd
Current assignee: Alibaba Cloud Computing Ltd
Priority date: 2018-08-27
Filing date: 2018-08-27
Publication date: 2020-03-06
Anticipated expiration: 2038-08-27
Also published as: CN110866063B

Abstract

The application discloses a data tracking processing method and a device, wherein the method comprises the following steps: generating tracking information according to the accessed condition of the record in a tracking processing time period, wherein the tracking information is used for recording identification information corresponding to the accessed record; selecting at least one snapshot, and determining whether the scanned record exists in the tracking information according to the identification information of the scanned record and the generated tracking information; and taking the record corresponding to the scanned identification information existing in the tracking information as the tracking data. By the data processing method, the tracking and positioning of the concerned data are realized.

Description

Data tracking processing method and device

Technical Field

The present application relates to, but is not limited to, database technologies, and in particular to a data tracking processing method and device.

Background

In a database, data is organized in blocks. To access a record, the entire data block containing that record must be loaded into memory, even if only one record in the block is of interest to the user. This implicitly amplifies input/output (IO).

When the data of interest to the user is spread across many data blocks, memory expansion results. For example, suppose the user is interested in 10000 records, but those 10000 records are distributed over 10000 different data blocks; then 10000 data blocks are loaded into memory. Assuming that one data block can hold 100 records, only 100 data blocks would actually be needed to hold the 10000 records of interest.
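The arithmetic in this example can be checked with a short sketch. The function name and parameters are illustrative assumptions and do not appear in the application.

```python
import math


def block_amplification(records_of_interest, blocks_touched, records_per_block):
    """Return (minimum blocks needed, amplification factor).

    blocks_touched is the number of distinct blocks that currently hold
    the records of interest (hypothetical input for illustration).
    """
    min_blocks = math.ceil(records_of_interest / records_per_block)
    return min_blocks, blocks_touched / min_blocks


# Numbers from the example: 10000 records scattered over 10000 blocks,
# with 100 records fitting in one block.
min_blocks, factor = block_amplification(10000, 10000, 100)
print(min_blocks, factor)  # 100 100.0
```

In this scenario, 100 times more blocks are loaded than a dense layout would require.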

In practice, the memory expansion problem is pronounced. When memory is insufficient, a Least Recently Used (LRU) algorithm is usually used to evict the data blocks in memory that were accessed least recently. When an evicted data block needs to be accessed again, it must be read from disk again, which degrades performance. To improve performance, the hit rate of accessed data in memory must be raised, which requires more memory to hold more data blocks, and this in turn causes memory expansion.

How to avoid memory expansion while ensuring data access performance is an urgent problem to be solved.

Disclosure of Invention

The embodiment of the invention provides a data tracking processing method and device, which can realize tracking and positioning of concerned data.

In order to achieve the object of the present invention, the present invention provides a data tracking processing method, including:

generating tracking information according to the accessed condition of the record in a tracking processing time period, wherein the tracking information is used for recording identification information corresponding to the accessed record;

selecting at least one snapshot, and determining whether the scanned identification information of the record exists in the tracking information according to the scanned identification information of the record and the generated tracking information;

and taking the record corresponding to the scanned identification information existing in the tracking information as the tracking data.

Optionally, the method further comprises: moving the tracking data into a specified data block.

Optionally, generating the tracking information according to the accessed condition of the records includes:

creating a tracking record for each accessed data table in the tracking processing period, wherein the tracking record is used for recording the identification information corresponding to the accessed records and the table name of the accessed data table;

the created one or more tracking records constitute the tracking information.

Alternatively, generating the tracking information according to the accessed condition of the records includes:

creating a tracking record for each accessed data table in the tracking processing period, wherein the tracking record is used for recording the identification information corresponding to records whose number of accesses is greater than a preset access threshold and the table name of the accessed data table;

the created one or more tracking records constitute the tracking information.

Optionally, the tracking processing period is a preset cycle; or, the tracking processing period is a preset time interval; or, the tracking processing period is a preset duration after a preset trigger condition is met.

Optionally, before determining whether the scanned record exists in the tracking information, the method further comprises:

collecting statistics, according to the tracking information, on the distribution of the data blocks that store the records corresponding to the identification information in the tracking information;

if the statistics show that the number of data blocks storing the records corresponding to the identification information in the tracking information is greater than a preset value, continuing to execute the step of determining whether the scanned record exists in the tracking information;

and if the statistics show that the number of data blocks storing the records corresponding to the identification information in the tracking information is less than or equal to the preset value, ending the processing.

Optionally, before the selecting at least one snapshot, the method further includes:

receiving an external instruction, and executing the step of selecting at least one snapshot;

or, when a preset condition is met, executing the step of selecting at least one snapshot.

Optionally, the preset condition includes: a timer condition, or operations on records reaching a preset number of rows.

Optionally, the method further comprises:

moving the records whose scanned identification information is determined not to exist in the tracking information so that they are stored together in data blocks, and releasing the emptied data blocks.

The present application further provides a computer-readable storage medium storing computer-executable instructions for performing any one of the data tracking processing methods described above.

The present application further provides an apparatus for implementing data tracking processing, comprising a memory and a processor, wherein the memory stores a computer program executable on the processor for performing the steps of any of the data tracking processing methods described above.

The present application further provides a data tracking processing apparatus, including a tracking recording module, a data tracking module, and a data processing module, wherein:

the tracking recording module is used for generating tracking information according to the accessed condition of the record in a tracking processing period, and the tracking information is used for recording identification information corresponding to the accessed record;

the data tracking module is used for selecting at least one snapshot and determining whether the scanned identification information of the record exists in the tracking information according to the scanned identification information of the record and the generated tracking information;

and the data processing module is used for taking the record corresponding to the scanned identification information existing in the tracking information as the tracking data.

The technical scheme of the application includes: generating tracking information according to the accessed condition of the record in a tracking processing time period, wherein the tracking information is used for recording identification information corresponding to the accessed record; selecting at least one snapshot, and determining whether the scanned record exists in the tracking information according to the identification information of the scanned record and the generated tracking information; and taking the record corresponding to the scanned identification information existing in the tracking information as the tracking data. By the data tracking processing method, the tracking and positioning of the concerned data are realized.

Further, the data tracking processing method of the present application further includes moving the tracking data into a specified data block. That is, the accessed records that exist in the tracking information are stored together in one or more data blocks, so that the data of interest to the user is likewise concentrated in one or more data blocks. This not only ensures data access performance and improves data access efficiency, but also avoids memory expansion to the greatest extent.

Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.

Drawings

The accompanying drawings are included to provide a further understanding of the claimed subject matter and are incorporated in and constitute a part of this specification, illustrate embodiments of the subject matter and together with the description serve to explain the principles of the subject matter and not to limit the subject matter.

FIG. 1 is a schematic flow chart of a data tracking processing method according to the present application;

FIG. 2 is a schematic flow chart diagram illustrating an embodiment of a data processing method according to the present application;

FIG. 3 is a schematic flowchart of an embodiment of a tracking information creating method according to the present application;

FIG. 4 is a schematic flow chart diagram illustrating an embodiment of a data tracking processing method according to the present application;

FIG. 5 is a schematic flow chart diagram illustrating another embodiment of a tracking information creating method according to the present application;

FIG. 6 is a schematic diagram of a data processing apparatus according to an embodiment of the present disclosure;

FIG. 7 is a schematic structural diagram of a tracking information creating apparatus according to an embodiment of the present application.

Detailed Description

To make the objects, technical solutions and advantages of the present application more apparent, embodiments of the present application will be described in detail below with reference to the accompanying drawings. It should be noted that the embodiments and features of the embodiments in the present application may be arbitrarily combined with each other without conflict.

In one exemplary configuration of the present application, a computing device includes one or more processors (CPUs), input/output interfaces, a network interface, and memory.

The memory may include volatile memory in a computer-readable medium, such as Random Access Memory (RAM), and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.

Computer-readable media include both permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.

The steps illustrated in the flow charts of the figures may be performed in a computer system, for example as a set of computer-executable instructions. Also, while a logical order is shown in the flow charts, in some cases the steps shown or described may be performed in an order different from the one shown here.

Fig. 1 is a schematic flow chart of a data tracking processing method according to the present application. As shown in Fig. 1, the method includes:

Step 100: During the tracking processing period, generate tracking information according to the accessed condition of the records, wherein the tracking information is used for recording the identification information corresponding to the accessed records.

In one illustrative example, generating the tracking information according to the accessed condition of the records includes:

creating a tracking record for each accessed data table in the tracking processing period, wherein the tracking record is used for recording the identification information corresponding to the accessed records and the table name of the accessed data table; the created one or more tracking records constitute the tracking information.

In another illustrative example, generating the tracking information according to the accessed condition of the records includes:

creating a tracking record for each accessed data table in the tracking processing period, wherein the tracking record is used for recording the identification information corresponding to records whose number of accesses is greater than a preset access threshold and the table name of the accessed data table; the created one or more tracking records constitute the tracking information. In this embodiment, the records collected in the tracking information are those that the access condition shows to be accessed frequently. This better ensures that the data of interest to the user can be stored together in one or more data blocks, which ensures data access performance, improves data access efficiency, and avoids memory expansion to a greater extent.
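A minimal sketch of both variants of step 100 follows. It assumes the access log is available as `(table_name, record_id)` pairs and stores identifiers in plain Python sets; the names `build_tracking_info`, `access_log` and `access_threshold` are illustrative and are not taken from the application.

```python
from collections import Counter, defaultdict


def build_tracking_info(access_log, access_threshold=0):
    """Build {table_name: set of record ids} from an access log.

    access_log: iterable of (table_name, record_id) pairs, one per access.
    access_threshold: keep only records accessed more than this many times;
    the default 0 keeps every accessed record (the first variant).
    """
    counts = defaultdict(Counter)
    for table_name, record_id in access_log:
        counts[table_name][record_id] += 1

    tracking_info = {}
    for table_name, counter in counts.items():
        ids = {rid for rid, n in counter.items() if n > access_threshold}
        if ids:  # one tracking record per accessed table
            tracking_info[table_name] = ids
    return tracking_info


log = [("orders", 1), ("orders", 1), ("orders", 2), ("users", 9)]
print(build_tracking_info(log))                      # every accessed record
print(build_tracking_info(log, access_threshold=1))  # only records accessed more than once
```

Raising the threshold narrows the tracking information to hotter records, at the cost of ignoring data that was accessed only occasionally during the tracking period.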

In an exemplary embodiment, the tracking information may also be preset based on access experience.

Step 101: at least one snapshot is selected, and it is determined whether the identification information of the scanned record is present in the trace information based on the identification information of the scanned record and the generated trace information.

In an exemplary embodiment, before step 101, the method further includes:

receiving an external instruction, and then executing step 101;

or executing step 101 when a preset condition is met.

Optionally, the preset condition includes a timer condition, or operations on records reaching a preset number of rows, and so on.

It should be noted that after the trace information is formed for the first time, there is no strict sequence between step 100 and step 101.

Step 102: Take the records corresponding to the scanned identification information that exists in the tracking information as the tracking data.

With this data tracking processing method, the data of interest is tracked and located. The located tracking data can be applied to business scenarios such as solving the memory expansion problem and identifying hotspot data.
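Steps 101 and 102 can be sketched as a scan over one snapshot. The snapshot layout (`{table_name: [(record_id, payload), ...]}`) and the function name are assumptions made for illustration only.

```python
def select_tracking_data(snapshot, tracking_info):
    """Scan every record in a snapshot and keep those found in tracking_info.

    snapshot: {table_name: [(record_id, payload), ...]}  (hypothetical layout)
    tracking_info: {table_name: set of record ids}
    """
    tracking_data = []
    for table_name, records in snapshot.items():
        tracked_ids = tracking_info.get(table_name, set())
        for record_id, payload in records:
            # Step 101: is the scanned record's identifier in the tracking info?
            if record_id in tracked_ids:
                # Step 102: the record becomes part of the tracking data.
                tracking_data.append((table_name, record_id, payload))
    return tracking_data


snapshot = {"orders": [(1, "a"), (2, "b"), (3, "c")], "users": [(9, "z")]}
tracking_info = {"orders": {1, 3}}
print(select_tracking_data(snapshot, tracking_info))
# [('orders', 1, 'a'), ('orders', 3, 'c')]
```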

Optionally, the method further includes:

moving the tracking data into the specified data block.

The accessed records that exist in the tracking information are stored together in one or more data blocks; that is, frequently accessed records are grouped and stored in one or more data blocks. The data of interest to the user is thereby concentrated in one or more data blocks, which ensures data access performance, improves data access efficiency, and avoids memory expansion to the greatest extent.

Optionally, the data tracking processing method of the present application further includes:

for the records whose scanned identification information does not exist in the tracking information, the data blocks in which they are located are reorganized: the records are moved and stored together in one or more data blocks, and the emptied data blocks are released. This further optimizes database resources and further helps avoid memory expansion.
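The reorganization of the remaining records can be illustrated as follows. This sketch models a data block as a Python list of record identifiers and assumes the tracked records have already been moved elsewhere; `compact_untracked` and its parameters are hypothetical names.

```python
def compact_untracked(blocks, tracked_ids, block_capacity):
    """Repack records that are NOT in the tracking information.

    blocks: list of blocks, each a list of record ids (simplified model).
    Returns (compacted blocks, number of blocks released).
    """
    # Collect the records that stay behind after the tracked ones are moved.
    cold = [rec for block in blocks for rec in block if rec not in tracked_ids]
    # Store them densely, block_capacity records per block.
    compacted = [cold[i:i + block_capacity]
                 for i in range(0, len(cold), block_capacity)]
    released = len(blocks) - len(compacted)
    return compacted, released


blocks = [[1, 2], [3, 4], [5, 6]]
print(compact_untracked(blocks, tracked_ids={1, 3, 5}, block_capacity=2))
# ([[2, 4], [6]], 1)
```

A real storage engine would move rows in place and update indexes; the sketch only shows how dense repacking frees whole blocks.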

The present application further provides a data tracking processing apparatus, which at least includes a tracking recording module, a data tracking module, and a data processing module.

Optionally, the data processing module is further configured to move the tracking data into a specified data block.

Fig. 2 is a schematic flowchart of an embodiment of the data processing method. This embodiment takes the application of the data tracking processing method to the memory expansion problem as an example. As shown in Fig. 2, the method includes:

Step 200: Scan the records in a pre-selected data table and determine whether the identification information of each record is present in the pre-generated tracking information.

The tracking information includes the table names of the accessed data tables and the identification information of the accessed records, such as primary key values or row numbers.

In an exemplary embodiment, the identification information of the records is represented by a value of the HLL (HyperLogLog) type, which allows a large number of unique values to be stored in a small amount of storage. This also speeds up the subsequent check of whether a given value is in the HLL value, improving data processing efficiency. HLL is an approximate algorithm for estimating the cardinality of a set.
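The application does not specify how membership in an HLL value is tested. The sketch below is one possible reading, under stated assumptions: a small HyperLogLog with 1024 registers, SHA-1 as the hash function, and a membership test that asks whether adding the value would leave the sketch unchanged. Such a test has no false negatives but does produce false positives, increasingly so as the sketch fills. The class and method names are illustrative and do not correspond to any particular database extension.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog sketch (illustrative, not production quality)."""

    def __init__(self, p=10):
        self.p = p                    # number of index bits
        self.m = 1 << p               # number of registers
        self.registers = [0] * self.m

    def _index_and_rank(self, value):
        digest = hashlib.sha1(str(value).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")          # 64-bit hash
        index = h >> (64 - self.p)                     # top p bits
        rest = h & ((1 << (64 - self.p)) - 1)          # remaining bits
        rank = (64 - self.p) - rest.bit_length() + 1   # leading zeros + 1
        return index, rank

    def add(self, value):
        index, rank = self._index_and_rank(value)
        if rank > self.registers[index]:
            self.registers[index] = rank

    def might_contain(self, value):
        # True if adding the value would not change the sketch.
        # Never false for a value that was added; may be true for others.
        index, rank = self._index_and_rank(value)
        return self.registers[index] >= rank

    def estimate(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)
        raw = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction (linear counting).
            return self.m * math.log(self.m / zeros)
        return raw


hll = HyperLogLog()
for i in range(1000):
    hll.add(f"pk-{i}")
print(round(hll.estimate()))        # close to 1000
print(hll.might_contain("pk-42"))   # True
```

The storage stays at 1024 small registers regardless of how many identifiers are added, which is the space saving the paragraph above refers to; the price is that membership answers are approximate.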

In one illustrative example, the data table includes one or more data tables from one or more pre-selected snapshots.

Step 201: Move each scanned record whose identification information is determined to exist in the tracking information to a specified data block.

In one illustrative example, there may be one or more specified data blocks. A specified data block is a completely new data block.

In one illustrative example, when a specified data block is full, writing continues in another specified, completely new data block.
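The rollover from a full specified data block to a new one can be sketched as below. Blocks are modelled as Python lists with a fixed capacity; `move_to_specified_blocks` and `block_capacity` are hypothetical names, and `block_capacity` is assumed to be at least 1.

```python
def move_to_specified_blocks(records, block_capacity):
    """Write records into new blocks, opening another block when one is full."""
    blocks = [[]]
    for record in records:
        if len(blocks[-1]) == block_capacity:
            blocks.append([])          # current block is full: start a new one
        blocks[-1].append(record)
    return blocks if blocks[-1] else []  # no records means no blocks are used


print(move_to_specified_blocks(["r1", "r2", "r3", "r4", "r5"], block_capacity=2))
# [['r1', 'r2'], ['r3', 'r4'], ['r5']]
```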

By the data processing method provided by the embodiment, accessed records existing in the tracking information are stored in one or more data blocks in a centralized manner, that is, frequently accessed records are stored in one or more data blocks in a sorted manner, so that data concerned by a user is stored in one or more data blocks in a centralized manner, and not only is the data access performance ensured, but also the data access efficiency is improved, and the memory expansion is avoided to the maximum extent.

In an exemplary embodiment, the data blocks containing records whose identification information is determined not to be in the tracking information are reorganized: the records that were not moved are stored together, and the emptied data blocks are released.

In an exemplary embodiment, step 200 is preceded by: triggering the data processing process shown in figure 2.

Optionally, the trigger may be the receipt of an external instruction, such as a manual trigger, for example through a preset control. The trigger may also be a timed trigger at a preset time, or a trigger when a preset condition is met, for example when operations such as writing, updating, and deleting records in the database reach a certain number of rows.

In an illustrative example, before triggering the data processing procedure shown in Fig. 2, the method further includes:

collecting statistics, according to the tracking information, on the distribution of the data blocks that store the records corresponding to the identification information in the tracking information;

if the distribution is less than the preset distribution threshold, that is, the statistics show that the number of data blocks storing the records corresponding to the identification information in the tracking information is greater than the preset value, the accessed data is stored in a dispersed manner, i.e. the data of interest to the user is not concentrated in one or more data blocks; the data processing method shown in Fig. 2 is then executed, and the records are moved so that they are stored together in one or more data blocks;

if the distribution is greater than the preset distribution threshold, that is, the statistics show that the number of data blocks storing the records corresponding to the identification information in the tracking information is less than or equal to the preset value, the accessed data is already stored in a sufficiently concentrated manner and the storage of the records does not need to be adjusted, so the data processing procedure shown in Fig. 2 is not executed.

In one illustrative example, the number of records identified in the tracking information is divided by the number of data blocks over which those records are distributed to obtain a first quotient; the first quotient is then divided by the average number of records stored per data block to obtain a second quotient. When the second quotient is greater than a preset value, such as 80%, the data processing procedure need not be entered.

In this application, statistics on the distribution of the data blocks storing the records corresponding to the identification information in the tracking information are collected before the data processing procedure, which avoids unnecessary data processing and saves system resources.
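The two quotients can be computed directly. The function and parameter names are illustrative assumptions; the 80% default mirrors the example value given above.

```python
def storage_concentration(tracked_records, blocks_holding_them, avg_records_per_block):
    """Second quotient: how densely the tracked records are packed (0 to about 1)."""
    first_quotient = tracked_records / blocks_holding_them
    return first_quotient / avg_records_per_block


def needs_reorganization(tracked_records, blocks_holding_them,
                         avg_records_per_block, preset_value=0.8):
    """Enter the data processing procedure only if storage is not dense enough."""
    second_quotient = storage_concentration(
        tracked_records, blocks_holding_them, avg_records_per_block)
    return second_quotient <= preset_value


# 10000 tracked records scattered over 10000 blocks, 100 records per block.
print(needs_reorganization(10000, 10000, 100))  # True
# The same records packed into 110 blocks.
print(needs_reorganization(10000, 110, 100))    # False
```

In the scattered case the second quotient is 0.01, far below 80%, so reorganization is worthwhile; in the packed case it is about 0.91, so the procedure is skipped.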

The present application also provides a computer-readable storage medium storing computer-executable instructions for performing any of the data processing methods described above.

The present application further provides an apparatus for implementing data processing, comprising a memory and a processor, wherein the memory stores a computer program executable on the processor for performing the steps of any of the data processing methods described above.

Fig. 3 is a schematic flowchart of an embodiment of a method for establishing tracking information according to the present application. As shown in Fig. 3, the method includes:

Step 300: During the tracking processing period, create a tracking record for each accessed data table.

In one exemplary instance, the trace processing period may be a preset cycle set in advance for performing trace information establishment.

In one illustrative example, the tracking processing period may be a preset period of time; alternatively, the tracking processing period may be a preset time period after a preset trigger condition is satisfied.

Step 301: Store, in the created tracking record, the identification information of the accessed records in the data table and the table name of the data table in which the records are located; a plurality of tracking records form the tracking information.

In one illustrative example, generating tracking information includes:

In an illustrative example, the identification information of the record is represented using HLL values, which enables the storage of more unique values with a small amount of storage. The subsequent process of judging whether a certain value is in the HLL values is further accelerated, and the data processing efficiency is improved.

By the tracking information establishing method in the embodiment, the accessed recorded information is collected and recorded, a basis is provided for judging whether the record in the data table is concerned by the user in data processing, and the data concerned by the user can be stored in one or more data blocks in a centralized manner, so that the data access performance is ensured, the data access efficiency is improved, and the expansion of the memory is avoided to the maximum extent.

The present application further provides a computer-readable storage medium storing computer-executable instructions for performing any one of the tracking information generating methods described above.

The present application further provides an apparatus for implementing tracking information establishment, comprising a memory and a processor, wherein the memory stores a computer program executable on the processor for performing the steps of any of the tracking information establishing methods described above.

Fig. 4 is a schematic flowchart of an embodiment of the data tracking processing method. This embodiment takes the application of the data tracking processing method to the memory expansion problem as an example. As shown in Fig. 4, the method includes:

Step 400: Select a snapshot.

It should be noted that, in the data processing of this embodiment, several snapshots may be selected, and the processing of this embodiment is then executed on one snapshot after another until all the selected snapshots have been processed.

Step 401: Select a data table in the snapshot.

One snapshot may include one or more data tables. The processing of this embodiment is performed on one data table after another; once all the data tables in the selected snapshot have been processed, the process returns to step 400 to process the next snapshot.

Step 402 to step 403: Scan the records in the data table one by one, and determine whether each record is in the tracking information of the selected snapshot.

In this embodiment, the tracking information includes: table names of accessed data tables, identification information of accessed records such as primary key values or row numbers represented by HLL values.

Specifically, it is determined one by one whether the identification information of each record in the data table exists in the tracking information.

Step 404: Move the records that are in the tracking information to the specified data block.

The record whose identification information exists in the trace information is moved to a specified data block.

The designated data blocks may include one or more than one, and when a designated data block is full, writing is started from another designated new data block.

When the determination for one data table in the current snapshot is complete, the process returns to step 401 to continue with the next data table in the snapshot, until all data tables in the snapshot have been processed.

Fig. 5 is a schematic flowchart of another embodiment of the tracking information establishing method of the present application. As shown in Fig. 5, the method includes:

Step 500: Start the tracking process and record the start time.

In this step, the tracking process may be started periodically according to a preset period, or may be started after receiving an external instruction such as a command from a user to start the tracking process.

Step 501: a trace record is created for each data table accessed.

In this step, the identification information of each accessed record is classified by data table, for example by table name, and written into the HLL value.

The identification information may be the primary key value when the record has a primary key, and the row number when the record has no primary key.

Step 502: Stop the tracking process, record the end time, and combine the tracking records into the tracking information and store it.

In this step, the tracking process may be ended periodically according to a preset period, or may be ended after receiving an external instruction such as a command to turn off the tracking process from a user.

In this step, the created trace records for different data tables are summarized and stored in the trace information. Thus, the tracking information includes: table names of accessed data tables, identification information of accessed records such as primary key values or row numbers.

In this embodiment, the recorded start time indicates when a given tracking run started, and the recorded end time indicates when it was stopped. On the one hand, the recorded start and end times let the user know the time period in which the collected accesses occurred. On the other hand, if historical tracking information needs to be cleaned up, the corresponding historical tracking information can be handled according to its start and end times.
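Steps 500 to 502 can be sketched as a small session object. This is a simplified illustration: identifiers are kept in plain sets rather than HLL values, and the class, method, and field names are assumptions that do not come from the application.

```python
from datetime import datetime, timezone


class TrackingSession:
    """One tracking run: start, collect accessed identifiers, stop."""

    def __init__(self):
        self.start_time = None
        self.end_time = None
        self.records = {}  # table name -> set of identification info

    def start(self):
        # Step 500: start tracking and record the start time.
        self.start_time = datetime.now(timezone.utc)

    def on_access(self, table_name, row_number, primary_key=None):
        # Step 501: one tracking record per accessed table.
        # Use the primary key when the record has one, else the row number.
        ident = primary_key if primary_key is not None else row_number
        self.records.setdefault(table_name, set()).add(ident)

    def stop(self):
        # Step 502: stop tracking, record the end time, assemble tracking info.
        self.end_time = datetime.now(timezone.utc)
        return {
            "start_time": self.start_time,
            "end_time": self.end_time,
            "tracking_records": self.records,
        }


session = TrackingSession()
session.start()
session.on_access("orders", row_number=7, primary_key="A1")
session.on_access("logs", row_number=42)  # table without a primary key
info = session.stop()
print(info["tracking_records"])
```

Keeping the start and end times alongside the tracking records makes it straightforward to select historical tracking information by time range when cleaning up.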

Fig. 6 is a schematic structural diagram of an embodiment of the data processing apparatus according to the present application. This embodiment takes the application of the data tracking processing method to the memory expansion problem as an example. As shown in Fig. 6, the data processing apparatus at least includes a judging module and a processing module, wherein:

the judging module is used for scanning the records in a pre-selected data table and determining whether the identification information of each record exists in the pre-generated tracking information;

and the processing module is used for moving the records whose identification information is determined to exist in the tracking information to the specified data block.

By the data processing device, accessed records existing in the tracking information are stored in one or more data blocks in a centralized manner, namely frequently accessed records are stored in one or more data blocks in a sorted manner, so that data concerned by a user is stored in one or more data blocks in a centralized manner, data access performance is guaranteed, data access efficiency is improved, and memory expansion is avoided to the maximum extent.

The data processing apparatus of the present application further includes a preprocessing module, configured to:

collect statistics, according to the tracking information, on the distribution of the data blocks that store the records corresponding to the identification information in the tracking information;

if the statistics show that the number of data blocks storing the records corresponding to the identification information in the tracking information is greater than a preset value, the accessed data is stored in a dispersed manner, that is, the data of interest to the user is not concentrated in one or more data blocks, and the judging module is triggered to execute;

and if the statistics show that the number of data blocks storing the records corresponding to the identification information in the tracking information is less than or equal to the preset value, the accessed data is already stored in a sufficiently concentrated manner, the storage of the records does not need to be adjusted, and the processing ends.

The preprocessing module is used for counting the distribution condition of the data blocks stored in the record corresponding to the identification information in the tracking information, so that unnecessary data processing processes are reduced, and system resources are saved.

Fig. 7 is a schematic structural diagram of a tracking information creating apparatus according to an embodiment of the present application. As shown in Fig. 7, the tracking information creating apparatus at least includes a tracking module and a recording module, wherein:

the tracking module is used for creating a tracking record for each accessed data table in a tracking processing period;

and the recording module is used for storing the identification information of the accessed record in the data table and the table name of the data table where the record is located in the created tracking record, and the plurality of tracking records form the tracking information.

The tracking information establishing device is used for collecting and recording the accessed recorded information, providing a basis for judging whether the record in the data table is concerned by the user or not in data processing, and ensuring that the data concerned by the user can be stored in one or more data blocks in a centralized manner, thereby ensuring the data access performance, improving the data access efficiency and further avoiding the memory expansion to the maximum extent.

Although the embodiments disclosed in the present application are described above, the descriptions are only for the convenience of understanding the present application, and are not intended to limit the present application. It will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the disclosure as defined by the appended claims.

Claims

1. A data tracking processing method comprises the following steps:

2. The data trace processing method according to claim 1, further comprising: moving the trace data into a specified data block.

3. The data trace processing method as claimed in claim 1, wherein said generating trace information based on recorded accessed conditions comprises:

the created one or more trace records constitute the trace information.

4. The data trace processing method as claimed in claim 1, wherein said generating trace information based on recorded accessed conditions comprises:

the created one or more trace records constitute the trace information.

5. The data tracking processing method according to any one of claims 1 to 4, wherein the tracking processing period is a preset period which is set in advance; or, the tracking processing time interval is a preset time interval; or, the tracking processing time period is a preset time period after a preset trigger condition is met.

6. The data trace processing method according to claim 1, the method further comprising, before:

7. The data tracking process of claim 1, further comprising, prior to the selecting the at least one snapshot:

8. The data trace processing method according to claim 7, wherein the preset condition includes: a timer condition, or operations on records reaching a preset number of rows.

9. The data trace processing method according to claim 1, 2, 6 or 7, further comprising:

10. A computer-readable storage medium storing computer-executable instructions for performing the data trace processing method of any one of claims 1 to 9.

11. An apparatus for implementing a data tracking process, comprising a memory and a processor, wherein the memory stores a computer program executable on the processor for performing the steps of the data tracking processing method of any one of claims 1 to 9.

12. A data trace processing apparatus, comprising: a tracking recording module, a data tracking module, and a data processing module; wherein: