CN115469810A - Data acquisition method, device, equipment and storage medium - Google Patents

Publication number
CN115469810A
Authority
CN
China
Prior art keywords
data
target
child node
request
data line
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211156625.8A
Other languages
Chinese (zh)
Inventor
张倩
宫学庆
刘汪根
谢玉波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Henan Xinghuan Zhongzhi Information Technology Co ltd
Transwarp Technology Shanghai Co Ltd
Original Assignee
Henan Xinghuan Zhongzhi Information Technology Co ltd
Transwarp Technology Shanghai Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Henan Xinghuan Zhongzhi Information Technology Co ltd, Transwarp Technology Shanghai Co Ltd filed Critical Henan Xinghuan Zhongzhi Information Technology Co ltd
Priority to CN202211156625.8A priority Critical patent/CN115469810A/en
Publication of CN115469810A publication Critical patent/CN115469810A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/061 Improving I/O performance
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/0223 User address space allocation, e.g. contiguous or non contiguous base addressing
    • G06F12/023 Free address space management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/06 Addressing a physical block of locations, e.g. base addressing, module addressing, memory dedication
    • G06F12/0646 Configuration or reconfiguration
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9027 Trees
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0646 Horizontal data movement in storage systems, i.e. moving data in between storage devices or systems
    • G06F3/0652 Erasing, e.g. deleting, data cleaning, moving of data to a wastebasket
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0655 Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
    • G06F3/0656 Data buffering arrangements
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5011 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • G06F9/5016 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals the resource being the memory

Abstract

The invention discloses a data acquisition method, apparatus, device and storage medium. The method comprises: when a first request is received, traversing a parent node of a tree structure according to a first data identifier carried by the first request to obtain a physical address of a target child node corresponding to the first request; traversing metadata of the target child node according to the physical address of the target child node and the first data identifier to obtain a physical address of a target data line; and reading the data of the target data line according to the physical address of the target data line. This technical scheme addresses the problem that accessing the data content of a data line requires reading multiple layers of logical addresses and physical addresses, which increases the indirect address calculation overhead during memory access, complicates the whole access path, delays memory access and reduces the overall performance of the system; the scheme thereby reduces the complexity of the memory access path.

Description

Data acquisition method, device, equipment and storage medium
Technical Field
The embodiment of the invention relates to the technical field of computers, in particular to a data acquisition method, a data acquisition device, data acquisition equipment and a storage medium.
Background
In order to enable a system to have better parallel processing capacity under the condition of diversified database loads, most database systems adopt multi-version technology at present. By maintaining multiple versions of physical data rows for each logical record of the database, read-write operations can run under respective snapshot version data on the same logical record without mutual blocking.
As for how to organize and store multiple versions of physical data lines, most existing in-memory database systems use a data organization form based on a heap structure, in which all data lines, including both new and old versions of records, are generally stored in the same block of storage space. When there is a data update request, the system requests storage space for a new data line in the data table, then copies the latest version of the data line into the new data line space, and finally writes the changed content into the new data line. Ordering versions from newest to oldest is a method commonly used to maintain the multiple versions of each record: a linked list arranges the version data lines from new to old, and the required version data can be accessed by traversing the linked list.
The data organization form based on the heap structure has the following defects: when the data content of a data line is to be accessed, reading of multiple layers of logical addresses and physical addresses is required, so that the overhead of indirect address calculation in the memory access process is greatly increased, the whole access path becomes very complicated, the memory access delay is caused, and the overall performance of the system is reduced.
Disclosure of Invention
Embodiments of the present invention provide a data obtaining method, an apparatus, a device, and a storage medium, which can solve the problems that when accessing data content of a data line, multiple layers of logical addresses and physical addresses need to be read, and indirect address calculation overhead in a memory access process is increased, so that an entire access path becomes very complex, thereby causing memory access delay and overall system performance degradation.
According to an aspect of the present invention, there is provided a data acquisition method including:
when a first request is received, traversing a parent node of a tree structure according to a first data identifier carried by the first request to obtain a physical address of a target child node corresponding to the first request;
traversing metadata of the target child node according to the physical address of the target child node and the first data identifier to obtain a physical address of a target data line;
and reading the data of the target data line according to the physical address of the target data line.
According to another aspect of the present invention, there is provided a data acquisition apparatus comprising:
the first acquisition module is used for traversing a parent node of a tree structure according to a first data identifier carried by a first request when the first request is received, and obtaining a physical address of a target child node corresponding to the first request;
the second acquisition module is used for traversing the metadata of the target child node according to the physical address of the target child node and the first data identifier to obtain the physical address of a target data line;
and the reading module is used for reading the data of the target data line according to the physical address of the target data line.
According to another aspect of the present invention, there is provided an electronic apparatus including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores a computer program executable by the at least one processor, the computer program being executable by the at least one processor to enable the at least one processor to perform the data acquisition method according to any of the embodiments of the present invention.
According to another aspect of the present invention, there is provided a computer-readable storage medium storing computer instructions for causing a processor to implement the data acquisition method according to any one of the embodiments of the present invention when the computer instructions are executed.
In the embodiments of the present invention, when a first request is received, a parent node of a tree structure is traversed according to a first data identifier carried by the first request to obtain a physical address of a target child node corresponding to the first request; metadata of the target child node is traversed according to the physical address of the target child node and the first data identifier to obtain a physical address of a target data line; and the data of the target data line is read according to the physical address of the target data line. This solves the problem that accessing the data content of a data line requires reading multiple layers of logical addresses and physical addresses, which increases the indirect address calculation overhead during memory access, makes the whole access path very complicated, causes memory access delay and reduces the overall performance of the system.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present invention, nor do they necessarily limit the scope of the invention. Other features of the present invention will become apparent from the following description.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained according to the drawings without inventive efforts.
FIG. 1 is a flow chart of a data acquisition method in an embodiment of the invention;
FIG. 2 is a schematic diagram of an update operation and a query operation in an embodiment of the invention;
FIG. 3 is a schematic structural diagram of a data acquisition apparatus according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of an electronic device in an embodiment of the present invention.
Detailed Description
In order to make those skilled in the art better understand the technical solutions of the present invention, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the terms so used are interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Embodiment one
FIG. 1 is a flowchart of a data acquisition method provided in an embodiment of the present invention. This embodiment is applicable to data acquisition scenarios. The method may be executed by the data acquisition apparatus in an embodiment of the present invention, and the apparatus may be implemented in software and/or hardware. As shown in FIG. 1, the method specifically includes the following steps:
S110, when a first request is received, traversing a parent node of a tree structure according to a first data identifier carried by the first request to obtain a physical address of a target child node corresponding to the first request.
The tree structure comprises a parent node and child nodes, and the structure of the parent node is the same as that of the child nodes. The content stored in the parent node comprises data identifiers and the physical addresses of child nodes; the content stored in a child node comprises node attribute information, metadata of the data lines, and the record data of each data line. The tree structure is constructed based on a target table, and it should be noted that, after construction, the tree structure can be updated through insert requests. Specifically, the tree structure may be constructed in the following manner: a tree structure is constructed based on the target table; according to each insert request, the data line to be inserted is inserted into the corresponding child node, and the attribute information of the child node, the metadata of the data lines in the child node and the record data of each data line are updated at the same time. While data is being inserted, the update flag bit in the metadata needs to be set to a first value.
The first request may be a query request, a delete request, or an update request, which is not limited in this embodiment of the present invention. Specifically, if the first request is a query request, the first request carries a first data identifier and query range information, and if the first request is an update request, the first request carries the first data identifier and target data; and if the first request is a deletion request, the first request carries the first data identifier.
Optionally, traversing the parent node of the tree structure according to the first data identifier carried by the first request to obtain the physical address of the target child node corresponding to the first request includes:
if the first data identifier is smaller than or equal to the data identifier corresponding to the parent node, querying, according to the first data identifier, the data identifier corresponding to the left child node of the parent node;
if a data identifier identical to the first data identifier exists among the data identifiers corresponding to the left child node of the parent node, acquiring the physical address of the child node corresponding to that data identifier;
and determining the physical address of the child node corresponding to the data identifier identical to the first data identifier as the physical address of the target child node corresponding to the first request.
The parent node comprises: a data identifier, the data identifier corresponding to the left child node of the parent node, the physical address of the left child node of the parent node, the data identifier corresponding to the right child node of the parent node, and the physical address of the right child node of the parent node, wherein the data identifier corresponding to the left child node of the parent node is smaller than or equal to the data identifier corresponding to the parent node, and the data identifier corresponding to the right child node of the parent node is larger than the data identifier corresponding to the parent node.
For example, suppose the key value of the parent node is 4 and the first data identifier carried by the first request is key = 7. The parent node records the key values and physical addresses of its child nodes (child nodes A to E in this example), among which child node E has the key value 7. Because 7 is greater than 4, the child nodes on the right side of the parent node are queried, and child node E is determined to be the target child node of the first request.
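The routing rule described above can be sketched as follows. This is a minimal illustration only: the class `ParentNode`, the function `find_target_child`, the use of dictionaries to hold several children per side, the left-side key 2 and the addresses `0x1000` and `0x5000` are all hypothetical and do not come from the embodiment; only the parent key 4 and the request key 7 follow the example.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ParentNode:
    """Hypothetical parent node: its own key plus child keys and addresses."""
    key: int
    # Child data identifier -> physical address (plain ints stand in for addresses).
    left: Dict[int, int] = field(default_factory=dict)   # child keys <= parent key
    right: Dict[int, int] = field(default_factory=dict)  # child keys > parent key


def find_target_child(parent: ParentNode, first_id: int) -> Optional[int]:
    """Return the physical address of the child whose identifier equals first_id."""
    side = parent.left if first_id <= parent.key else parent.right
    return side.get(first_id)  # None when no child carries this identifier


# Parent key 4 and request key 7 as in the example; other values are made up.
parent = ParentNode(key=4, left={2: 0x1000}, right={7: 0x5000})
target_address = find_target_child(parent, 7)
```

Because 7 is greater than 4, only the right-side entries are consulted, which mirrors the comparison in the example.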
S120, traversing the metadata of the target child node according to the physical address of the target child node and the first data identifier to obtain the physical address of the target data line.
It should be noted that, in a multi-version database system, each logical record in the data table corresponds to a plurality of stored physical data lines.
The child node comprises node attribute information, metadata of the data lines, and the record data of each data line. For example, the child node may include: (1) a child node metadata header (the total space size of the child node and the currently used space size); (2) a data line metadata array, which stores, for each data line, its meta and a physical address pointing to the previous version, where meta is a 64-bit integer value comprising 1 bit identifying whether the current data line is being updated, 1 bit identifying whether the current data line is visible, 14 bits storing the length of the key value, 16 bits storing the offset of the data line, and 32 bits storing the id of the transaction that created the data line; and (3) the whole-line data content of each data line.
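As an illustration of the 64-bit meta value, the sketch below packs and unpacks the five fields (1 + 1 + 14 + 16 + 32 = 64 bits). The embodiment gives the field widths but not their order within the word, so the bit positions chosen here (update flag in the most significant bit, transaction id in the low 32 bits) are an assumption, as are the helper names `pack_meta` and `unpack_meta`.

```python
# Assumed bit positions; the embodiment only specifies the field widths.
UPDATED_SHIFT = 63   # 1 bit: data line is being updated
VISIBLE_SHIFT = 62   # 1 bit: data line is visible
KEYLEN_SHIFT = 48    # 14 bits: length of the key value
OFFSET_SHIFT = 32    # 16 bits: offset of the data line inside the child node
# The low 32 bits hold the id of the creating transaction.


def pack_meta(updated: int, visible: int, key_len: int, offset: int, txn_id: int) -> int:
    """Pack the five fields into one 64-bit integer."""
    if updated not in (0, 1) or visible not in (0, 1):
        raise ValueError("flag bits must be 0 or 1")
    if not 0 <= key_len < (1 << 14):
        raise ValueError("key length must fit in 14 bits")
    if not 0 <= offset < (1 << 16):
        raise ValueError("offset must fit in 16 bits")
    if not 0 <= txn_id < (1 << 32):
        raise ValueError("transaction id must fit in 32 bits")
    return ((updated << UPDATED_SHIFT) | (visible << VISIBLE_SHIFT)
            | (key_len << KEYLEN_SHIFT) | (offset << OFFSET_SHIFT) | txn_id)


def unpack_meta(meta: int) -> dict:
    """Split a 64-bit meta value back into its fields."""
    return {
        "updated": (meta >> UPDATED_SHIFT) & 0x1,
        "visible": (meta >> VISIBLE_SHIFT) & 0x1,
        "key_len": (meta >> KEYLEN_SHIFT) & 0x3FFF,
        "offset": (meta >> OFFSET_SHIFT) & 0xFFFF,
        "txn_id": meta & 0xFFFFFFFF,
    }


meta = pack_meta(updated=1, visible=0, key_len=4, offset=128, txn_id=42)
fields = unpack_meta(meta)
```

A 16-bit offset can address 65,536 byte positions, which is enough to locate any data line inside a 64 KB node.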
The data organization form provided by the embodiment of the invention can store the latest data row data content recorded by the data table in the tree leaf nodes, thereby reducing the complexity of the memory access path. The data lines of different versions are classified in a pointer mode and are stored separately, and access competition of a centralized storage structure is reduced. And each data row corresponds to one metadata, different metadata information is maintained for different types of data rows, and version storage cost is reduced.
Specifically, the manner of traversing the metadata of the target child node according to the physical address of the target child node and the first data identifier to obtain the physical address of the target data line may be: reading the metadata of the target child node according to the physical address of the target child node; querying, in the metadata of the target child node, the target data line whose data identifier is the same as the first data identifier; and obtaining the physical address of the target data line.
In one specific example, the first request is a query request: look up all field values of the employee with ID = 7. First, the parent node of the tree structure is traversed according to the ID value of the query request to find the physical address of the target child node; then, the meta entries of the data lines in the target child node are traversed by binary search; next, the physical location of the data line content is calculated from the offset in meta, and the ID in the data line content is read and compared with ID = 7, until the target data line with ID = 7 is found and its physical address is obtained.
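The lookup inside a child node might look like the following sketch. It assumes that the metadata array is kept sorted by key, that each data line starts with a 4-byte little-endian ID, and that a plain list of offsets stands in for the meta entries; the function name `find_row_offset`, the 8-byte row layout and the sample salaries are hypothetical.

```python
import struct
from typing import List, Optional


def find_row_offset(buf: bytearray, offsets_by_key: List[int], target_id: int) -> Optional[int]:
    """Binary-search the metadata entries, reading each row's ID through its offset."""
    lo, hi = 0, len(offsets_by_key) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        offset = offsets_by_key[mid]
        (row_id,) = struct.unpack_from("<i", buf, offset)  # ID stored at row start
        if row_id == target_id:
            return offset
        if row_id < target_id:
            lo = mid + 1
        else:
            hi = mid - 1
    return None


# A toy 64-byte node: rows sit at arbitrary offsets, metadata is ordered by key.
node = bytearray(64)
for offset, (emp_id, salary) in zip((40, 0, 20), ((3, 4100), (5, 4800), (7, 5000))):
    struct.pack_into("<ii", node, offset, emp_id, salary)
offsets_by_key = [40, 0, 20]  # entries for IDs 3, 5, 7

hit = find_row_offset(node, offsets_by_key, 7)
```

The returned offset, added to the physical address of the child node, gives the physical address of the target data line.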
Through the orderliness of data line data content storage, all data content of a data line can be directly read at child nodes only by traversing indexes according to data identification without multilayer indirect address access. Therefore, the data organization form provided by the embodiment of the invention can save a lot of unnecessary memory indirect addressing cost, so that the data organization form has good performance advantage during retrieval.
S130, reading the data of the target data line according to the physical address of the target data line.
The data of the target data row is record data of the target data row, and for example, if the target data row ID =7, all field values of the employee ID =7 may be read.
Optionally, the first request is a query request;
after reading the data of the target data line according to the physical address of the target data line, the method further comprises the following steps:
and returning the data of the target data row as a query result.
Specifically, the data of the target data row is returned as the query result. For example, if the target data row is the row with ID = 7, all field values of the employee with ID = 7 are read and returned as the query result.
Optionally, when a first request is received, before traversing a parent node of a tree structure according to a first data identifier carried in the first request to obtain a physical address of a target child node corresponding to the first request, the method further includes:
acquiring a target table;
constructing a tree structure according to the data identifiers in the target table, wherein the tree structure comprises a parent node and child nodes; each child node includes node attribute information, metadata of the data rows, and record data of each data row; and the parent node includes data identifiers and the physical addresses of the child nodes.
The space size of each child node may be a fixed value, for example, the space size of each child node may be 64KB. The child nodes are divided into three regions: 1) A leaf node metadata header; 2) A data line metadata array storing meta for each data line and a physical address pointing to the previous version, where meta is a 64-bit integer value comprising: 1bit identifies whether the current data line is updated or not, 1bit identifies whether the current data line is visible or not, 14bit stores the length of a key value, 16bit stores the offset of the data line, and 32bit stores a transaction id for creating the data line; 3) The entire row data content of each data row.
In one specific example, an employee data table employee contains attribute fields such as employee ID (length 4), employee name (length 20), salary (length 4) and address (length 32). The latest version data line of all records of the data table is stored in a tree structure constructed by employee IDs, an ID value and an address pointer are stored on a parent node of the tree structure, and the whole line of data content is stored on a child node.
Optionally, the first request is an update request, and the target table includes: attribute field length information;
correspondingly, after reading the data of the target data line according to the physical address of the target data line, the method further includes:
creating a temporary space;
storing the data of the target data row to the temporary space;
adding a write lock aiming at the data of the target data line, and modifying a pointer of a last version in the metadata of the target child node to point to a temporary space address;
updating the data of the target data line according to the length information of the attribute field and the target data carried by the updating request;
releasing a write lock on data of the target data row.
For example, if the target table is an employee data table, the target table includes attribute fields such as an employee ID, an employee name, salary, address, and the like, where the length of the employee ID is 4, the length of the employee name is 20, the length of the salary is 4, and the length of the address is 32.
The temporary space is a temporary version buffer. The previous-version pointer in the metadata is initialized to null. After the data stored in the child node is updated, the pointer is modified to the address of the temporary space that stores the data as it was before the update.
The way of adding write lock to the data of the target data row may be: the update identification bit in the metadata is changed to a first value by operating with an atomic instruction.
The first value may be a preset value, and the first value is different from the second value, for example, the first value may be 1.
The manner of releasing the write lock on the data of the target data row may be: the update identification bit in the metadata is changed to a second value by operating with an atomic instruction.
The second value is a preset value, and for example, the second value may be 0.
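The flag-based write lock can be sketched as below. Python exposes no user-level compare-and-swap instruction, so a `threading.Lock` stands in for the atomic instruction that the embodiment relies on; the class name `MetaWord`, its methods and the position of the update bit (bit 63) are assumptions made for illustration.

```python
import threading

UPDATE_BIT = 1 << 63  # assumed position of the update identification bit


class MetaWord:
    """A 64-bit meta value whose update bit acts as a per-line write lock."""

    def __init__(self, value: int = 0) -> None:
        self._value = value
        self._guard = threading.Lock()  # stand-in for a hardware atomic instruction

    def try_lock(self) -> bool:
        """Set the update bit to the first value (1); fail if it is already set."""
        with self._guard:
            if self._value & UPDATE_BIT:
                return False  # another writer holds the line
            self._value |= UPDATE_BIT
            return True

    def unlock(self) -> None:
        """Reset the update bit to the second value (0)."""
        with self._guard:
            self._value &= ~UPDATE_BIT

    def is_locked(self) -> bool:
        with self._guard:
            return bool(self._value & UPDATE_BIT)


word = MetaWord()
first_writer = word.try_lock()   # acquires the write lock
second_writer = word.try_lock()  # a concurrent writer is refused
word.unlock()
```

A writer that is refused simply returns failure, so no writer ever blocks while holding a reference to the data line.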
Specifically, the manner of updating the data of the target data line according to the length information of the attribute fields and the target data carried by the update request may be: determining the offset of the data to be modified according to the length information of the attribute fields, and updating the data to be modified to the target data at that offset. For example, the update request may be: change the salary of the employee with ID = 5 to 5200. The target table is the employee data table employee, which contains attribute fields such as employee ID (length 4), employee name (length 20), salary (length 4) and address (length 32). The ID and name fields precede the salary field, so the offset of the data to be changed is 4 + 20 = 24, and the original salary field value at that offset is modified to 5200.
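The offset calculation can be reproduced with a short sketch. The field lengths come from the employee table in the example; the helper name `field_offsets` and the encoding of the salary as a 4-byte little-endian integer are assumptions.

```python
import struct
from typing import Dict, List, Tuple

# Field name and byte length, in storage order, as given for the employee table.
EMPLOYEE_FIELDS: List[Tuple[str, int]] = [
    ("id", 4), ("name", 20), ("salary", 4), ("address", 32),
]


def field_offsets(fields: List[Tuple[str, int]]) -> Dict[str, int]:
    """Offset of each field = sum of the lengths of the fields before it."""
    offsets: Dict[str, int] = {}
    position = 0
    for name, length in fields:
        offsets[name] = position
        position += length
    return offsets


offsets = field_offsets(EMPLOYEE_FIELDS)
row_length = sum(length for _, length in EMPLOYEE_FIELDS)

# In-place update of the salary field of one data line.
row = bytearray(row_length)
struct.pack_into("<i", row, offsets["salary"], 5200)
```

Here `offsets["salary"]` evaluates to 24, matching the offset stated in the example.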
In a specific example, the first request is an update request: change the salary of the employee with ID = 5 to 5200.
1) Firstly, as in the query request process above, the data content of the data line with ID = 5 is found;
2) Secondly, the whole recorded data content is copied to a temporary version buffer, and the metadata touched by concurrent read transactions during the write operation is tracked and maintained, so that it can be used for serialization verification when a subsequent transaction is committed;
3) Then, the update identification bit in the data line meta is changed to 1 by an atomic instruction operation, which adds a write lock to the data content of the data line; any other concurrent write operation that subsequently reads this flag fails and returns;
4) Then, according to the length of the attribute field, calculating the offset of the modified content to be 24, and modifying the salary field value to be 5200 in situ;
5) Finally, after the update transaction is successfully submitted, the update identifier in the data line meta is changed to 0 through an atomic instruction operation, so that the write lock on the data line is released, and then the data line becomes a new version and is visible for other transactions. Before this, it is also necessary to complete the migration operation on the old data line, that is, migrate the data line data content in the buffer to the retired version data block by using a memory copy method, and point the pointer of the previous version (old version) of the data line to the retired version.
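The five steps above can be strung together in a single-threaded sketch. It is illustrative only: a Boolean field replaces the atomic flag, a Python list named `retired` stands in for the retired version data block, object references stand in for physical addresses, and the names `Row` and `update_salary` are hypothetical. Transaction validation and concurrency control are omitted.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Row:
    data: bytearray                       # latest version, stored in the child node
    updating: bool = False                # update identification bit in meta
    prev_version: Optional[bytes] = None  # previous-version pointer, initially null


def update_salary(row: Row, new_salary: int, retired: List[bytes],
                  salary_offset: int = 24) -> bool:
    temp = bytes(row.data)        # 2) copy the whole line to a temporary buffer
    if row.updating:              # 3) a concurrent writer already holds the lock
        return False
    row.updating = True           #    take the write lock
    row.prev_version = temp       #    readers can reach the old version from here
    row.data[salary_offset:salary_offset + 4] = new_salary.to_bytes(4, "little")  # 4)
    retired.append(temp)          # 5) migrate the old version to the retired block
    row.prev_version = retired[-1]
    row.updating = False          #    release the lock; the new version is visible
    return True


retired_block: List[bytes] = []
employee_row = Row(data=bytearray(60))
ok = update_salary(employee_row, 5200, retired_block)
```

Throughout the update, the old content remains reachable through `prev_version`, which is what allows readers to proceed without waiting for the writer.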
Throughout the process, the update operation does not affect the access of other concurrent transactions to the most recent committed version of the data line of the record. When there is a concurrent read request, it can be directed, through the previous-version pointer of the data line, to the latest version cached in the buffer. In addition, a timer thread is used to clean the buffer so as to speed up the retrieval of data items in the buffer. Therefore, although the data organization form provided by the embodiment of the invention brings some migration cost to the system, actual experimental results show that the overall performance can be improved under diversified database load scenarios.
Optionally, the method further includes:
when an insertion request is received, traversing a parent node of a tree structure according to a second data identifier of a data row to be inserted carried by the insertion request to obtain a physical address of a first child node corresponding to the insertion request;
if the data line with the data identifier being the second data identifier does not exist in the first child node, constructing a data item, wherein the data item is used for storing metadata of the data line corresponding to the second data identifier;
modifying the updating identification bit in the metadata of the data line corresponding to the second data identification into a first numerical value;
determining the offset of the data line to be inserted according to the node attribute information of the first child node;
inserting data corresponding to the data line to be inserted into the first child node according to the offset of the data line to be inserted, and updating metadata according to the offset of the data line to be inserted;
and modifying the updating identification bit in the metadata of the data row corresponding to the second data identification into a second numerical value.
Specifically, the data item may be constructed, when no data line whose data identifier is the second data identifier exists in the first child node, as follows: all data rows in the child node are traversed to determine whether the child node already holds a data row whose data identifier is the second data identifier; if so, insertion failure is returned, the reason being duplicate data; if not, a new data item is acquired from the metadata array of the child node.
In one specific example, an insertion request is received, for example: insert the employee information with ID = 7.
Firstly, the tree structure is traversed according to the ID value carried by the insertion request, and the physical address of the child node into which the data line is to be inserted is found;
then, all data lines in the child node are traversed to determine whether the child node already has a data line with ID = 7; if so, insertion failure is returned, the reason being duplicate data;
if the child node does not have a record with ID = 7, a new data item is acquired from the metadata array of the child node to store the metadata of the data line with ID = 7, and the update identification bit in the metadata meta is initialized to 1, which places a write lock on the data line; the data line is temporarily invisible to other transactions;
then, the total space size of the node and the currently used space size are obtained from the node header, and the offset of the data line to be inserted is calculated from them;
then, the content of the data line with ID = 7 is copied to that offset, and the offset value of the data line is written into the metadata;
finally, when the insert transaction has been successfully committed, the update identification bit in the metadata meta is set to 0 by an atomic instruction, which releases the write lock on the data line; the data line then becomes the latest version and is visible to other transactions.
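The insertion flow above can be sketched as follows. This is only an illustrative sketch: the class names `LeafNode` and `RowMeta`, the string return codes, and the choice to pack row data from the end of the block toward the front are assumptions made here, not details stated in the text. A `threading.Lock` stands in for the atomic instruction.

```python
import threading


class RowMeta:
    """One item of the leaf node's metadata array."""

    def __init__(self, row_id):
        self.row_id = row_id
        self.updating = 1   # initialised to 1: write-locked, not yet visible
        self.offset = None
        self.length = 0


class LeafNode:
    def __init__(self, total_size):
        self.total_size = total_size    # node header: total space
        self.used_size = 0              # node header: space used by row data
        self.metas = []                 # metadata array, one item per row
        self.data = bytearray(total_size)
        self._guard = threading.Lock()  # assumption: emulates an atomic store

    def find(self, row_id):
        for meta in self.metas:
            if meta.row_id == row_id:
                return meta
        return None

    def insert(self, row_id, payload):
        # Reject the request if a row with this identifier already exists.
        if self.find(row_id) is not None:
            return "duplicate"
        if self.used_size + len(payload) > self.total_size:
            return "full"
        # Acquire a new metadata item; the row starts out write-locked.
        meta = RowMeta(row_id)
        self.metas.append(meta)
        # Assumed placement: rows are packed from the tail of the block.
        offset = self.total_size - self.used_size - len(payload)
        self.data[offset:offset + len(payload)] = payload
        meta.offset, meta.length = offset, len(payload)
        self.used_size += len(payload)
        # Commit: clear the update flag so the row becomes visible.
        with self._guard:
            meta.updating = 0
        return "ok"

    def read(self, row_id):
        meta = self.find(row_id)
        if meta is None or meta.updating == 1:
            return None   # absent, or still invisible to other transactions
        return bytes(self.data[meta.offset:meta.offset + meta.length])
```

A second insertion with the same identifier returns `"duplicate"` without touching the data block, which mirrors the duplicate-data failure described above.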
Optionally, determining an offset of the data line to be inserted according to the node attribute information of the first child node includes:
determining the total space size and the used space size corresponding to the first child node according to the node attribute information of the first child node;
and determining the offset of the data row to be inserted according to the total space size and the used space size corresponding to the first child node.
Wherein the node attribute information of the first child node includes: the total space size and the used space size corresponding to the first child node.
Specifically, the offset of the data row to be inserted may be determined from the total space size and the used space size corresponding to the first child node as follows: the offset is determined according to the difference between the total space size corresponding to the first child node and the used space size.
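As a rough illustration of this calculation, the sketch below derives an offset from the difference between the two sizes. The text only states that the offset is determined from that difference; the further assumption made here, that row data are packed from the end of the block so that a row of length `row_len` starts at `free - row_len`, is one possible reading and is not confirmed by the source. The function names are hypothetical.

```python
def free_space(total_size, used_size):
    """Difference between the node's total space and its used space."""
    if not 0 <= used_size <= total_size:
        raise ValueError("used_size must lie between 0 and total_size")
    return total_size - used_size


def row_offset(total_size, used_size, row_len):
    """Start offset of a new row, or None if the node cannot hold it.

    Assumption: rows are packed from the tail of the block toward the front,
    so the new row occupies the last row_len bytes of the free region.
    """
    free = free_space(total_size, used_size)
    if row_len > free:
        return None
    return free - row_len
```

For a 4096-byte node with 1000 bytes used, a 96-byte row would start at offset 3000 under this assumption.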
Optionally, the first request is a delete request;
after reading the data of the target data line according to the physical address of the target data line, the method further comprises the following steps:
creating a temporary space;
storing the data of the target data row to the temporary space;
adding a write lock aiming at the data of the target data line, and modifying a pointer of a last version in the metadata of the target child node to point to a temporary space address;
modifying a visible flag bit in metadata of the target data line to a second numerical value;
releasing a write lock on data of the target data row.
Wherein the temporary space is a temporary version buffer.
Wherein the pointer in the metadata is initially null. After the data stored in the child node is modified, the pointer is set to the address of the temporary space that stores the data as it was before the modification.
The write lock on the data of the target data row may be added as follows: the update identification bit in the metadata is set to a first value by an atomic instruction.
The first value may be a preset value, and the first value is different from the second value; for example, the first value may be 1.
The second value is a preset value; for example, the second value may be 0.
The write lock on the data of the target data row may be released as follows: the update identification bit in the metadata is set to the second value by an atomic instruction.
In a specific example, the first request is a delete request, for example: delete the employee information with ID = 6.
Firstly, as in the query request process above, the data content of the data line with ID = 6 is found;
secondly, the data content of the whole data line is migrated to a temporary version buffer, and the metadata of concurrent read transactions that operate on the data line during the deletion is tracked and maintained, for use in serializability verification when the transaction is later committed;
then, the update identification bit in the metadata meta is set to 1 by an atomic instruction, which places a write lock on the data content of the data line; any other concurrent write operation that then reads this metadata flag returns failure. The pointer in the metadata that points to the previous version is set to the temporary space address; until the operation has been committed, a read request for the data line is answered with the data content stored in the temporary space, and the transaction information of the read operation is recorded at the same time, for use in serializability verification in the subsequent transaction commit stage;
finally, after the transaction has been successfully committed, the metadata meta is set to 0 by one atomic instruction, so that the data line is logically deleted;
when a merge operation is performed on the active version tree and the metadata meta of the data line is read as 0, the memory space occupied by the data line is cleaned (that is, physically deleted); likewise, when the system memory reclamation thread detects that the transaction of the data line in the version buffer has expired, the memory space occupied by that data line is cleaned (that is, physically deleted).
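The deletion flow above (buffer the row, lock it, delete it logically on commit, reclaim it physically at merge time) can be sketched as follows. This is an illustrative sketch under stated assumptions: the leaf node is modelled as a plain dictionary, the update flag and the visible flag are kept as two separate fields as in the claimed method, and the names `Row`, `try_lock`, `begin_delete`, `read_row`, `commit_delete` and `merge` are hypothetical. A `threading.Lock` emulates the atomic instruction, and the tracking of reader transactions is omitted.

```python
import threading

_cas_guard = threading.Lock()  # assumption: emulates an atomic instruction


class Row:
    def __init__(self, payload):
        self.payload = payload
        self.updating = 0         # 1 = write-locked
        self.visible = 1          # 0 = logically deleted
        self.prev_version = None  # previous-version pointer, initially null


def try_lock(row):
    with _cas_guard:
        if row.updating == 1:
            return False
        row.updating = 1
        return True


def begin_delete(leaf, row_id, version_buffer):
    row = leaf.get(row_id)
    if row is None or row.visible == 0:
        return False
    temp = bytes(row.payload)    # temporary space holding the row content
    if not try_lock(row):
        return False             # another writer already holds the row
    version_buffer.append(temp)
    row.prev_version = temp      # readers are redirected to the copy
    return True


def read_row(leaf, row_id):
    row = leaf.get(row_id)
    if row is None or row.visible == 0:
        return None
    if row.updating == 1:
        return row.prev_version  # serve the buffered copy until commit
    return row.payload


def commit_delete(leaf, row_id):
    row = leaf[row_id]
    with _cas_guard:
        row.visible = 0          # logical deletion
        row.updating = 0         # release the write lock


def merge(leaf):
    # Physical deletion: reclaim rows that were logically deleted.
    for row_id in [k for k, r in leaf.items() if r.visible == 0]:
        del leaf[row_id]
```

Separating `begin_delete` from `commit_delete` makes the intermediate state observable: between the two calls a reader still receives the buffered content, and a second writer is refused.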
The embodiment of the invention uses a performance-optimized BzTree tree structure to organize and store the latest version data rows of all records of the data table. Each leaf node of the tree is a fixed-size data block; inside the data block the data lines are organized in a slot format, which comprises node metadata information of a fixed byte size, a dynamically adjusted array storing the metadata of each data line, and the whole record data of each data line placed according to its offset. Secondly, a data line that is being updated is stored in a global version cache region, and the associated transaction read-write operation information is maintained for it. Then, data rows that no longer have updating requirements are stored, by sequential appending, in a plurality of continuous fixed-size memory data blocks, and simple snapshot timestamp information is maintained for them for use in version retrieval. Therefore, each time a data line is updated, the latest data line stored in the leaf node is copied to the buffer area, and the changed content is modified directly in place on the leaf node. Although this approach incurs some memory copy overhead, it does not affect the access of other concurrent transactions to the data line.
The embodiment of the invention can classify the data rows according to the life cycle of the table version and store the data rows in a plurality of storage spaces separately, thereby reducing the contention of data access. Meanwhile, according to the diversified requirements of the database load, the applicable metadata information is maintained for the data rows of different types, and the memory space overhead is saved. In addition, the execution engine can directly read the whole data content of the data row by traversing the leaf nodes of the tree, thereby greatly simplifying the access path of the memory data and further improving the overall performance of the system for processing diversified database loads.
In one specific example, the first request includes a query request (query all fields of the employee with ID = 7) and an update request (update the salary of the employee with ID = 5 to 5200). As shown in fig. 2, the parent node has ID = 4; the left child node of the parent node includes the child node with ID = 4, the child node with ID = 3, the child node with ID = 2 and the child node with ID = 1, and the right child node of the parent node includes the child node with ID = 5, the child node with ID = 6 and the child node with ID = 7. Since 7 is greater than 4, the child node on the right side of the parent node is searched, the physical address of the child node with ID = 7 is obtained, and all fields of the employee with ID = 7 are read. Likewise, since 5 is greater than 4, the child node on the right side of the parent node is searched, the physical address of the child node with ID = 5 is obtained, and all fields of the employee with ID = 5 are read. All fields of the employee with ID = 5 are then stored in the version buffer, and the salary of the employee with ID = 5 in the child node is updated to 5200. The retired version data blocks store retired version data; in fig. 2, for example, all fields of the employee with ID = 5 and all fields of the employee with ID = 7 are stored in retired version data blocks.
According to the technical scheme of this embodiment, when a first request is received, a parent node of a tree structure is traversed according to a first data identifier carried by the first request to obtain a physical address of a target child node corresponding to the first request; the metadata of the target child node is traversed according to the physical address of the target child node and the first data identifier to obtain a physical address of a target data line; and the data of the target data line is read according to the physical address of the target data line. This solves the problem that, when the data content of a data line is accessed, the content has to be read through multiple layers of logical addresses and physical addresses, which increases the cost of indirect address calculation during memory access, makes the whole access path very complicated, causes memory access delay, and reduces the overall performance of the system.
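The three lookup steps of this scheme, applied to the keys of the fig. 2 example, can be sketched as follows. The sketch is illustrative only: Python object references and byte offsets stand in for physical addresses, the names `Leaf`, `Parent`, `find_leaf`, `find_row` and `query` are hypothetical, and the row payloads are placeholder bytes rather than real employee records.

```python
import bisect


class Leaf:
    """Child node: a data block plus a metadata array of (id, offset, length)."""

    def __init__(self, rows):
        self.data = bytearray()
        self.metas = []
        for row_id, payload in rows.items():
            self.metas.append((row_id, len(self.data), len(payload)))
            self.data += payload


class Parent:
    """Parent node: sorted separator keys and references to child nodes."""

    def __init__(self, separators, children):
        self.separators = separators   # e.g. [4]
        self.children = children       # len(separators) + 1 child nodes


def find_leaf(parent, row_id):
    # Step 1: identifiers <= separator go left, larger identifiers go right.
    return parent.children[bisect.bisect_left(parent.separators, row_id)]


def find_row(leaf, row_id):
    # Step 2: traverse the child node's metadata to locate the row.
    for meta_id, offset, length in leaf.metas:
        if meta_id == row_id:
            return offset, length
    return None


def query(parent, row_id):
    leaf = find_leaf(parent, row_id)
    located = find_row(leaf, row_id)
    if located is None:
        return None
    offset, length = located
    # Step 3: read the row directly at its location, with no further
    # logical-to-physical address translation.
    return bytes(leaf.data[offset:offset + length])
```

With a parent whose separator is 4, a left child holding IDs 1 to 4 and a right child holding IDs 5 to 7, a query for ID 7 descends to the right child and reads the row from that block in a single step.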
Example two
Fig. 3 is a schematic structural diagram of a data acquisition apparatus according to an embodiment of the present invention. The present embodiment is applicable to the case of data acquisition; the apparatus may be implemented in software and/or hardware and may be integrated in any device that provides a data acquisition function. As shown in fig. 3, the data acquisition apparatus specifically includes: a first acquisition module 310, a second acquisition module 320, and a reading module 330.
The first acquisition module is used for traversing a parent node of a tree structure according to a first data identifier carried by a first request when the first request is received, and obtaining a physical address of a target child node corresponding to the first request;
the second acquisition module is used for traversing the metadata of the target child node according to the physical address of the target child node and the first data identifier to obtain the physical address of a target data line;
and the reading module is used for reading the data of the target data line according to the physical address of the target data line.
The product can execute the method provided by any embodiment of the invention, and has corresponding functional modules and beneficial effects of the execution method.
According to the technical scheme of this embodiment, when a first request is received, a parent node of a tree structure is traversed according to a first data identifier carried by the first request to obtain a physical address of a target child node corresponding to the first request; the metadata of the target child node is traversed according to the physical address of the target child node and the first data identifier to obtain a physical address of a target data line; and the data of the target data line is read according to the physical address of the target data line. This solves the problem that, when the data content of a data line is accessed, the content has to be read through multiple layers of logical addresses and physical addresses, which increases the cost of indirect address calculation during memory access, makes the whole access path very complicated, causes memory access delay, and reduces the overall performance of the system.
Example three
FIG. 4 shows a schematic block diagram of an electronic device 10 that may be used to implement an embodiment of the invention. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices (e.g., helmets, glasses, watches, etc.), and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the inventions described and/or claimed herein.
As shown in fig. 4, the electronic device 10 includes at least one processor 11, and a memory communicatively connected to the at least one processor 11, such as a Read Only Memory (ROM) 12, a Random Access Memory (RAM) 13, and the like, wherein the memory stores a computer program executable by the at least one processor, and the processor 11 can perform various suitable actions and processes according to the computer program stored in the Read Only Memory (ROM) 12 or the computer program loaded from a storage unit 18 into the Random Access Memory (RAM) 13. In the RAM 13, various programs and data necessary for the operation of the electronic apparatus 10 may also be stored. The processor 11, the ROM 12, and the RAM 13 are connected to each other via a bus 14. An input/output (I/O) interface 15 is also connected to bus 14.
A number of components in the electronic device 10 are connected to the I/O interface 15, including: an input unit 16 such as a keyboard, a mouse, or the like; an output unit 17 such as various types of displays, speakers, and the like; a storage unit 18 such as a magnetic disk, an optical disk, or the like; and a communication unit 19 such as a network card, modem, wireless communication transceiver, etc. The communication unit 19 allows the electronic device 10 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The processor 11 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of processor 11 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various processors running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The processor 11 performs the various methods and processes described above, such as the data acquisition method.
In some embodiments, the data acquisition method may be implemented as a computer program tangibly embodied in a computer-readable storage medium, such as storage unit 18. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 10 via the ROM 12 and/or the communication unit 19. When the computer program is loaded into the RAM 13 and executed by the processor 11, one or more steps of the data acquisition method described above may be performed. Alternatively, in other embodiments, the processor 11 may be configured to perform the data acquisition method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
A computer program for implementing the methods of the present invention may be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the computer programs, when executed by the processor, cause the functions/acts specified in the flowchart and/or block diagram block or blocks to be performed. A computer program can execute entirely on a machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of the present invention, a computer-readable storage medium may be a tangible medium that can contain, or store a computer program for use by or in connection with an instruction execution system, apparatus, or device. A computer readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Alternatively, the computer readable storage medium may be a machine readable signal medium. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on an electronic device having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the electronic device. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), blockchain networks, and the internet.
The computing system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or a cloud host, and is a host product in a cloud computing service system, so that the defects of high management difficulty and weak service expansibility in the traditional physical host and VPS service are overcome.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present invention may be executed in parallel, sequentially, or in different orders, and are not limited herein as long as the desired results of the technical solution of the present invention can be achieved.
The above-described embodiments should not be construed as limiting the scope of the invention. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A method of data acquisition, comprising:
when a first request is received, traversing a parent node of a tree structure according to a first data identifier carried by the first request to obtain a physical address of a target child node corresponding to the first request;
traversing metadata of the target child node according to the physical address of the target child node and the first data identifier to obtain a physical address of a target data line;
and reading the data of the target data line according to the physical address of the target data line.
2. The method of claim 1, wherein the first request is a query request;
after reading the data of the target data line according to the physical address of the target data line, the method further comprises the following steps:
and returning the data of the target data row as a query result.
3. The method of claim 1, wherein, before traversing the parent node of the tree structure according to the first data identifier carried by the first request to obtain the physical address of the target child node corresponding to the first request when the first request is received, the method further comprises:
acquiring a target table;
constructing a tree structure according to the data identifiers in the target table, wherein the tree structure comprises a parent node and child nodes, each child node comprising: node attribute information, metadata of data rows, and record data of each data row, and the parent node comprising: the data identifiers and the physical addresses of the child nodes.
4. The method of claim 1, wherein the first request is an update request, and wherein the target table comprises: attribute field length information;
correspondingly, after reading the data of the target data line according to the physical address of the target data line, the method further includes:
creating a temporary space;
storing the data of the target data row to the temporary space;
adding a write lock aiming at the data of the target data line, and modifying a pointer of a last version in the metadata of the target child node to point to a temporary space address;
updating the data of the target data line according to the length information of the attribute field and the target data carried by the updating request;
releasing a write lock on data of the target data row.
5. The method of claim 3, further comprising:
when an insertion request is received, traversing a parent node of a tree structure according to a second data identifier, carried by the insertion request, of a data row to be inserted, to obtain a physical address of a first child node corresponding to the insertion request;
if the data row with the data identifier as the second data identifier does not exist in the first child node, constructing a data item, wherein the data item is used for storing metadata of the data row corresponding to the second data identifier;
modifying the updating identification bit in the metadata of the data row corresponding to the second data identification into a first numerical value;
determining the offset of the data line to be inserted according to the node attribute information of the first child node;
inserting data corresponding to the data line to be inserted into the first child node according to the offset of the data line to be inserted, and updating metadata according to the offset of the data line to be inserted;
and modifying the updating identification bit in the metadata of the data line corresponding to the second data identification into a second numerical value.
6. The method of claim 5, wherein determining an offset of a data line to be inserted according to the node attribute information of the first child node comprises:
determining the total space size and the used space size corresponding to the first child node according to the node attribute information of the first child node;
and determining the offset of the data row to be inserted according to the total space size and the used space size corresponding to the first child node.
7. The method of claim 1, wherein the first request is a delete request;
after reading the data of the target data line according to the physical address of the target data line, the method further comprises the following steps:
creating a temporary space;
storing the data of the target data row to the temporary space;
adding a write lock aiming at the data of the target data line, and modifying a pointer of a last version in the metadata of the target child node into a temporary space address;
modifying a visible flag bit in metadata of the target data line to a second numerical value;
releasing a write lock on data of the target data row.
8. A data acquisition apparatus, comprising:
the first acquisition module is used for traversing a parent node of a tree structure according to a first data identifier carried by a first request when the first request is received, and obtaining a physical address of a target child node corresponding to the first request;
the second acquisition module is used for traversing the metadata of the target child node according to the physical address of the target child node and the first data identifier to obtain the physical address of a target data line;
and the reading module is used for reading the data of the target data line according to the physical address of the target data line.
9. An electronic device, characterized in that the electronic device comprises:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the data acquisition method of any one of claims 1-7.
10. A computer-readable storage medium having stored thereon computer instructions for causing a processor to execute the method of data acquisition of any one of claims 1-7.
CN202211156625.8A 2022-09-21 2022-09-21 Data acquisition method, device, equipment and storage medium Pending CN115469810A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211156625.8A CN115469810A (en) 2022-09-21 2022-09-21 Data acquisition method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211156625.8A CN115469810A (en) 2022-09-21 2022-09-21 Data acquisition method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115469810A true CN115469810A (en) 2022-12-13

Family

ID=84335534

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211156625.8A Pending CN115469810A (en) 2022-09-21 2022-09-21 Data acquisition method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115469810A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117150086A (en) * 2023-09-12 2023-12-01 北京云枢创新软件技术有限公司 Hierarchical tree-based child node generation method, electronic equipment and medium
CN117150086B (en) * 2023-09-12 2024-03-22 北京云枢创新软件技术有限公司 Hierarchical tree-based child node generation method, electronic equipment and medium

Similar Documents

Publication Publication Date Title
US11429641B2 (en) Copying data changes to a target database
US10180946B2 (en) Consistent execution of partial queries in hybrid DBMS
US10262013B2 (en) Efficient full delete operations
US10552402B2 (en) Database lockless index for accessing multi-version concurrency control data
US7418544B2 (en) Method and system for log structured relational database objects
US20180218000A1 (en) Systems, methods, and computer-readable media for a fast snapshot of application data in storage
US8429134B2 (en) Distributed database recovery
US8560500B2 (en) Method and system for removing rows from directory tables
US8924365B2 (en) System and method for range search over distributive storage systems
US20160147448A1 (en) Efficient Block-Level Space Allocation for Multi-Version Concurrency Control Data
US10572508B2 (en) Consistent query execution in hybrid DBMS
US8280917B1 (en) Batching content management operations to facilitate efficient database interactions
US20200019474A1 (en) Consistency recovery method for seamless database duplication
US9390111B2 (en) Database insert with deferred materialization
CN115408391A (en) Database table changing method, device, equipment and storage medium
CN115469810A (en) Data acquisition method, device, equipment and storage medium
US20180011897A1 (en) Data processing method having structure of cache index specified to transaction in mobile environment dbms
US7752181B2 (en) System and method for performing a data uniqueness check in a sorted data set
US11003540B2 (en) Method, server, and computer readable medium for index recovery using index redo log
CN111444179B (en) Data processing method, device, storage medium and server
CN111949439B (en) Database-based data file updating method and device
CN116450751A (en) Transaction processing method and device
CN115145724A (en) Task processing method and device, electronic equipment and storage medium
CN116860700A (en) Method, device, equipment and medium for processing metadata in distributed file system
CN117539650A (en) Decentralised record lock management method of data management system and related equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination