CN115687351A - Data processing method and device, electronic equipment and computer readable storage medium - Google Patents


Info

Publication number
CN115687351A
Authority
CN
China
Prior art keywords
data
tuple
index
field
value
Legal status
Pending
Application number
CN202211359180.3A
Other languages
Chinese (zh)
Inventor
陈立璜
董勇明
黄坚
夏康
Current Assignee
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Application filed by Industrial and Commercial Bank of China Ltd ICBC
Priority to CN202211359180.3A
Publication of CN115687351A


Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The disclosure provides a data processing method and device, electronic equipment and a computer readable storage medium, which can be applied to the technical field of big data and also to the technical field of finance. The data processing method comprises the following steps: reading a target index tuple from at least one index tuple in an index table based on a target query value, wherein each index tuple comprises a first index field and a second index field, the first index field values of the index tuples are distinct from one another so that each index tuple is uniquely identified, and the second index field is used to locate a chain head tuple in a chain head table; reading the chain head tuple from the chain head table based on the second index field value of the target index tuple, wherein the chain head tuple comprises a chain head field; and reading a first target data tuple from a data table based on the chain head field.

Description

Data processing method and device, electronic equipment and computer readable storage medium
Technical Field
The present disclosure relates to the field of big data technologies, and in particular, to a data processing method and apparatus, an electronic device, and a computer-readable storage medium.
Background
A hotspot record in a database is a record that is updated frequently within a unit of time. In daily business processing, updating, querying, and deleting hotspot data are operations that must be performed frequently.
In implementing the disclosed concept, the inventors found at least the following problems in the related art: with existing database storage structures and data manipulation methods, the number of versions of the same record grows with the number of updates, so the number of hops needed to reach the data also grows and queries slow down. As concurrent updates to a hotspot record increase, the same record must traverse multiple versions and acquire and release locks multiple times, which amplifies what was originally a single-point hotspot problem into a multi-point one and degrades overall performance.
Disclosure of Invention
In view of the above, the present disclosure provides a data processing method, apparatus, device, medium, and program product.
In one aspect of the present disclosure, a data processing method is provided, including:
reading a target index tuple from at least one index tuple in an index table based on a target query value, wherein each index tuple comprises a first index field and a second index field, the first index field values of the at least one index tuple are distinct from one another so that different index tuples can be told apart by these identifiers, and the second index field is used to locate a chain head tuple in a chain head table;
reading the chain head tuple from the chain head table based on the second index field value of the target index tuple, wherein the chain head tuple comprises a chain head field whose value is the storage location, in the data table, of the latest version of the data tuple;
reading, based on the chain head field, a first target data tuple from the data table, wherein the first target data tuple is the data tuple of the current latest version.
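The three-step read path can be sketched as below. This is a minimal illustration, not the disclosure's implementation: it assumes each table is an in-memory Python list, that a "storage location" is simply a list position, and the class and function names (`IndexTuple`, `ChainHeadTuple`, `DataTuple`, `read_latest`) are hypothetical. A linear scan stands in for a real tree index.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class IndexTuple:
    key: str       # first index field: unique identifier, e.g. a primary key value
    head_pos: int  # second index field: position of the chain head tuple


@dataclass
class ChainHeadTuple:
    latest_pos: int  # chain head field: position of the latest data tuple version


@dataclass
class DataTuple:
    body: dict  # main data of this version


def read_latest(index_table: list[IndexTuple],
                head_table: list[ChainHeadTuple],
                data_table: list[DataTuple],
                query_value: str) -> Optional[DataTuple]:
    # Step 1: pick the index tuple whose first index field equals the query value
    # (a linear scan stands in for a real tree index here).
    target = next((t for t in index_table if t.key == query_value), None)
    if target is None:
        return None
    # Step 2: the second index field locates the chain head tuple.
    head = head_table[target.head_pos]
    # Step 3: the chain head field points directly at the latest version,
    # so no walk through older versions is needed.
    return data_table[head.latest_pos]


# Usage: three versions of record "A"; the chain head points at the newest one.
data_table = [DataTuple({"id": "A", "bal": 100}),
              DataTuple({"id": "A", "bal": 90}),
              DataTuple({"id": "A", "bal": 80})]
head_table = [ChainHeadTuple(latest_pos=2)]
index_table = [IndexTuple(key="A", head_pos=0)]
latest = read_latest(index_table, head_table, data_table, "A")
```

In this sketch the cost of reaching the newest version is two pointer dereferences after the index lookup, however many old versions accumulate in the data table.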
According to an embodiment of the present disclosure, reading the target index tuple from the at least one index tuple in the index table based on the target query value comprises:
reading, from the at least one index tuple, the index tuple whose first index field value is the same as the target query value as the target index tuple.
According to an embodiment of the present disclosure:
the second index field value of each of the at least one index tuple is the storage location of the chain head tuple in the chain head table, so that the chain head tuple can be located through the second index field of any index tuple.
According to an embodiment of the present disclosure, the method further includes:
acquiring an exclusive lock on the chain head tuple when the first target data tuple needs to be updated;
adding a second target data tuple to the data table, wherein the second target data tuple is the data tuple obtained by updating the first target data tuple; and
updating the value of the chain head field to the storage location of the second target data tuple in the data table.
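The update steps can be sketched as follows. Following the apparatus description, the exclusive lock is taken on the chain head tuple, so all writers of one record queue on a single lock instead of walking and locking old versions. The `prev_pos` back-link is modeled on the second data field described later; the in-memory lists, `threading.Lock` as the exclusive lock, and the name `update_latest` are assumptions for illustration only.

```python
from __future__ import annotations

import threading
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DataTuple:
    body: dict                # main data of this version
    prev_pos: Optional[int]   # position of the previous version (None for the first)


@dataclass
class ChainHeadTuple:
    latest_pos: int  # chain head field
    # Exclusive lock guarding this record's version chain (hypothetical in-memory stand-in).
    lock: threading.Lock = field(default_factory=threading.Lock, repr=False)


def update_latest(head: ChainHeadTuple, data_table: list[DataTuple], new_body: dict) -> int:
    """Append a new version and repoint the chain head; returns the new position."""
    with head.lock:  # one exclusive lock per record, never one per version
        old_pos = head.latest_pos
        # Append instead of overwriting, so the old version stays in the same table.
        data_table.append(DataTuple(body=new_body, prev_pos=old_pos))
        new_pos = len(data_table) - 1
        # Readers arriving after this point go straight to the new version.
        head.latest_pos = new_pos
        return new_pos


data_table = [DataTuple({"id": "A", "bal": 100}, prev_pos=None)]
head = ChainHeadTuple(latest_pos=0)
new_pos = update_latest(head, data_table, {"id": "A", "bal": 90})
```

Because the lock is held across both the append and the repointing, concurrent updaters in this sketch always link each new version to the one immediately before it.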
According to an embodiment of the present disclosure, the method further includes:
acquiring the length of the lock wait queue of the exclusive lock on the chain head tuple;
determining the first target data tuple to be hotspot data when the length of the lock wait queue is greater than a preset threshold.
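Hotspot detection can be illustrated with a lock that keeps an explicit FIFO wait queue. Python's `threading.Lock` does not expose its waiters, so this sketch models the chain head lock as a hypothetical `ChainHeadLock` with transaction ids; the threshold value in the usage lines is arbitrary.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ChainHeadLock:
    """Hypothetical exclusive lock on a chain head tuple with an explicit FIFO wait queue."""
    holder: Optional[int] = None                   # transaction currently holding the lock
    waiters: deque = field(default_factory=deque)  # transactions waiting for the lock

    def request(self, txid: int) -> bool:
        """Grant the lock if free, otherwise enqueue the transaction. Returns True if granted."""
        if self.holder is None:
            self.holder = txid
            return True
        self.waiters.append(txid)
        return False

    def release(self) -> Optional[int]:
        """Hand the lock to the next waiter (if any) and return its transaction id."""
        self.holder = self.waiters.popleft() if self.waiters else None
        return self.holder


def is_hotspot(lock: ChainHeadLock, threshold: int) -> bool:
    # The record is flagged as hotspot data when the wait queue exceeds the preset threshold.
    return len(lock.waiters) > threshold


# Usage: five transactions contend for the same record; one holds the lock, four wait.
lock = ChainHeadLock()
for txid in range(1, 6):
    lock.request(txid)
hot = is_hotspot(lock, threshold=3)
```

Since every writer of a record queues on the same chain head lock, the queue length in this model is a direct per-record measure of contention, available while the contention is happening rather than after response times degrade.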
According to an embodiment of the present disclosure:
the data tuple comprises a first data field, and the first data field values of the data tuples of multiple versions are the main data of the respective versions; the main data of the multiple versions differ from one another, and each contains the same data primary key.
According to an embodiment of the present disclosure:
the key values of the data primary key in the multiple versions may be the same or different; and
the first index field values of the at least one index tuple match these data primary key values.
According to an embodiment of the present disclosure, the method further includes:
when the key value of the data primary key of the second target data tuple differs from that of the first target data tuple, adding an index tuple to the index table, wherein the first index field value of the added index tuple is the key value of the data primary key of the second target data tuple.
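A minimal sketch of this index maintenance rule, assuming an in-memory list as the index table and a hypothetical `maintain_index` helper. Consistent with the embodiment in which every index tuple's second index field holds the chain head position, the new entry points at the same chain head, so both the old and the new key lead to the latest version.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class IndexTuple:
    key: str       # first index field: data primary key value
    head_pos: int  # second index field: chain head tuple position


def maintain_index(index_table: list[IndexTuple], head_pos: int,
                   old_key: str, new_key: str) -> None:
    # Only a changed primary key needs a new index entry; both entries share
    # the same chain head, so either key still reaches the latest version.
    if new_key != old_key:
        index_table.append(IndexTuple(key=new_key, head_pos=head_pos))


index_table = [IndexTuple(key="A", head_pos=0)]
maintain_index(index_table, head_pos=0, old_key="A", new_key="A")  # key unchanged
maintain_index(index_table, head_pos=0, old_key="A", new_key="B")  # key changed
```

An update that leaves the primary key unchanged touches neither the index nor anything but the chain head, which keeps the index small under heavy update traffic in this model.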
According to an embodiment of the present disclosure:
the data tuple further comprises a second data field, and the second data field value of each version's data tuple is the storage location, in the data table, of the data tuple of the preceding version, so that each version can be linked back to its previous version through its second data field value.
According to an embodiment of the present disclosure:
the data tuple further comprises a third data field, and the third data field value of each version's data tuple indicates whether that version has been deleted.
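Because the second data field links each version to its predecessor, the version history can be walked from the newest version backwards. A minimal sketch, assuming list positions as storage locations, `None` as the back-link of the first version, and a hypothetical `version_history` function:

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class DataTuple:
    body: dict               # main data of this version
    prev_pos: Optional[int]  # second data field: previous version, None for the first


def version_history(data_table: list[DataTuple], latest_pos: int) -> list[dict]:
    """Collect version bodies from the latest one back to the oldest one."""
    history = []
    pos: Optional[int] = latest_pos
    while pos is not None:
        tup = data_table[pos]
        history.append(tup.body)
        pos = tup.prev_pos  # follow the back-link to the previous version
    return history


data_table = [DataTuple({"bal": 100}, prev_pos=None),
              DataTuple({"bal": 90}, prev_pos=0),
              DataTuple({"bal": 80}, prev_pos=1)]
history = version_history(data_table, latest_pos=2)
```

The chain runs newest-to-oldest, the reverse of the related-art layout, so only readers that actually need old versions pay for walking it.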
According to an embodiment of the present disclosure, the method further includes:
when the third data field value of the second target data tuple is a first value and the second target data tuple needs to be deleted, acquiring the exclusive lock on the chain head tuple, wherein the first value indicates that the second target data tuple has not been deleted (that is, its state before the deletion operation is performed);
adding a third target data tuple to the data table, wherein the third data field value of the third target data tuple is a second value, the first data field value of the third target data tuple is null, and the second value indicates that the second target data tuple has been deleted (that is, its state after the deletion operation is performed); and
updating the value of the chain head field to the storage location of the third target data tuple in the data table.
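Deletion can be sketched as appending a deletion-marker version rather than removing anything. This is an illustration under stated assumptions: the encodings `NOT_DELETED = 0` and `DELETED = 1` for the "first value" and "second value", the `txid` field standing in for the fourth data field, the back-link on the marker tuple, and the name `delete_latest` are all hypothetical.

```python
from __future__ import annotations

import threading
from dataclasses import dataclass, field
from typing import Optional

NOT_DELETED, DELETED = 0, 1  # hypothetical encodings of the "first value" and "second value"


@dataclass
class DataTuple:
    body: Optional[dict]     # first data field (main data); None in a deletion marker
    prev_pos: Optional[int]  # second data field: previous version
    deleted: int             # third data field: deletion flag
    txid: int                # fourth data field: operation transaction number


@dataclass
class ChainHeadTuple:
    latest_pos: int
    lock: threading.Lock = field(default_factory=threading.Lock, repr=False)


def delete_latest(head: ChainHeadTuple, data_table: list[DataTuple], txid: int) -> bool:
    """Append a deletion marker as the newest version; False if already deleted."""
    with head.lock:  # exclusive lock on the chain head tuple
        current = data_table[head.latest_pos]
        if current.deleted != NOT_DELETED:
            return False  # nothing to delete
        data_table.append(DataTuple(body=None, prev_pos=head.latest_pos,
                                    deleted=DELETED, txid=txid))
        head.latest_pos = len(data_table) - 1
        return True


data_table = [DataTuple({"bal": 80}, prev_pos=None, deleted=NOT_DELETED, txid=7)]
head = ChainHeadTuple(latest_pos=0)
first = delete_latest(head, data_table, txid=8)
second = delete_latest(head, data_table, txid=9)
```

A reader following the chain head in this model lands on the marker and can report "not found" in one step, while older versions stay reachable through the back-link.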
According to an embodiment of the present disclosure:
the data tuple further comprises a fourth data field, and the fourth data field value of each version's data tuple is the operation transaction number of that version.
Another aspect of the present disclosure provides a data processing apparatus including a first reading module, a second reading module, and a third reading module.
The first reading module is used for reading a target index tuple from at least one index tuple in the index table based on a target query value, wherein each index tuple comprises a first index field and a second index field, the first index field values of the at least one index tuple are distinct from one another so that different index tuples can be told apart by these identifiers, and the second index field is used to locate a chain head tuple in the chain head table;
the second reading module is used for reading the chain head tuple from the chain head table based on the second index field value of the target index tuple, wherein the chain head tuple comprises a chain head field whose value is the storage location, in the data table, of the latest version of the data tuple;
and a third reading module, configured to read a first target data tuple from the data table based on the chain head field, where the first target data tuple is a data tuple of a current latest version.
According to an embodiment of the present disclosure, the first reading module includes a reading unit, configured to read, from at least one index tuple, an index tuple with a first index field value identical to the target query value as the target index tuple.
According to an embodiment of the present disclosure, the second index field value of each of the at least one index tuple is the storage location of the chain head tuple in the chain head table, so that the chain head tuple can be located through the second index field of any index tuple.
According to the embodiment of the disclosure, the device further comprises a first obtaining module, a first adding module and a first updating module.
The first obtaining module is used for obtaining an exclusive lock of a chain head tuple under the condition that a first target data tuple needs to be updated;
the first adding module is used for adding a second target data tuple to the data table, wherein the second target data tuple is the data tuple obtained by updating the first target data tuple;
and the first updating module is used for updating the value of the chain head field to the storage position of the second target data tuple in the data table.
According to the embodiment of the disclosure, the device further comprises a second obtaining module and a determining module.
The second obtaining module is configured to obtain the length of the lock wait queue of the exclusive lock on the chain head tuple;
and the determining module is used for determining the first target data tuple to be hotspot data when the length of the lock wait queue is greater than a preset threshold.
According to an embodiment of the present disclosure, the data tuple comprises a first data field, and the first data field values of the data tuples of multiple versions are the main data of the respective versions; the main data of the multiple versions differ from one another, and each contains the same data primary key.
According to an embodiment of the present disclosure, the key values of the data primary key in the multiple versions may be the same or different; and
the first index field values of the at least one index tuple match these data primary key values.
According to an embodiment of the present disclosure, the apparatus further includes a second adding module, configured to add an index tuple in the index table when a key value of a data primary key of the second target data tuple is different from a key value of a data primary key of the first target data tuple, where a value of a first index field of the added index tuple is the key value of the data primary key of the second target data tuple.
According to an embodiment of the present disclosure, the data tuple further comprises a second data field, and the second data field value of each version's data tuple is the storage location, in the data table, of the data tuple of the preceding version, so that each version can be linked back to its previous version through its second data field value.
According to an embodiment of the present disclosure, the data tuple further comprises a third data field, and the third data field value of each version's data tuple indicates whether that version has been deleted.
According to the embodiment of the disclosure, the device further comprises a third obtaining module, a third adding module and a second updating module.
The third obtaining module is configured to obtain an exclusive lock of a chain head tuple under a condition that a third data field value of a second target data tuple is a first value and the second target data tuple needs to be deleted, where the first value is used to indicate that the second target data tuple is not deleted before a deletion operation is performed;
a third adding module, configured to add a third target data tuple in the data table, where a third data field value of the third target data tuple is a second value, and a first data field value of the third target data tuple is null, and the second value is used to indicate that the second target data tuple is deleted after the deleting operation is performed;
and the second updating module is used for updating the value of the chain head field to the storage position of the third target data tuple in the data table.
According to an embodiment of the present disclosure, the data tuple further comprises a fourth data field, and the fourth data field value of each version's data tuple is the operation transaction number of that version.
Another aspect of the present disclosure provides an electronic device including: one or more processors; a memory for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the above-described data processing method.
Another aspect of the present disclosure also provides a computer-readable storage medium having stored thereon executable instructions, which when executed by a processor, cause the processor to perform the above-described data processing method.
Another aspect of the present disclosure also provides a computer program product comprising a computer program which, when executed by a processor, implements the above-described data processing method.
According to the embodiments of the present disclosure, the database storage structure is improved: the second index field of an index tuple locates the chain head tuple in the chain head table, and the value of the chain head field is the storage location of the latest version of the data tuple in the data table. Therefore, by executing the data processing method of the embodiments on this storage structure, the latest version of a data tuple can be found quickly. While keeping old and new versions stored centrally as in the original database, the method changes how the versions are arranged and how the index is stored. This solves the problem that queries slow down as the number of updated versions grows under the database's multi-version concurrency control mechanism, and substantially improves the efficiency of hotspot data queries.
Drawings
The foregoing and other objects, features and advantages of the disclosure will be apparent from the following description of embodiments of the disclosure, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a schematic diagram illustrating a data operation performed according to a database storage structure in the related art;
FIG. 2 is a schematic diagram illustrating how lock wait time changes with hotspot record heat during hotspot record data operations;
FIG. 3 schematically illustrates an application scenario diagram of a data processing method, apparatus, device, medium and program product according to an embodiment of the disclosure;
FIG. 4 schematically shows a flow diagram of a data processing method according to an embodiment of the present disclosure;
FIG. 5 schematically illustrates a database storage structure according to an embodiment of the present disclosure;
FIG. 6 schematically shows a flow diagram of a data processing method according to another embodiment of the present disclosure;
FIG. 7 schematically illustrates a flow diagram of a data processing method according to yet another embodiment of the present disclosure;
FIG. 8 schematically illustrates a schematic diagram of a database storage structure performing data operations according to an embodiment of the present disclosure;
FIG. 9 schematically shows a block diagram of a data processing apparatus according to an embodiment of the present disclosure; and
FIG. 10 schematically shows a block diagram of an electronic device adapted to implement a data processing method according to an embodiment of the present disclosure.
Detailed Description
Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that these descriptions are illustrative only and are not intended to limit the scope of the present disclosure. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the disclosure. It may be evident, however, that one or more embodiments may be practiced without these specific details. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present disclosure.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It is noted that the terms used herein should be interpreted as having a meaning that is consistent with the context of this specification and should not be interpreted in an idealized or overly formal sense.
In those instances where a convention analogous to "at least one of A, B, and C, etc." is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., "a system having at least one of A, B, and C" would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, and C together, etc.).
Databases currently implement multi-version concurrency control using one of two methods for storing old and new versions of data. The first stores new and old versions separately, in different areas. When data is updated, the original location is overwritten with the new data and the old data is moved to a centralized area; when the update is rolled back, the old data must be written back to its original location. Oracle and MySQL use this method. The second stores new and old versions centrally, in the same area. When new data is written, the old data is not deleted; the new data is inserted in addition, and whether the transaction is committed or rolled back is determined from the state of its transaction number. Some databases, such as PostgreSQL, store multi-version data centrally.
Fig. 1 schematically shows a schematic diagram of performing data manipulation according to a database storage structure in the related art.
As shown in FIG. 1, in the related art, for centralized storage, the database storage structure consists of data tuples and a data index. The data tuple that stores a row of data has two parts, a tuple header and tuple content, and the tuple header mainly includes:
xmin: the number of the transaction that inserted the tuple, which can be understood as the transaction from which the tuple becomes valid.
xmax: the number of the transaction that deleted the tuple, which can be understood as the transaction from which the tuple becomes stale.
ctid: the current location of the data tuple, namely its storage page and the index of its tuple pointer within that page. If the tuple is updated, ctid is changed to the location of the new version.
The data index uses a tree structure in which each leaf node stores an index value, that is, the key value of a specific field in the data table (shown as the index key value in the figure). The ctid in the index stores the storage location of the data tuple it points to: the page information and the index of the tuple pointer within the page.
As shown in FIG. 1, when a data tuple is updated, the exclusive lock on the tuple to be updated is acquired first, and it is then determined whether the tuple has already been updated by another transaction. If so, the lock is released and the exclusive lock on the newer version is acquired, repeating until the latest tuple not yet updated by another transaction is reached. Then the xmax of that last old tuple is set to the current transaction number, a new tuple is inserted, and the old tuple's ctid is updated to the location of the new version. The data therefore points from the old version toward the latest version.
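To see why access cost grows with the number of updates in this related-art layout, here is a deliberately simplified Python model. It is not PostgreSQL's implementation: ctid is modeled as an integer list position rather than a (page, item) pair, a tuple whose ctid equals its own position is treated as the latest version, xmax = 0 means "not yet superseded", locking is omitted, and the index entry is assumed to keep pointing at the first version (the HOT case). The names `HeapTuple`, `update`, and `read_from_index` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class HeapTuple:
    xmin: int   # inserting transaction
    xmax: int   # superseding transaction, 0 while this is the live version
    ctid: int   # own position, or position of the newer version after an update
    value: int


def update(heap: list[HeapTuple], pos: int, txid: int, value: int) -> int:
    # Follow ctid forward until reaching the version nobody has updated yet.
    while heap[pos].ctid != pos:
        pos = heap[pos].ctid
    new_pos = len(heap)
    heap.append(HeapTuple(xmin=txid, xmax=0, ctid=new_pos, value=value))
    heap[pos].xmax = txid
    heap[pos].ctid = new_pos  # the old version now points forward to the new one
    return new_pos


def read_from_index(heap: list[HeapTuple], pos: int) -> tuple[int, int]:
    """Return (latest value, hops taken) starting from the index entry's ctid."""
    hops = 0
    while heap[pos].ctid != pos:
        pos = heap[pos].ctid
        hops += 1
    return heap[pos].value, hops


heap = [HeapTuple(xmin=1, xmax=0, ctid=0, value=100)]
for i, new_value in enumerate((90, 80, 70)):
    update(heap, 0, txid=2 + i, value=new_value)
result = read_from_index(heap, 0)
```

After three updates the reader in this model needs three hops from the index entry to reach the current value, and each further update adds one more hop.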
For indexes, there are two processing methods when the data tuple is updated:
HOT (Heap Only Tuple) scenario: the index value does not change across the update and the old and new tuples are in the same page, so no new index tuple needs to be inserted.
Non-HOT scenario: if the index value changes, or it does not change but the old and new tuples are not in the same page, a new index tuple is inserted and its ctid points to the newly inserted data tuple.
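The two cases reduce to a single decision, sketched below using only the two conditions stated above. The function name and parameters are hypothetical; PostgreSQL's real HOT eligibility rules involve further details described in its own documentation.

```python
def needs_new_index_tuple(old_key: str, new_key: str, old_page: int, new_page: int) -> bool:
    """Return True for the non-HOT case, False for the HOT case."""
    # HOT: indexed value unchanged AND new version placed on the same page as the old one.
    is_hot = old_key == new_key and old_page == new_page
    return not is_hot
```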
A hotspot record in a database is a record that is operated on (updated, deleted, and so on) frequently within a unit of time; in daily business processing, updating, querying, and deleting hotspot data are operations that must be performed frequently.
The heat of a hotspot record characterizes the probability that the record is locked by an update operation within a unit of time. The lock wait time caused by updating a hotspot record can be calculated as follows: lock wait time for updating the hotspot record = (update heat of the hotspot record / (100% - update heat of the hotspot record)) × lock holding time after the update lock on the hotspot record is acquired.
FIG. 2 schematically illustrates how lock wait time varies with hotspot record heat during hotspot record data operations.
For example, suppose a transaction holds the update lock on a hotspot record for 1 millisecond on average, the update heat of the record is x, and the lock wait time for updating the record is y; the relationship between y and x is shown in FIG. 2. As FIG. 2 shows, the lock wait time is insignificant at low heat but rises sharply once the heat exceeds 80%.
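The formula can be evaluated directly to reproduce the shape of the curve in FIG. 2. With a 1 ms lock holding time, y = x / (1 - x) gives about 0.25 ms at 20% heat, 1 ms at 50%, 4 ms at 80%, 9 ms at 90%, and about 99 ms at 99%. The function name `lock_wait_ms` is illustrative only.

```python
def lock_wait_ms(heat: float, hold_ms: float = 1.0) -> float:
    """Expected lock wait: heat / (1 - heat) * lock holding time, with heat in [0, 1)."""
    if not 0.0 <= heat < 1.0:
        raise ValueError("heat must be in [0, 1)")
    return heat / (1.0 - heat) * hold_ms


for x in (0.2, 0.5, 0.8, 0.9, 0.99):
    print(f"heat={x:.0%}  wait={lock_wait_ms(x):.2f} ms")
```

The denominator 1 - x is what produces the knee: each step closer to 100% heat shrinks it proportionally more, so the wait grows without bound as heat approaches 100%.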
In the process of implementing the disclosed concept, based on the data storage structure in the related art and the above theoretical basis, the inventors found that at least the following problems exist in the related art:
Existing database storage structures and data manipulation methods cause the following problems. Because the data points from the old version toward the latest version, the number of versions of the same record grows with the number of updates, so the number of hops needed to access the data grows too and queries slow down. As concurrent updates to a hotspot record increase, the same record must traverse multiple versions and acquire and release locks multiple times, amplifying what was originally a single-point hotspot problem into a multi-point one and degrading overall performance. Currently, there is no solution to the loss of processing efficiency caused by updating hotspot records in databases such as PostgreSQL.
Moreover, hotspot records are currently discovered mostly through production transaction performance monitoring. Two monitoring approaches exist, and both have drawbacks.
One approach uses slower transaction response time as the hotspot identification criterion. However, according to the lock wait time formula above and as FIG. 2 shows, below 80% heat the lock wait incurred by transactions updating the hotspot record has a limited effect on response time and is easily drowned out by the large volume of ordinary transactions. Above 80% heat, an SQL response time that is normally 2 to 3 milliseconds may grow to several seconds or even more than ten seconds, lengthening the response time of transactions that update the hotspot record. Once requests for the hotspot record pile up, they consume large amounts of scarce resources such as database connections and memory, which can eventually be exhausted so that the database can no longer serve external requests. This kind of monitoring is after-the-fact and cannot prevent the problem.
The other approach monitors operation information such as change details and treats a record with two changes at close time points as a hotspot. However, this scheme depends on registering change details for every operation on a record, so the cost of modifying applications is high, and usually only table-level hotspots can be found. It also cannot accurately tell whether two detail entries close in time come from normal serial batch processing or from a genuine hotspot scenario caused by processing initiated in parallel, which hinders accurate localization of hotspot data.
In summary, compared with databases that store versions separately (such as Oracle and MySQL), updating hotspot records in a centralized-storage database suffers not only from lock wait time that grows with update heat, but also from query time that grows with the number of old versions, and from a longer lock-processing flow caused by traversing multiple versions and acquiring more locks; together these slow the response of transactions that update hotspot records. Moreover, there is no way to discover and resolve hotspot problems in advance: the problem is found only after production has failed and the business has been affected, it cannot be detected early while the update heat is still low, and the high update heat of individual records in a table can be masked by large numbers of normal serial batch records.
In view of this, an embodiment of the present disclosure provides a data processing method to at least partially solve the above technical problem, the method including:
reading a target index tuple from at least one index tuple in an index table based on a target query value, wherein each index tuple comprises a first index field and a second index field, the first index field values of the at least one index tuple are distinct from one another so that different index tuples can be told apart by these identifiers, and the second index field is used to locate a chain head tuple in a chain head table;
reading the chain head tuple from the chain head table based on the second index field value of the target index tuple, wherein the chain head tuple comprises a chain head field whose value is the storage location, in the data table, of the latest version of the data tuple;
based on the chain head field, a first target data tuple is read from the data table, wherein the first target data tuple is a data tuple of a current latest version.
Fig. 3 schematically illustrates an application scenario diagram of a data processing method, apparatus, device, medium, and program product according to embodiments of the present disclosure.
As shown in FIG. 3, the application scenario 300 according to this embodiment may include a terminal device 301, a server 302, and a database 303. The terminal device 301, the server 302, and the database 303 communicate with each other via a network, which may include various types of connections, such as wired or wireless communication links, or fiber optic cables.
The terminal device 301 may be a variety of electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 302 may be a server providing various services, such as a background management server (for example only) providing support for websites browsed by the user using the terminal device 301. The background management server may analyze and perform other processing on the received data such as the user request, and feed back a processing result (e.g., a webpage, information, or data obtained or generated according to the user request) to the terminal device.
Database 303 may be any type of database including, but not limited to, various relational and non-relational databases, and the like. The database 303 may store various types of business data, such as various transaction data, consumption data, product sales data, customer data, and the like, according to a preset data structure.
A user may use terminal device 301 to interact with server 302 over a network to receive or send messages and the like. A variety of messaging client applications may be installed on the terminal device 301, such as a shopping-like application, a web browser application, a search-like application, an instant messaging tool, a mailbox client, social platform software, etc. (by way of example only).
While executing business processes, the server 302 can access the database 303 to perform data operations such as queries, updates and deletions. For example, a data query may be performed according to the methods of the embodiments of the present disclosure: first, a target index tuple is read from at least one index tuple in the index table based on the target query value; then a chain head tuple is read from the chain head table based on the second index field of the target index tuple; finally, the current latest data tuple is read from the data table based on the chain head tuple.
It should be noted that the data processing method provided by the embodiment of the present disclosure may be generally executed by the server 302. Accordingly, the data processing apparatus provided by the embodiments of the present disclosure may be generally disposed in the server 302. The data processing method provided by the embodiment of the present disclosure may also be executed by a server or a server cluster that is different from the server 302 and is capable of communicating with the terminal device 301 and/or the server 302. Correspondingly, the data processing apparatus provided by the embodiment of the present disclosure may also be disposed in a server or a server cluster different from the server 302 and capable of communicating with the terminal device 301 and/or the server 302.
It should be understood that the numbers of terminal devices, servers and databases in fig. 3 are merely illustrative. There may be any number of terminal devices, servers and databases, as required by the implementation.
It should be noted that the data processing method and apparatus of the present disclosure may be applied to the field of big data technology, the field of financial technology, or any other field than the field of big data technology and the field of financial technology.
In the technical scheme of the disclosure, the collection, storage, use, processing, transmission, provision, disclosure and application of users' personal information all comply with the relevant laws and regulations, necessary security measures are taken, and public order and good morals are not violated.
In the technical scheme of the disclosure, before the personal information of the user is acquired or collected, the authorization or the consent of the user is acquired.
The data processing method of the disclosed embodiment will be described in detail below with reference to fig. 4 to 10 based on the scenario described in fig. 3.
FIG. 4 schematically shows a flow diagram of a data processing method according to an embodiment of the present disclosure; FIG. 5 schematically shows a database storage structure according to an embodiment of the disclosure. The method of the embodiment of the present disclosure is described below with reference to fig. 4 and 5.
As shown in fig. 4, the data processing method of this embodiment includes operations S401 to S403.
In operation S401, a target index tuple is read from at least one index tuple in the index table based on the target query value, where each index tuple includes a first index field and a second index field, the first index field values of the index tuples are distinct so that they serve as identifiers distinguishing the index tuples, and the second index field is used to locate the chain head tuple in the chain head table;
in operation S402, the chain head tuple is read from the chain head table based on the second index field value of the target index tuple, where the chain head tuple includes a chain head field whose value is the storage location, in the data table, of the latest version among the multiple versions of the data tuple stored there;
in operation S403, a first target data tuple is read from the data table based on the head of chain field, where the first target data tuple is a current latest version of data tuple.
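Operations S401 to S403 can be sketched as a minimal in-memory Python model. This is an illustration under stated assumptions, not the disclosed implementation: each table is a plain dict keyed by a (page, tuple-pointer index) pair, the index is a list scanned linearly instead of a tree, and all class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Ctid = Tuple[int, int]  # (page number, tuple-pointer index within that page)


@dataclass
class IndexTuple:
    key: str    # first index field: the indexed key value
    ctid: Ctid  # second index field: where the chain head tuple is stored


@dataclass
class ChainHeadTuple:
    ctid: Ctid  # chain head field: where the newest data tuple is stored


@dataclass
class DataTuple:
    data: Optional[dict]  # first data field: body data of this version
    ctid: Optional[Ctid]  # second data field: previous version (None = oldest)
    del_flag: int         # third data field: 0 = not deleted, 1 = deleted
    xmin: int             # fourth data field: inserting transaction number


def read_latest(index_table: List[IndexTuple],
                chain_head_table: Dict[Ctid, ChainHeadTuple],
                data_table: Dict[Ctid, DataTuple],
                target_query_value: str) -> Optional[DataTuple]:
    # S401: find the index tuple whose first index field equals the query value
    target = next((t for t in index_table if t.key == target_query_value), None)
    if target is None:
        return None
    # S402: the second index field locates the chain head tuple
    head = chain_head_table[target.ctid]
    # S403: the chain head field locates the newest data tuple directly
    return data_table[head.ctid]
```

However many older versions sit in the data table, only the newest data tuple is read, because the chain head field skips over the rest of the version chain.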
According to the embodiment of the present disclosure, the data processing method may be applied to a database adopting a centralized storage manner, for example, a PostgreSQL database.
For the centralized storage mode, the embodiment of the present disclosure optimizes and improves the database storage structure of the related art, adopting the storage structure shown in fig. 5. Compared with the data storage structure of the related art (shown in fig. 1), it adds a new chain head page storage space, so that the structure comprises three parts: index pages, chain head pages and data pages. It also adjusts the storage structures of the index pages and the data pages.
Specifically, the information that points to the location of the newest data tuple in the chain of old and new versions of a record is extracted and stored separately in an independent page space, called the chain head page space; the tuples in this space are called chain head tuples. Because a chain head tuple stores only the pointer to the newest data tuple, its pages can be kept in the data buffer to reduce data-access overhead without putting excessive pressure on memory.
In the chain head tuple, the value of the chain head field (ctid) is the storage location of the latest data tuple in the data table, consisting of the page that holds that tuple and the index of its tuple pointer within the page. As illustrated in FIG. 5, the chain head field (ctid) of the chain head tuple is (2, 2), indicating that the current latest data tuple is stored at the 2nd tuple on page 2 of the data table.
The data pages store data tuples of different versions; each data tuple mainly stores the entity data and version information of one version. A data tuple storing one row of data may contain one or more fields. In fig. 5, for example, it contains four fields: a first data field (data), a second data field (ctid), a third data field (delFlag) and a fourth data field (xmin). The second, third and fourth data fields form the tuple header, and the first data field is the tuple body. The value of the first data field (data) is the body data of each version. In the example of fig. 5, the data value1 of the first version (inserted by transaction 10) may be: account number: xxx1, name: xx, amount: 100 yuan, and so on. The value of the second data field (ctid) is the storage location in the data table (page and in-page tuple-pointer index) of the previous version of each data tuple. The value of the third data field (delFlag) is a flag indicating whether that version has been deleted. The value of the fourth data field (xmin) is the number of the transaction that operated on that version.
According to the embodiment of the disclosure, the data index of the database uses a tree structure in which each leaf node stores an index tuple. An index tuple includes a first index field (key) and a second index field (ctid). The first index field values of the index tuples are distinct and identify the different index tuples; in fig. 5 the two index tuples have different first index fields, key = v1 and key = v2. The value of the second index field is the storage location of the chain head tuple in the chain head table and is used to locate that tuple: the ctid in an index tuple stores the page of the chain head tuple and the index of its tuple pointer within that page. For example, a second index field (ctid) of (0, 1) means that the chain head tuple is stored at the 1st tuple on page 0 of the chain head table.
According to the embodiment of the disclosure, based on the data storage structure of the embodiment of the disclosure, by executing the data processing method of the embodiment of the disclosure, the data tuple of the latest version can be quickly queried.
Specifically, in operation S401, the target index tuple corresponding to the target query value is read from the index table by traversing the first index fields of the index tuples (the first index field value of the target index tuple equals the target query value). Because the second index field (ctid) of an index tuple locates the chain head tuple in the chain head table, operation S402 then locates the chain head tuple based on that second index field. Finally, because the chain head field holds the storage location of the latest data tuple in the data table, operation S403 locates the current latest data tuple in the data table based on the chain head field.
For example, to query the latest data for account xxx1, the target index tuple for account xxx1 is matched among the index tuples (its first index field value v1 = xxx1). Then, based on its second index field ctid = (0, 1), the chain head tuple is located (it is stored at the 1st tuple on page 0 of the chain head table). Finally, based on the chain head field ctid = (2, 2), the current latest data tuple is located in the data table (it is stored at the 2nd tuple on page 2).
According to the embodiment of the present disclosure, the database storage structure is optimized relative to the related art: the second index field of an index tuple locates the chain head tuple in the chain head table, and the chain head field holds the storage location of the latest data tuple in the data table. Therefore, by executing the data processing method of the embodiment on this storage structure, the latest data tuple can be queried quickly.
Specifically, in the related art the index points to an older data version, so a query must traverse the versions one by one to locate the latest data; as the same record is updated more often, the number of hops needed to reach the data grows and queries slow down accordingly. In contrast, the data query method of the embodiment of the disclosure locates the latest data with one index lookup, one chain head access and one data access, regardless of the number of data versions, which greatly improves the efficiency of hotspot data queries.
While keeping the original database's centralized storage of old and new versions, the method adjusts how those versions are arranged and how the index is stored. This solves the problem that, under the database's multi-version concurrency control mechanism, queries slow down as the number of updated versions grows, and it substantially improves the efficiency of hotspot data queries. At the same time, compared with databases that store versions separately (such as Oracle and MySQL), it retains the advantages of centralized storage: low I/O overhead for updates, fast rollback, and no "snapshot too old" problem with UNDO data. In addition, because chain head tuples reside in an independent page space, their pages can easily be kept in the data buffer to reduce data-access overhead without causing excessive memory pressure.
According to an embodiment of the present disclosure, the first index field values of the index tuples are distinct and serve as the query key, so reading the target index tuple from at least one index tuple in the index table based on the target query value may specifically include: traversing the index table and reading, from the at least one index tuple, the index tuple whose first index field value equals the target query value as the target index tuple. For example, to query the latest data for account xxx1, the account to be queried (i.e., the target query value) xxx1 is used to match the index tuple whose first index field value v1 equals xxx1, and that index tuple becomes the target index tuple.
According to the embodiment of the disclosure, the index tuples of the same record share the same second index field value, namely the storage location of the chain head tuple in the chain head table, so the chain head tuple can be located through the second index field of any of these index tuples.
According to the embodiment of the disclosure, because multiple index tuples point to the same chain head tuple, a data operation can locate the chain head tuple in one step regardless of which index tuple it goes through, and can then locate the data from the chain head tuple, which further improves data processing efficiency.
According to the embodiment of the present disclosure, a data tuple storing one row of data includes multiple fields, and the value of the first data field is the body data of each version. In fig. 5, the values of the first data field (data) of the versions of the same record are the body data of the respective versions and differ from one another. For example, if user Wang xx opens an account in a system and deposits 100 yuan, the data value1 of the first version (inserted by transaction 10) may be a set of data associated with account xxx1, such as: account number: xxx1, name: Wang xx, amount: 100 yuan, and so on. The data value2 of the second version (inserted by transaction 12) may be the first version's data after an update, i.e., another set of data associated with account xxx1; for example, after 80 yuan is transferred out of the account, it becomes: account number: xxx1, name: Wang xx, amount: 20 yuan, and so on. The data value4 of the fourth version (inserted by transaction 15) may be the third version's data after an update; for example, after the account number changes from xxx1 to xxx2, it becomes: account number: xxx2, name: Wang xx, amount: 120 yuan, and so on.
According to the embodiment of the disclosure, the body data of the versions (value1, value2, value3, ...) differ from one another but may contain the same data primary key, on which the index is built. In fig. 5, for example, all versions contain the same data primary key field, key (for example, the account number). The primary key values themselves may be the same or different: the first, second and third versions all have key = v1 (e.g., account = xxx1), while the fourth version has key = v2 (e.g., account = xxx2).
According to an embodiment of the present disclosure, because the data index is built on the body data of the data tuples, the first index field of an index tuple matches the body data (first data field) of the data tuples. The index pages associated with one set of multi-version data may store one or more index tuples. In fig. 5 they store two index tuples, whose first index field values (key = v1, key = v2) match the primary key values (key = v1, key = v2) in the first data field (data) of the data tuples. After an update, if the primary key value changes, a new index tuple for the new key value must be added.
According to an embodiment of the present disclosure, the value of the second data field (ctid) is the storage location in the data table (page and in-page tuple-pointer index) of the previous version of each data tuple, so each second data field value links a data tuple to its previous version. For example, in fig. 5 the ctid of the first-version data tuple is empty, indicating that it is the oldest version. The ctid of the second-version data tuple is (0, 1), meaning that the first-version data tuple is stored at the 1st tuple on page 0 of the data table. The ctid of the fourth-version data tuple is (1, 1), meaning that the third-version data tuple is stored at the 1st tuple on page 1. Therefore, when a query is executed, any version can be located by following the links between newer and older versions.
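Walking the version chain from the newest version back towards older ones can be sketched as below. This is a minimal in-memory model with hypothetical names; the data table is a dict keyed by (page, tuple-pointer index). In the usage test, the location of the second version, (0, 2), is an assumption because the text does not give it; the other locations follow the fig. 5 description.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterator, Optional, Tuple

Ctid = Tuple[int, int]  # (page number, tuple-pointer index)


@dataclass
class DataTuple:
    data: Optional[dict]  # body data of this version
    ctid: Optional[Ctid]  # location of the previous version; None for the oldest
    del_flag: int         # 0 = not deleted, 1 = deleted
    xmin: int             # inserting transaction number


def versions_newest_first(data_table: Dict[Ctid, DataTuple],
                          newest: Ctid) -> Iterator[Tuple[Ctid, DataTuple]]:
    """Start at the location held in the chain head field and follow each
    second data field (ctid) back to the previous version."""
    loc: Optional[Ctid] = newest
    while loc is not None:
        tup = data_table[loc]
        yield loc, tup
        loc = tup.ctid


def find_version(data_table: Dict[Ctid, DataTuple], newest: Ctid,
                 predicate: Callable[[DataTuple], bool]):
    # Returns the first (i.e. newest) matching version, or None.
    for loc, tup in versions_newest_first(data_table, newest):
        if predicate(tup):
            return loc, tup
    return None
```

Because the walk starts at the newest version, the common case (a query for recent data) stops after the first step, and only queries for old versions pay for a longer walk.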
Setting the second data field to the location of each version's previous version reverses the direction of the version chain: in the related art (see fig. 1) the chain points from older versions to newer ones, whereas here it points from newer versions to older ones. A query therefore reaches newer versions first; if an older version is needed, the chain is traversed from newer to older versions until the required version is found. Because most queries target a newer version, and this storage layout lets that version be located quickly, the approach suits most query scenarios and speeds up data queries.
According to the embodiment of the disclosure, the data tuple structure includes a plurality of fields, where a value of a third data field (delFlag) is a flag value, and is used to respectively characterize whether the data tuple of each version is deleted, for example, delFlag is 0, indicating that the version data is not deleted, and delFlag is 1, indicating that the version data is deleted.
According to an embodiment of the present disclosure, the value of the fourth data field (xmin) is the number of the transaction that operated on each version. xmin may be the number of the transaction that inserted the tuple, which can be understood as the transaction from which the tuple takes effect. The transaction numbers of multiple versions may all differ or may partly coincide. In fig. 5, the third and fourth versions have the same xmin, indicating that one transaction updated the record twice in succession, starting from the second version.
According to an embodiment of the present disclosure, the index tuple may further include a third index field (xmin) and a fourth index field (xmax), which record the numbers of the transactions that inserted and deleted the index tuple, respectively, to improve index scan efficiency.
FIG. 6 schematically shows a flow diagram of a data processing method according to another embodiment of the present disclosure; FIG. 7 schematically shows a flow diagram of a data processing method according to yet another embodiment of the present disclosure; FIG. 8 schematically shows the database storage structure while data operations are performed, according to an embodiment of the present disclosure. The following describes in detail, with reference to fig. 6, fig. 7 and fig. 8, how the data processing method performs data operations such as insertion, update, deletion and query.
A method for performing data insertion in the data processing method according to the embodiment of the present disclosure is described below with reference to fig. 8.
As shown in fig. 8, transaction 10 inserts a new record with data value value1, i.e., the first version of the data. In this first-version data tuple, the fourth data field xmin is the current transaction number 10; the first data field holds value1 (with data primary key = v1); the second data field ctid is empty, indicating that this is the oldest version; and the third data field delFlag is 0, indicating that this version has not been deleted.
At the same time, a new chain head tuple is inserted whose chain head field (ctid) holds the storage location of the first-version data tuple in the data table: ctid = (0, 1), meaning that the current latest data tuple is stored at the 1st tuple on page 0 of the data table.
At the same time, a new index tuple is inserted. Its first index field (key = v1) equals the data primary key of the current data tuple; its second index field ctid = (0, 1) points to the chain head tuple, which is stored at the 1st tuple on page 0 of the chain head table; its third index field xmin is 10, the transaction that inserted the index; and its fourth index field xmax is 0, meaning that no transaction has deleted the index.
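The three inserts described above (data tuple, chain head tuple, index tuple) can be sketched as follows. This is a minimal model under assumptions, not the disclosed implementation: `PagedTable` is a hypothetical append-only table that fills page 0 first, with tuple-pointer indexes starting at 1 as in fig. 8, and the index is a plain list.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Ctid = Tuple[int, int]  # (page number, tuple-pointer index); indexes start at 1


class PagedTable:
    """Hypothetical append-only table: fills page 0, then page 1, and so on."""

    def __init__(self, tuples_per_page: int = 4):
        self.tuples_per_page = tuples_per_page
        self.rows: Dict[Ctid, object] = {}

    def append(self, tup: object) -> Ctid:
        page, slot = divmod(len(self.rows), self.tuples_per_page)
        loc = (page, slot + 1)
        self.rows[loc] = tup
        return loc


@dataclass
class DataTuple:
    data: Optional[dict]  # first data field: body data
    ctid: Optional[Ctid]  # second data field: previous version (None = oldest)
    del_flag: int         # third data field: 0 = not deleted
    xmin: int             # fourth data field: inserting transaction


@dataclass
class ChainHeadTuple:
    ctid: Ctid  # chain head field: location of the newest data tuple


@dataclass
class IndexTuple:
    key: str       # first index field: primary key value
    ctid: Ctid     # second index field: location of the chain head tuple
    xmin: int      # third index field: inserting transaction
    xmax: int = 0  # fourth index field: deleting transaction (0 = none)


def insert_record(data_table: PagedTable, head_table: PagedTable,
                  index_table: List[IndexTuple], key: str, body: dict,
                  xid: int) -> Ctid:
    # New data tuple: the oldest version, so there is no previous-version pointer.
    data_loc = data_table.append(DataTuple(body, None, 0, xid))
    # New chain head tuple pointing at that data tuple.
    head_loc = head_table.append(ChainHeadTuple(data_loc))
    # New index tuple: key = data primary key, ctid -> chain head tuple.
    index_table.append(IndexTuple(key, head_loc, xid, 0))
    return head_loc
```

On an empty database this reproduces the fig. 8 state after transaction 10: all three new tuples sit at (0, 1) of their respective tables.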
A method for executing data update in the data processing method according to the embodiment of the present disclosure is described below with reference to fig. 6 and 8.
According to an embodiment of the present disclosure, as shown in fig. 6, the method of performing data update includes operations S601 to S603.
In operation S601, in a case that the first target data tuple needs to be updated, acquiring an exclusive lock of the head-of-chain tuple;
in operation S602, a second target data tuple is newly added to the data table, where the second target data tuple is a data tuple updated from the first target data tuple;
in operation S603, the value of the head-of-chain field is updated to the storage location of the second target data tuple in the data table.
For a concrete implementation, refer to the example in fig. 8. For the first update, the exclusive lock of the chain head tuple must be acquired first; when multiple transactions operate on the same record concurrently, they wait for this lock in a lock wait queue.
After the exclusive lock of the chain head tuple is acquired, the second-version data tuple, i.e., the updated data tuple, is inserted. Specifically, transaction 12 inserts a new record with data value value2, i.e., the second version of the data. In the second-version data tuple, the fourth data field xmin is the current transaction number 12; the first data field holds value2 (with data primary key = v1); the second data field ctid = (0, 1), indicating that the previous version is stored at the 1st tuple on page 0 of the data table; and the third data field delFlag is 0, indicating that this version has not been deleted.
As shown in fig. 8, in the case of updating data for the second time, the exclusive lock of the head-of-chain tuple needs to be acquired first, and after the exclusive lock of the head-of-chain tuple is acquired, the new version data is inserted. The current update transaction is used to perform two successive updates on the previous version of data, so two new pieces of data are inserted successively through the same transaction: a third version of the tuple of data and a fourth version of the tuple of data. The current transaction numbers of the data tuples of the third version and the fourth version are the same, namely the xmin of the fourth data field is 15; the first data field data values of the third and fourth version data tuples are value3 and value4, respectively. In the data tuple of the fourth version, the key value of the data primary key is changed from key = v1 to key = v2. The second data fields ctid of the data tuples of the third and fourth versions point to the storage locations of the data tuple of the previous version, respectively. The value of the third data field delFlag of the third and fourth version data tuples is 0, indicating that the version data is not deleted.
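Operations S601 to S603 can be sketched as below. This is a minimal model with hypothetical names: a `threading.Lock` attached to each chain head tuple stands in for the database's exclusive lock, and the sketch releases it when the call returns, whereas a real database would normally hold it until the transaction ends. The `DataTable` class is a hypothetical append-only table.

```python
import threading
from dataclasses import dataclass, field
from typing import Any, Dict, Optional, Tuple

Ctid = Tuple[int, int]  # (page number, tuple-pointer index)


@dataclass
class DataTuple:
    data: Optional[dict]  # body data of this version
    ctid: Optional[Ctid]  # previous version's location (None = oldest)
    del_flag: int         # 0 = not deleted
    xmin: int             # inserting transaction number


@dataclass
class ChainHeadTuple:
    ctid: Ctid  # chain head field: location of the newest data tuple
    # Stand-in for the exclusive lock on this chain head tuple (an assumption).
    lock: Any = field(default_factory=threading.Lock, repr=False, compare=False)


class DataTable:
    """Hypothetical append-only data table: every new version gets a fresh slot."""

    def __init__(self, tuples_per_page: int = 4):
        self.per_page = tuples_per_page
        self.rows: Dict[Ctid, DataTuple] = {}

    def append(self, tup: DataTuple) -> Ctid:
        page, slot = divmod(len(self.rows), self.per_page)
        loc = (page, slot + 1)
        self.rows[loc] = tup
        return loc


def update_record(head: ChainHeadTuple, table: DataTable,
                  new_body: dict, xid: int) -> Ctid:
    # S601: take the exclusive lock on the chain head tuple; concurrent
    # updaters of the same record queue here.
    with head.lock:
        # S602: append the new version, linked back to the current newest one
        new_loc = table.append(DataTuple(new_body, head.ctid, 0, xid))
        # S603: repoint the chain head field at the new version
        head.ctid = new_loc
        return new_loc
```

Only one lock is taken per update, and it is taken on the chain head tuple rather than on any data tuple, so the cost of locking does not depend on how many versions the record already has.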
According to the embodiment of the disclosure, because the chain head tuple points to the latest data tuple, a data operation such as an update only needs to acquire the exclusive lock of the chain head tuple. Since updates are usually applied to the latest version, in most cases this method locates the latest data for an update with one index access, one lock acquisition and one data access. In the related art, by contrast, multiple versions must be traversed, locking, releasing locks and reading each version in turn, before the latest data is located. The method of the embodiment therefore removes the extra slowdown that traversing multiple versions to lock a hotspot record causes, and greatly improves the efficiency of hotspot data updates.
According to the embodiment of the disclosure, since the data index is established according to the main data of the data tuple, the first index field of the index tuple playing the role of the index needs to be matched with the key value of the data primary key in the first data field data of the data tuple, and therefore, after the data version is updated, if the key value of the data primary key is updated, the index tuple corresponding to the key value needs to be newly added.
Specifically, in the process of executing data update, the index change method is as follows:
if the data primary key value of the second target data tuple (the updated data) is the same as that of the first target data tuple (the data before the update), the index tuples do not need to be updated;
and under the condition that the key value of the data primary key of the second target data tuple (updated data) is different from the key value of the data primary key of the first target data tuple (data before updating), adding an index tuple in the index table, wherein the value of the first index field of the added index tuple is the key value of the data primary key of the second target data tuple.
As shown in fig. 8, after the second update the primary key value in the fourth-version data tuple changes from key = v1 to key = v2, so an index tuple is added whose first index field key = v2 equals the primary key value of the fourth-version data tuple. At the same time, the xmax of the old index tuple is set to the current transaction number 15, and the new index tuple is inserted with xmin = 15, xmax = 0, and second index field ctid = (0, 1) pointing to the chain head tuple.
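The index-maintenance rule for an update can be sketched as follows. This is a minimal model with hypothetical names: the index is a plain list, and "the old index tuple" is taken to be the live one (xmax = 0) whose key equals the old primary key and whose ctid points to this record's chain head tuple.

```python
from dataclasses import dataclass
from typing import List, Tuple

Ctid = Tuple[int, int]  # (page number, tuple-pointer index)


@dataclass
class IndexTuple:
    key: str       # first index field: primary key value
    ctid: Ctid     # second index field: always the chain head tuple's location
    xmin: int      # third index field: inserting transaction
    xmax: int = 0  # fourth index field: deleting transaction (0 = not deleted)


def maintain_index(index_table: List[IndexTuple], old_key: str, new_key: str,
                   head_loc: Ctid, xid: int) -> None:
    if new_key == old_key:
        # Same primary key value: no index change, even if the new version
        # was written to a different page.
        return
    for t in index_table:
        if t.key == old_key and t.ctid == head_loc and t.xmax == 0:
            t.xmax = xid  # retire the old index tuple as of this transaction
    # The new index tuple points to the same chain head tuple as the old one.
    index_table.append(IndexTuple(new_key, head_loc, xid, 0))
```

Both the retired and the new index tuple point to the same chain head tuple, so a lookup through either key still reaches the newest data tuple.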
The above-described index updating method according to the embodiment of the present disclosure is different from that in the related art (see fig. 1) in that:
in the related art, a data index update is triggered in two cases: 1. the primary key values of the new and old data tuples differ; 2. the primary key values of the new and old data tuples are the same, but the two tuples are on different pages. Both cases insert a new index tuple whose ctid points to the newly inserted data tuple. Consequently, when there are many data versions and new versions frequently land on different pages, the number of index tuples grows sharply and different index tuples point to different versions. Because the index alone does not reveal which index tuple points to the latest version, a data operation must traverse multiple index tuples in turn, which makes data processing inefficient.
In the index updating method of the embodiment of the disclosure, an index update is triggered only when the primary key values of the new and old data tuples differ; a new version stored on a different page from the old version does not trigger one. Moreover, the second index field ctid of every index tuple, whether created for the old key or the new key, points to the chain head tuple.
Therefore, compared with the related art, in which there are many index tuples and different index tuples point to different versions of the data, this index updating method can substantially reduce both the number of index tuples and the number of index accesses, improving indexing efficiency. When a data operation is executed, the latest data can be located through the head-of-chain tuple regardless of which index tuple is used, so the operation can be performed quickly on the latest data, further improving the processing efficiency for hotspot data.
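The difference between the two triggering rules can be illustrated with a small Python sketch. It is a hypothetical model, not the disclosure's implementation: each version is reduced to a (key, page) pair, and `IndexTuple`, `related_art_triggers`, `disclosure_triggers`, `count_index_tuples` and `on_update` are illustrative names. `on_update` follows the fig. 8 description: on a key change, the old index tuple's xmax is set to the current transaction number and a new index tuple pointing at the same head-of-chain tuple is appended.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Ctid = Tuple[int, int]  # (page, slot), as in ctid = (0, 1)


@dataclass
class IndexTuple:
    key: str       # first index field: key value of the data primary key
    ctid: Ctid     # second index field: location of the head-of-chain tuple
    xmin: int      # transaction that inserted this index tuple
    xmax: int = 0  # transaction that retired it (0 = still live)


def related_art_triggers(old: Tuple[str, int], new: Tuple[str, int]) -> bool:
    # Related art (fig. 1): key change OR new version on a different page.
    return old[0] != new[0] or old[1] != new[1]


def disclosure_triggers(old: Tuple[str, int], new: Tuple[str, int]) -> bool:
    # This disclosure: only a key change adds an index tuple.
    return old[0] != new[0]


def count_index_tuples(versions: Sequence[Tuple[str, int]], triggers) -> int:
    """Index tuples created for a chain of (key, page) versions, oldest first."""
    count = 1  # the first version always gets an index tuple
    for old, new in zip(versions, versions[1:]):
        if triggers(old, new):
            count += 1
    return count


def on_update(index: List[IndexTuple], old_key: str, new_key: str,
              head_ctid: Ctid, txid: int) -> None:
    """Index maintenance after an update, following the fig. 8 description."""
    if old_key == new_key:
        return  # the head-of-chain tuple already leads to the new version
    for it in index:
        if it.key == old_key and it.xmax == 0:
            it.xmax = txid  # retire the old index tuple
    # The new index tuple points at the SAME head-of-chain tuple.
    index.append(IndexTuple(new_key, head_ctid, xmin=txid))
```

For a chain whose versions move from page 0 to page 1 to page 2 under key v1 and then change key to v2, the related-art rule creates four index tuples while the disclosure's rule creates two, and both of those point at the same head-of-chain tuple.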
According to an embodiment of the present disclosure, the data tuple structure includes a plurality of fields, and the value of a third data field (delFlag) indicates whether each version of the data tuple has been deleted.
With reference to fig. 7 and fig. 8, a method for executing data deletion in the data processing method according to the embodiment of the present disclosure is described below based on the above data structure.
As shown in fig. 7, the method of performing data deletion includes operations S701 to S703.
In operation S701, under a condition that a third data field value of the second target data tuple is a first value and the second target data tuple needs to be deleted, acquiring an exclusive lock of the head-of-chain tuple, where the first value is used to represent that the second target data tuple is not deleted before the deletion operation is performed;
in operation S702, a third target data tuple is newly added in the data table, where a third data field value of the third target data tuple is a second value, and a first data field value of the third target data tuple is null, where the second value is used to indicate that the second target data tuple is deleted after the deletion operation is performed;
in operation S703, the value of the head of chain field is updated to the storage location of the third target data tuple in the data table.
For a concrete example of this method, refer to fig. 8. To delete the latest, fourth version of the data (transaction number 15), the exclusive lock of the head-of-chain tuple must be acquired first; when several transactions operate on the same record concurrently, they wait in a lock wait queue.
After the exclusive lock of the head-of-chain tuple is acquired, a null data tuple is appended to the data table. Its operation transaction number is the current transaction number 18, i.e., the fourth data field xmin is 18; the first data field is null, indicating that this version carries no data; the second data field ctid = (2, 1) gives the storage location of the previous version in the data table, namely the 1st tuple on page 2; and the third data field delFlag is 1, indicating that the data has been deleted. Finally, the value of the chain head field is updated to the storage location of the new data tuple in the data table, ctid = (2, 2), and the xmax of the index tuple is updated to the current transaction number 18.
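Operations S701 to S703 and the fig. 8 walk-through can be sketched as follows. This is a hypothetical in-memory model: a `threading.Lock` stands in for the exclusive lock of the head-of-chain tuple, the data table is a dict keyed by (page, slot), the caller supplies the location of the new tuple, delFlag values 0 and 1 are assumed to be the "first value" and "second value", and the update of the index tuple's xmax is omitted.

```python
import threading
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

Ctid = Tuple[int, int]  # (page, slot)

NOT_DELETED, DELETED = 0, 1  # assumed encodings of the first/second value


@dataclass
class DataTuple:
    data: Optional[str]   # first data field; None models the null tuple
    prev: Optional[Ctid]  # second data field: previous version's location
    del_flag: int         # third data field (delFlag)
    xmin: int             # fourth data field: operating transaction number


@dataclass
class HeadTuple:
    head: Ctid  # chain head field: location of the latest version
    lock: threading.Lock = field(default_factory=threading.Lock)


def delete_latest(head: HeadTuple, table: Dict[Ctid, DataTuple],
                  new_ctid: Ctid, txid: int) -> None:
    # S701: take the exclusive lock of the head-of-chain tuple.
    with head.lock:
        latest = table[head.head]
        if latest.del_flag != NOT_DELETED:
            raise ValueError("latest version is already deleted")
        # S702: append a null tuple flagged as deleted, linked to the old version.
        table[new_ctid] = DataTuple(None, head.head, DELETED, txid)
        # S703: point the chain head field at the new tuple.
        head.head = new_ctid
```

With the fig. 8 values (fourth version at ctid (2, 1) written by transaction 15, deletion by transaction 18 into ctid (2, 2)), the chain head ends at (2, 2) and the new null tuple links back to (2, 1). A second delete on the same chain is rejected because the latest version is already flagged as deleted.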
Based on the data storage structure of the embodiment of the present disclosure, a method for executing a data query in the data processing method of the embodiment of the present disclosure is described below with reference to fig. 8.
As shown in fig. 8, to execute a data query, the first index field key of the index tuples in the index table is traversed to find the target index tuple corresponding to the target query value (i.e., the index tuple whose first index field value equals the target query value). Next, the head-of-chain tuple is located based on the second index field ctid = (0, 1). Then, based on the chain head field ctid = (2, 2), the data tuple of the current latest version is located in the data table at the 2nd tuple on page 2.
The latest-version data tuple (transaction number 18) is read. Its delFlag is 1, which indicates that the record has been deleted and is invisible, so the query of this record ends.
If the data to be queried is not the current latest version data, the query can be sequentially traversed from the new version to the old version based on the data link relationship between the new version and the old version (the new version points to the old version) until the data version needing to be queried is found.
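The query path of fig. 8 (index tuple, then head-of-chain tuple, then version chain) can be sketched as below. The names are hypothetical; the head-of-chain table is modelled as a dict from a head-of-chain tuple's location to its chain head field value, and `as_of_txid` is a simplified stand-in for MVCC visibility rules, which the disclosure does not spell out: it returns the newest version whose xmin is not later than the given transaction.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Ctid = Tuple[int, int]  # (page, slot)


@dataclass
class IndexTuple:
    key: str    # first index field
    ctid: Ctid  # second index field: location of the head-of-chain tuple


@dataclass
class DataTuple:
    data: Optional[str]   # first data field (None = null tuple written by a delete)
    prev: Optional[Ctid]  # second data field: previous version's location
    del_flag: int         # third data field: 1 = deleted
    xmin: int             # fourth data field: operating transaction number


def query(key: str,
          index: List[IndexTuple],
          heads: Dict[Ctid, Ctid],
          table: Dict[Ctid, DataTuple],
          as_of_txid: Optional[int] = None) -> Optional[str]:
    """Index -> head-of-chain tuple -> newest version, then walk toward older ones."""
    # 1. Target index tuple: first index field equals the query value.
    entry = next((it for it in index if it.key == key), None)
    if entry is None:
        return None
    # 2. Head-of-chain tuple -> value of its chain head field.
    loc: Optional[Ctid] = heads[entry.ctid]
    # 3. Newest version first; follow the second data field backwards.
    while loc is not None:
        tup = table[loc]
        if as_of_txid is None or tup.xmin <= as_of_txid:
            # A deleted (delFlag = 1) version makes the record invisible.
            return None if tup.del_flag == 1 else tup.data
        loc = tup.prev
    return None
```

With the state left by the fig. 8 deletion, a query for the latest version returns nothing because delFlag is 1, while a query as of transaction 17 walks back one link to the fourth version.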
According to the embodiment of the present disclosure, because the related art cannot accurately locate hotspot data, the embodiment of the present disclosure determines hotspot records as follows.
The method includes: obtaining the length of the lock wait queue of the exclusive lock of the head-of-chain tuple; and, when the length of the lock wait queue is greater than a preset threshold, determining the first target data tuple to be hotspot data.
For example, with the preset threshold set to 5, if 10 transactions concurrently request the lock of a head-of-chain tuple at some moment, the lock wait queue length is 10. Because this exceeds the threshold, the record to which the head-of-chain tuple belongs is treated as hotspot data. Information such as the ctid of the head-of-chain tuple, the specific data, the time at which the wait queue depth exceeded the threshold, the total queue length, the maximum depth, and the initial/maximum/final/average lock wait times can be registered in an "update hotspot record table". This table allows update hotspot records to be found accurately and effectively, together with the specific execution information for each hotspot.
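The threshold rule and the "update hotspot record table" can be sketched as follows. The monitor is hypothetical: it is handed the observed queue length for a head-of-chain tuple and keeps only a subset of the fields listed above (first time over the threshold, maximum depth, number of samples); lock wait times are omitted.

```python
import time
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

Ctid = Tuple[int, int]  # location of a head-of-chain tuple


@dataclass
class HotspotEntry:
    head_ctid: Ctid
    first_seen: float  # when the wait queue first exceeded the threshold
    max_depth: int     # deepest wait queue observed
    samples: int       # observations above the threshold


class HotspotMonitor:
    """Hypothetical 'update hotspot record table' keyed by head-of-chain ctid."""

    def __init__(self, threshold: int = 5) -> None:
        self.threshold = threshold
        self.table: Dict[Ctid, HotspotEntry] = {}

    def observe(self, head_ctid: Ctid, queue_length: int,
                now: Optional[float] = None) -> bool:
        """Return True (and record it) if this head-of-chain tuple is hot."""
        if queue_length <= self.threshold:  # rule: strictly greater than
            return False
        now = time.time() if now is None else now
        entry = self.table.get(head_ctid)
        if entry is None:
            self.table[head_ctid] = HotspotEntry(head_ctid, now, queue_length, 1)
        else:
            entry.max_depth = max(entry.max_depth, queue_length)
            entry.samples += 1
        return True
```

Using the example above, observing a queue length of 10 against a threshold of 5 registers the head-of-chain tuple, while a length of exactly 5 does not.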
According to the embodiment of the present disclosure, in the data operation mechanism of the related art (refer to fig. 1), the number of index tuples is large and different index tuples point to different versions of the data, so a data operation must lock many different pieces of data scattered across the table. When a few records in a table are updated at a high rate, they are masked by the large number of ordinary serial batch-processing records, and hot data cannot be tracked accurately.
In contrast, in the data storage structure of the embodiment of the present disclosure, the head-of-chain tuple points to the data tuple, so a data operation such as an update or a deletion only needs to acquire the exclusive lock of the head-of-chain tuple, not that of the data tuple. Compared with the related art, the data processing method of the embodiment of the present disclosure replaces the original per-version lock-wait/locking process with a single locking process on the head-of-chain tuple. Because the head-of-chain tuple is directly associated with the data, contention on it directly reflects how hot the data is, and the update heat of the data is directly reflected in the depth of the lock wait queue. With this optimized locking scheme for updates, update hotspot records can be discovered accurately and effectively by monitoring the depth of the lock wait queue, and specific hotspot execution information can be obtained. Compared with schemes that discover update hotspot records at the application level, this approach identifies hotspots accurately, requires no modification by users, and is broadly applicable and general, so it can help avoid production incidents and improve the reliability of service processing.
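One way to expose the lock wait queue depth is to wrap the head-of-chain lock so that it counts waiters. The sketch below is an assumption rather than the disclosure's mechanism: `HeadOfChainLock` is a hypothetical wrapper around `threading.Lock`, which does not guarantee FIFO ordering, so `waiting` is a count of blocked callers rather than an ordered queue.

```python
import threading


class HeadOfChainLock:
    """Hypothetical exclusive lock that reports how many callers are waiting."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._meta = threading.Lock()  # protects the counters below
        self.waiting = 0       # current lock wait queue depth
        self.max_waiting = 0   # deepest queue seen so far

    def acquire(self) -> None:
        if self._lock.acquire(blocking=False):
            return  # uncontended: the caller never joins the wait queue
        with self._meta:
            self.waiting += 1
            self.max_waiting = max(self.max_waiting, self.waiting)
        self._lock.acquire()  # block until the holder releases
        with self._meta:
            self.waiting -= 1

    def release(self) -> None:
        self._lock.release()

    def __enter__(self) -> "HeadOfChainLock":
        self.acquire()
        return self

    def __exit__(self, *exc) -> None:
        self.release()
```

Because every update or delete on a record goes through its single head-of-chain lock, a monitoring task could sample `waiting` periodically and compare it with the preset threshold.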
Based on the data processing method, the disclosure also provides a data processing device. The apparatus will be described in detail below with reference to fig. 9.
Fig. 9 schematically shows a block diagram of a data processing apparatus according to an embodiment of the present disclosure. As shown in fig. 9, the data processing apparatus 900 of this embodiment includes a first reading module 901, a second reading module 902, and a third reading module 903.
The first reading module 901 is configured to read a target index tuple from at least one index tuple in an index table based on a target query value, where the index tuple includes a first index field and a second index field, the first index field values of the at least one index tuple differ from one another so that different index tuples are distinguished by identifier, and the second index field is used to locate a head-of-chain tuple from a head-of-chain table;
a second reading module 902, configured to read a head-of-chain tuple from the head-of-chain table based on a second index field value of the target index tuple, where the head-of-chain tuple includes a chain head field whose value is the storage location, in the data table, of the latest version among the plurality of versions of the data tuple stored in the data table;
a third reading module 903, configured to read a first target data tuple from the data table based on the head of chain field, where the first target data tuple is a current latest version of data tuple.
According to the embodiment of the present disclosure, the database storage structure is optimized: the second index field of an index tuple can be used to locate the head-of-chain tuple in the head-of-chain table, and the value of the chain head field is the storage location of the latest-version data tuple in the data table. Based on this data storage structure, the first reading module 901 reads the target index tuple from at least one index tuple in the index table, the second reading module 902 reads the head-of-chain tuple from the head-of-chain table, and the third reading module 903 then quickly retrieves the latest-version data tuple. While keeping old and new versions stored together as in the original database, the data processing apparatus 900 of the embodiment of the present disclosure adjusts how the versions are arranged and how indexes are stored. This addresses the slowdown of queries as the number of updated versions grows under the database's multi-version concurrency control mechanism and substantially improves the processing efficiency of hotspot data queries.
According to an embodiment of the present disclosure, the first reading module 901 includes a reading unit, configured to read, from at least one index tuple, an index tuple whose first index field value is the same as the target query value as the target index tuple.
According to an embodiment of the present disclosure, wherein: at least one second index field value of at least one index tuple is a storage position of the chain head tuple in the chain head table, so that the chain head tuple can be positioned through the second index field of any index tuple.
According to the embodiment of the disclosure, the device further comprises a first obtaining module, a first adding module and a first updating module.
The first obtaining module is used for obtaining an exclusive lock of the head-of-chain tuple under the condition that the first target data tuple needs to be updated; the first adding module is used for newly adding a second target data tuple in the data table, wherein the second target data tuple is a data tuple updated on the first target data tuple; and the first updating module is used for updating the value of the chain head field to the storage position of the second target data tuple in the data table.
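The three update modules can be sketched together as below. This is a hypothetical in-memory model: `DataTable` is an append-only heap with a fixed, illustrative number of slots per page (slots numbered from 1, as in ctid = (2, 1)), and a `threading.Lock` stands in for the exclusive lock of the head-of-chain tuple. Because every updater serializes on that one lock, concurrent updates extend a single linear version chain instead of losing writes.

```python
import threading
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

Ctid = Tuple[int, int]  # (page, slot)


@dataclass
class DataTuple:
    data: str
    prev: Optional[Ctid]  # second data field: previous version's location
    xmin: int             # fourth data field: operating transaction number


@dataclass
class HeadTuple:
    head: Ctid  # chain head field
    lock: threading.Lock = field(default_factory=threading.Lock)


class DataTable:
    """Hypothetical append-only heap with a fixed number of slots per page."""

    def __init__(self, slots_per_page: int = 2) -> None:
        self.slots_per_page = slots_per_page
        self.tuples: Dict[Ctid, DataTuple] = {}
        self._count = 0
        self._alloc = threading.Lock()

    def append(self, tup: DataTuple) -> Ctid:
        with self._alloc:
            page, slot = divmod(self._count, self.slots_per_page)
            self._count += 1
            ctid = (page, slot + 1)
            self.tuples[ctid] = tup
            return ctid


def update_latest(head: HeadTuple, table: DataTable, data: str, txid: int) -> Ctid:
    with head.lock:  # first obtaining module: exclusive lock on the head-of-chain tuple
        # First adding module: append the new version, linked to the current latest.
        new_ctid = table.append(DataTuple(data, head.head, txid))
        head.head = new_ctid  # first updating module: repoint the chain head field
        return new_ctid
```

Running several updaters in parallel threads leaves one chain that can be walked from the chain head back to the original version through the `prev` links.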
According to the embodiment of the disclosure, the device further comprises a second obtaining module and a determining module.
The second obtaining module is configured to obtain the length of the lock wait queue of the exclusive lock of the head-of-chain tuple; and the determining module is configured to determine the first target data tuple to be hotspot data when the length of the lock wait queue is greater than a preset threshold.
According to an embodiment of the present disclosure, wherein: the data tuple includes a first data field; the first data field values of the data tuples of the plurality of versions are the main data of the respective versions, and the main data of the plurality of versions differ from one another but contain the same data primary key.
According to an embodiment of the present disclosure, wherein: a plurality of data key values of the data primary key under a plurality of versions are the same or different; at least one first index field value of the at least one index tuple matches the plurality of data key values.
According to an embodiment of the present disclosure, the apparatus further includes a second adding module, configured to add an index tuple in the index table when a key value of the data primary key of the second target data tuple is different from a key value of the data primary key of the first target data tuple, where a value of a first index field of the added index tuple is the key value of the data primary key of the second target data tuple.
According to an embodiment of the present disclosure, wherein: the data tuple further includes a second data field, and the second data field value of each version is the storage location, in the data table, of the data tuple of the preceding version, so that each version can be linked to its preceding version through its second data field value.
According to an embodiment of the present disclosure, wherein: the data tuple further includes a third data field, wherein a plurality of third data field values of the data tuples of the plurality of versions are respectively used for characterizing whether the data tuples of the respective versions are deleted.
According to the embodiment of the disclosure, the device further comprises a third obtaining module, a third adding module and a second updating module.
The third obtaining module is configured to obtain an exclusive lock of a chain head tuple under a condition that a third data field value of a second target data tuple is a first value and the second target data tuple needs to be deleted, where the first value is used to indicate that the second target data tuple is not deleted before a deletion operation is performed; a third adding module, configured to add a third target data tuple in the data table, where a third data field value of the third target data tuple is a second value, and a first data field value of the third target data tuple is null, and the second value is used to indicate that the second target data tuple is deleted after the deleting operation is performed; and the second updating module is used for updating the value of the chain head field to the storage position of the third target data tuple in the data table.
According to an embodiment of the present disclosure, wherein: the data tuple further includes a fourth data field, where a plurality of fourth data field values of the data tuples of the plurality of versions are: the operation transaction number of the data tuple of each version.
According to the embodiment of the present disclosure, any plurality of the first reading module 901, the second reading module 902, and the third reading module 903 may be combined and implemented in one module, or any one of the modules may be split into a plurality of modules. Alternatively, at least part of the functionality of one or more of these modules may be combined with at least part of the functionality of the other modules and implemented in one module. According to an embodiment of the present disclosure, at least one of the first reading module 901, the second reading module 902, and the third reading module 903 may be implemented at least partially as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, an Application Specific Integrated Circuit (ASIC), or may be implemented by hardware or firmware in any other reasonable manner of integrating or packaging a circuit, or implemented by any one of three implementations of software, hardware, and firmware, or implemented by a suitable combination of any several of them. Alternatively, at least one of the first read module 901, the second read module 902, and the third read module 903 may be at least partially implemented as a computer program module, which when executed may perform a corresponding function.
Fig. 10 schematically shows a block diagram of an electronic device adapted to implement a data processing method according to an embodiment of the present disclosure.
As shown in fig. 10, an electronic device 1000 according to an embodiment of the present disclosure includes a processor 1001 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 1002 or a program loaded from a storage section 1008 into a Random Access Memory (RAM) 1003. Processor 1001 may include, for example, a general purpose microprocessor (e.g., a CPU), an instruction set processor and/or associated chipset, and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), among others. The processor 1001 may also include onboard memory for caching purposes. The processor 1001 may comprise a single processing unit or a plurality of processing units for performing the different actions of the method flows according to embodiments of the present disclosure.
In the RAM 1003, various programs and data necessary for the operation of the electronic apparatus 1000 are stored. The processor 1001, ROM 1002, and RAM 1003 are connected to each other by a bus 1004. The processor 1001 performs various operations of the method flow according to the embodiments of the present disclosure by executing programs in the ROM 1002 and/or the RAM 1003. Note that the program may also be stored in one or more memories other than the ROM 1002 and the RAM 1003. The processor 1001 may also perform various operations of the method flows according to embodiments of the present disclosure by executing programs stored in the one or more memories.
According to an embodiment of the present disclosure, the electronic device 1000 may also include an input/output (I/O) interface 1005, which is also connected to the bus 1004. The electronic device 1000 may also include one or more of the following components connected to the I/O interface 1005: an input section 1006 including a keyboard, a mouse, and the like; an output section 1007 including a display such as a cathode ray tube (CRT) or liquid crystal display (LCD), and a speaker; a storage section 1008 including a hard disk and the like; and a communication section 1009 including a network interface card such as a LAN card or a modem. The communication section 1009 performs communication processing via a network such as the Internet. A drive 1010 is also connected to the I/O interface 1005 as needed. A removable medium 1011, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 1010 as needed, so that a computer program read from it is installed into the storage section 1008 as needed.
The present disclosure also provides a computer-readable storage medium, which may be embodied in the device/apparatus/system described in the above embodiments; or may exist separately and not be assembled into the device/apparatus/system. The computer-readable storage medium carries one or more programs which, when executed, implement a method according to an embodiment of the disclosure.
According to embodiments of the present disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium, which may include, for example but is not limited to: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. For example, a computer-readable storage medium may include ROM 1002 and/or RAM 1003 and/or one or more memories other than ROM 1002 and RAM 1003 as described above in accordance with embodiments of the present disclosure.
Embodiments of the present disclosure also include a computer program product comprising a computer program containing program code for performing the method illustrated in the flow chart. When the computer program product runs in a computer system, the program code is used for causing the computer system to realize the data processing method provided by the embodiment of the disclosure.
The computer program performs the above-described functions defined in the system/apparatus of the embodiments of the present disclosure when executed by the processor 1001. The systems, apparatuses, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the present disclosure.
In one embodiment, the computer program may be hosted on a tangible storage medium such as an optical storage device, a magnetic storage device, or the like. In another embodiment, the computer program may also be transmitted in the form of a signal on a network medium, distributed, downloaded and installed via the communication part 1009, and/or installed from the removable medium 1011. The computer program containing program code may be transmitted using any suitable network medium, including but not limited to: wireless, wired, etc., or any suitable combination of the foregoing.
In such an embodiment, the computer program may be downloaded and installed from a network through the communication part 1009 and/or installed from the removable medium 1011. The computer program performs the above-described functions defined in the system of the embodiment of the present disclosure when executed by the processor 1001. The systems, devices, apparatuses, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the present disclosure.
In accordance with embodiments of the present disclosure, program code for executing the computer programs provided by embodiments of the present disclosure may be written in any combination of one or more programming languages; in particular, these computer programs may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. The programming languages include, but are not limited to, Java, C++, Python, the "C" language, and the like. The program code may execute entirely on the user computing device, partly on the user device, partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (e.g., through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Those skilled in the art will appreciate that the features recited in the various embodiments and/or claims of the present disclosure can be combined and/or integrated in various ways, even if such combinations or integrations are not expressly recited in the present disclosure. In particular, the features recited in the various embodiments and/or claims of the present disclosure may be combined and/or integrated in various ways without departing from the spirit or teaching of the present disclosure. All such combinations and/or integrations fall within the scope of the present disclosure.
The embodiments of the present disclosure have been described above. However, these examples are for illustrative purposes only and are not intended to limit the scope of the present disclosure. Although the embodiments are described separately above, this does not mean that the measures in the embodiments cannot be used in advantageous combination. The scope of the disclosure is defined by the appended claims and equivalents thereof. Various alternatives and modifications can be devised by those skilled in the art without departing from the scope of the present disclosure, and such alternatives and modifications are intended to be within the scope of the present disclosure.

Claims (16)

1. A method of data processing, comprising:
reading a target index tuple from at least one index tuple in an index table based on a target query value, wherein the index tuple comprises a first index field and a second index field, the first index field values of the at least one index tuple differ from one another so that different index tuples are distinguished by identifier, and the second index field is used for locating a head-of-chain tuple from a head-of-chain table;
reading the head-of-chain tuple from the head-of-chain table based on a second index field value of the target index tuple, wherein the head-of-chain tuple comprises a chain head field whose value is the storage location, in the data table, of the latest version among a plurality of versions of data tuples in the data table;
based on the chain head field, reading a first target data tuple from the data table, wherein the first target data tuple is a data tuple of a current latest version.
2. The method of claim 1, wherein the reading a target index tuple from at least one index tuple in an index table based on a target query value comprises:
reading an index tuple with the first index field value being the same as the target query value from the at least one index tuple as the target index tuple.
3. The method of claim 1, wherein:
at least one second index field value of the at least one index tuple is a storage position of the chain head tuple in the chain head table, so that the chain head tuple can be located through the second index field of any one index tuple.
4. The method of claim 1, further comprising:
acquiring an exclusive lock of the chain head tuple if the first target data tuple needs to be updated;
newly adding a second target data tuple in the data table, wherein the second target data tuple is a data tuple updated on the first target data tuple;
updating the value of the head of chain field to the storage location of the second target data tuple in the data table.
5. The method of claim 4, further comprising:
acquiring the length of a lock waiting queue of an exclusive lock of the chain head tuple;
determining the first target data tuple to be hotspot data if the length of the lock wait queue is greater than a preset threshold.
6. The method of claim 4, wherein:
the data tuples comprise a first data field; the first data field values of the data tuples of the plurality of versions are the main data of the respective versions, and the main data of the plurality of versions differ from each other but comprise the same data primary key.
7. The method of claim 6, wherein:
the key values of the data primary key under the plurality of versions are the same or different;
at least one first index field value of at least one of the index tuples matches the plurality of data key values.
8. The method of claim 7, further comprising:
adding an index tuple in the index table when the key value of the data primary key of the second target data tuple is different from the key value of the data primary key of the first target data tuple, wherein the value of the first index field of the newly added index tuple is the key value of the data primary key of the second target data tuple.
9. The method of claim 6, wherein:
the data tuple further includes a second data field, and a plurality of second data field values of the data tuples of the plurality of versions are: a storage location of a data tuple of a previous version of each version in the data table, such that the data tuples of the previous version are respectively linkable to by each of the second data field values.
10. The method of claim 9, wherein:
the data tuple further includes a third data field, wherein a plurality of third data field values of the data tuples of the versions are respectively used for characterizing whether the data tuples of the versions are deleted.
11. The method of claim 9, further comprising:
acquiring an exclusive lock of the chain head tuple when a third data field value of the second target data tuple is a first value and the second target data tuple needs to be deleted, wherein the first value is used for representing that the second target data tuple is not deleted before a deletion operation is executed;
adding a third target data tuple in the data table, wherein a third data field value of the third target data tuple is a second value, and a first data field value of the third target data tuple is null, the second value being used to characterize that the second target data tuple is deleted after a delete operation is performed;
updating a value of the head of chain field to a storage location of the third target data tuple in the data table.
12. The method of claim 10, wherein:
the data tuple further includes a fourth data field, wherein a plurality of fourth data field values of the plurality of versions of data tuples are: the operation transaction number of the data tuple of each version.
13. A data processing apparatus comprising:
a first reading module, configured to read a target index tuple from at least one index tuple in an index table based on a target query value, wherein the index tuple comprises a first index field and a second index field, the first index field values of the at least one index tuple differ from one another so that different index tuples are distinguished by identifier, and the second index field is used for locating a head-of-chain tuple from a head-of-chain table;
a second reading module, configured to read the head-of-chain tuple from the head-of-chain table based on a second index field value of the target index tuple, where the head-of-chain tuple includes a head-of-chain field, and a value of the head-of-chain field is: the storage position of the data tuple of the latest version in the data table in a plurality of versions of data tuples in the data table;
a third reading module, configured to read a first target data tuple from the data table based on the chain head field, where the first target data tuple is a data tuple of a current latest version.
14. An electronic device, comprising:
one or more processors;
a storage device to store one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the method of any of claims 1-12.
15. A computer readable storage medium having stored thereon executable instructions which, when executed by a processor, cause the processor to perform the method according to any one of claims 1 to 12.
16. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1 to 12.
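Claim 9 describes the data tuples of one record as a backward-linked version chain: each version stores, in its second data field, the storage location of the previous version. The following minimal Python sketch illustrates walking that chain from the newest version to the oldest. It assumes the data table is a Python list indexed by storage location; the names `DataTuple`, `value`, `prev_pos` and `walk_versions` are hypothetical and not taken from the patent.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional


@dataclass
class DataTuple:
    value: Optional[str]     # first data field: record payload (hypothetical name)
    prev_pos: Optional[int]  # second data field: position of the previous version; None for the first version


def walk_versions(data_table: List[DataTuple], latest_pos: int) -> Iterator[DataTuple]:
    """Yield one record's versions from newest to oldest by following prev_pos links."""
    pos: Optional[int] = latest_pos
    while pos is not None:
        tup = data_table[pos]
        yield tup
        pos = tup.prev_pos


# Three versions of the same record, appended to the table over time.
table = [DataTuple("v1", None), DataTuple("v2", 0), DataTuple("v3", 1)]
print([t.value for t in walk_versions(table, 2)])  # ['v3', 'v2', 'v1']
```

Old versions are never overwritten in this sketch; a new version is appended and linked to its predecessor, so the full history stays reachable from the latest position.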
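The delete procedure of claims 10 and 11 does not overwrite the existing version. Instead, it appends a new version whose third data field marks the record as deleted and then moves the chain head to that version. The following is a minimal single-process sketch under several assumptions: a `threading.Lock` per chain head stands in for the exclusive lock, a boolean stands in for the "first value"/"second value" of the third data field, and an append-only list models the data table. All class and function names are hypothetical.

```python
import threading
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DataTuple:
    value: Optional[str]     # first data field: payload; None in a deletion version
    prev_pos: Optional[int]  # second data field: position of the previous version
    deleted: bool = False    # third data field: True stands in for the "second value"


class DataTable:
    """Append-only data table; append() returns the new tuple's storage position."""

    def __init__(self) -> None:
        self._rows: List[DataTuple] = []
        self._append_lock = threading.Lock()  # serializes position allocation across records

    def append(self, tup: DataTuple) -> int:
        with self._append_lock:
            self._rows.append(tup)
            return len(self._rows) - 1

    def __getitem__(self, pos: int) -> DataTuple:
        return self._rows[pos]


@dataclass
class ChainHead:
    head_pos: int  # chain head field: position of the latest version
    lock: threading.Lock = field(default_factory=threading.Lock)  # exclusive-lock stand-in


def delete_record(table: DataTable, head: ChainHead) -> bool:
    """Append a deletion version and move the chain head to it; False if already deleted."""
    with head.lock:  # exclusive lock on the chain head tuple
        current = table[head.head_pos]  # latest version (the "second target data tuple")
        if current.deleted:
            return False
        tombstone = DataTuple(value=None, prev_pos=head.head_pos, deleted=True)
        head.head_pos = table.append(tombstone)  # update the chain head field
        return True


table = DataTable()
head = ChainHead(head_pos=table.append(DataTuple("v1", None)))
print(delete_record(table, head))  # True
print(delete_record(table, head))  # False
```

The claim checks the deletion condition before the lock is acquired. This sketch re-checks the flag while holding the lock, so two concurrent deletes cannot both append a deletion version. The deletion version's `prev_pos` points at the version it supersedes, which is consistent with the second data field of claim 9.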
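The three reading modules of claim 13 amount to three successive lookups: first the index table, then the chain head table, then the data table. The following minimal sketch assumes that in-memory Python containers stand in for the three tables and that the target query value is matched by equality against the first index field. Because the claim requires first index field values to be distinct, the index table is modeled as a dictionary keyed by that field. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class IndexTuple:
    key: str      # first index field: distinct across index tuples
    head_id: int  # second index field: locates the chain head tuple


@dataclass
class ChainHeadTuple:
    head_pos: int  # chain head field: data-table position of the latest version


@dataclass
class DataTuple:
    value: Optional[str]     # first data field: payload
    prev_pos: Optional[int]  # second data field: position of the previous version


def read_latest(index_table: Dict[str, IndexTuple],
                chain_head_table: List[ChainHeadTuple],
                data_table: List[DataTuple],
                query_value: str) -> Optional[DataTuple]:
    # First reading module: locate the target index tuple by the query value.
    index_tuple = index_table.get(query_value)
    if index_tuple is None:
        return None
    # Second reading module: read the chain head tuple via the second index field.
    chain_head = chain_head_table[index_tuple.head_id]
    # Third reading module: read the latest data tuple via the chain head field.
    return data_table[chain_head.head_pos]


data = [DataTuple("alice-v1", None), DataTuple("alice-v2", 0)]
heads = [ChainHeadTuple(head_pos=1)]
index = {"alice": IndexTuple(key="alice", head_id=0)}
print(read_latest(index, heads, data, "alice").value)  # alice-v2
```

In this sketch, adding a new version of a record only requires changing `head_pos` in its chain head tuple. The index tuple points at the chain head rather than at a data tuple, so it stays unchanged.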

Priority Applications (1)

Application Number: CN202211359180.3A (published as CN115687351A)
Priority Date: 2022-11-01
Filing Date: 2022-11-01
Title: Data processing method and device, electronic equipment and computer readable storage medium

Publications (1)

Publication Number: CN115687351A
Publication Date: 2023-02-03

Family ID: 85048649

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination