CN113296899A - Transaction master machine, transaction slave machine and transaction processing method based on distributed system


Info

Publication number
CN113296899A
Authority
CN
China
Prior art keywords
slave
transaction
memory
operation sequence
data
Prior art date
Legal status
Pending
Application number
CN202110625851.5A
Other languages
Chinese (zh)
Inventor
蔡云龙
刘新春
Current Assignee
Haiguang Information Technology Co Ltd
Original Assignee
Haiguang Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Haiguang Information Technology Co Ltd filed Critical Haiguang Information Technology Co Ltd
Priority to CN202110625851.5A
Publication of CN113296899A


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 - Arrangements for program control, e.g. control units
    • G06F 9/06 - Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/44 - Arrangements for executing specific programs
    • G06F 9/4401 - Bootstrapping
    • G06F 9/4406 - Loading of operating system
    • G06F 9/441 - Multiboot arrangements, i.e. selecting an operating system to be loaded
    • G06F 9/46 - Multiprogramming arrangements
    • G06F 9/465 - Distributed object oriented systems
    • G06F 9/466 - Transaction processing
    • G06F 9/467 - Transactional memory
    • G06F 13/00 - Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F 13/14 - Handling requests for interconnection or transfer
    • G06F 13/20 - Handling requests for interconnection or transfer for access to input/output bus
    • G06F 13/28 - Handling requests for interconnection or transfer for access to input/output bus using burst mode transfer, e.g. direct memory access DMA, cycle steal
    • G06F 13/287 - Multiplexed DMA

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Security & Cryptography (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a transaction master, a transaction slave and a transaction processing method based on a distributed system. The first memory of the transaction master stores a slave file descriptor passed through from the transaction slave; the first central processing unit generates a slave operation sequence according to the slave file descriptor and writes it into the first memory; the first network card reads the slave operation sequence from the first memory and transmits it to the second memory of the transaction slave, where it is loaded by the second DMA controller of the transaction slave so that the second external memory controller of the transaction slave executes it, thereby bypassing the central processing unit of the transaction slave. Because the central processing unit of the transaction slave does not participate when data in the external memory of the transaction slave is migrated, it performs no context switches; the latency of the transaction slave is therefore reduced and the performance of the distributed system is improved.

Description

Transaction master machine, transaction slave machine and transaction processing method based on distributed system
Technical Field
The invention relates to the technical field of data processing, in particular to a transaction master machine, a transaction slave machine and a transaction processing method based on a distributed system.
Background
With the diversification of storage systems, the various characteristics required by different applications can be provided, but because service types are so varied it is difficult for any single system to cover them all. Storage systems are typically evaluated with three performance metrics: IOPS (Input/Output Operations Per Second), throughput and latency. Input/Output (IO) intensive applications typically use the throughput metric to evaluate system performance, while small-IO workloads use the IOPS metric. The latency metric is critical in many key services: one expert notes in a blog that credit card processors can avoid slowing down fraud prevention and withdrawal authorization only if latency stays below 4 ms. Latency is a mandatory consideration in core finance and security class applications.
Services based on On-Line Transaction Processing (OLTP) and On-Line Analytical Processing (OLAP) have very strict latency requirements. The key to high performance is the combination of high IOPS and low latency: when the storage system delivers higher IOPS, the latency of a single IO should not increase too much. Distributed network storage environments, however, have many limitations. A local, traditional storage system has a natural latency advantage because its IO path is short. Once the distributed mode is adopted, that is, once the system is scaled out across multiple hosts, the network, remote data synchronization, service flow interruption and the like cannot be avoided; a single IO is then processed on multiple replica servers over the network, and these operations increase its latency. The latency problem is further exacerbated when multiple transactions conduct synchronization traffic with each other. The synchronization traffic causes context switches on the Central Processing Unit (CPU), which can in turn flush caches; for traffic that is highly latency sensitive this is fatal. The distributed storage copy mode therefore has a great influence on database performance, and the difference in average transaction processing capacity can reach a factor of 10.
For scaling distributed read-write performance, shared disks or network DMA (Direct Memory Access) technology can currently be adopted, but data transmission still requires CPU participation. Database systems commonly improve availability and performance scaling by adopting a shared storage system, such as a RAC (Real Application Clusters) database cluster, in which the data disk is globally available. Because data is synchronized across CPUs, disk IO must interrupt the CPU's current task and is very time consuming, which creates a performance scaling problem. In practice, network DMA technology can also be adopted, so that a network DMA controller handles remote data transfers directly and many CPU cycles are saved. However, the data of databases and of real-time gaming applications must be written to storage and persisted as soon as possible, and even when distributed data sharing is performed over network DMA, data that is irrelevant to the current task (for example the host's data during multi-copy replication in a distributed system or in a database read-write separation mode) still interrupts the current transaction frequently. These additional task switches cause unnecessary delay and reduce system performance.
At present, latency can be improved in several ways. One is to bypass the operating system: network DMA is used to transmit remote data directly into local memory, which greatly reduces data acquisition latency, lowers CPU utilization and improves the parallel capability of the system. However, after the data arrives in the designated memory, the CPU still has to be interrupted to perform the necessary operations. Other tasks currently being processed by the CPU may be interrupted and context switched, and caches may even be evicted to the next level or flushed, resulting in performance degradation.
Disclosure of Invention
The invention provides a transaction host, a transaction slave and a transaction processing method based on a distributed system, which reduce latency and improve the performance of the distributed system.
In a first aspect, the present invention provides a transaction master based on a distributed system, where the distributed system further includes at least a transaction slave. The transaction master comprises a first central processing unit, a first memory and a first network card. The first memory is used for storing a slave file descriptor passed through from the transaction slave; the first central processing unit is used for generating a slave operation sequence according to the slave file descriptor and writing the slave operation sequence into the first memory; the first network card is used for reading the slave operation sequence from the first memory and transmitting it to the second memory of the transaction slave, where the slave operation sequence is loaded by the second DMA controller of the transaction slave so that the second external memory controller of the transaction slave executes it.
In this scheme, the file descriptor of the transaction slave is transmitted to the transaction master in advance, and the transaction master generates, according to the slave file descriptor, the slave operation sequence to be executed by the transaction slave; the slave operation sequence is then transmitted to the transaction slave and loaded by the second DMA controller of the transaction slave so that the second external memory controller of the transaction slave executes it, thereby bypassing the central processing unit of the transaction slave. Because the central processing unit of the transaction slave does not participate when data in the external memory of the transaction slave is migrated, it performs no context switches; the latency of the transaction slave is therefore reduced and the performance of the distributed system is improved. That is, in the present application, for transactions that involve no data processing, such as data destaging and data movement in memory, for example multi-copy transmission in a distributed system or data stream backup in a database read-write separation mode, the first central processing unit of the transaction master and controllers of the transaction slave other than its second central processing unit can complete the corresponding operations. The second central processing unit of the transaction slave does not participate, or participates as little as possible, in these transactions; latency is reduced, the overall performance of a latency-sensitive distributed system is improved, the requirement for low latency in high-speed communication is met, and the even more serious latency problem in a Non Uniform Memory Access (NUMA) system is alleviated.
In a specific embodiment, the first central processing unit is further configured to generate data to be stored and write the data to be stored into the first memory; the first network card is further configured to read the data to be stored from the first memory and transmit it to the second memory; the slave operation sequence comprises an operation sequence for writing the data to be stored into a second external memory of the transaction slave, so that when the second DMA controller loads the slave operation sequence, the second external memory controller can be controlled to write the data to be stored from the second memory into the second external memory. When the transaction master needs to remotely transmit data to be stored to the transaction slave and write it to disk in the transaction slave, the first central processing unit of the transaction master generates in advance a slave operation sequence containing the disk-write operation, and the slave operation sequence is loaded by the second DMA controller of the transaction slave to control the second external memory controller to perform the write. In this process the central processing unit of the transaction slave does not need to participate and therefore performs no context switch, so the latency of the transaction slave is reduced and the performance of the distributed system is improved.
In a particular embodiment, the transaction host further includes a first DMA controller, a first external memory and a first external memory controller. The first memory also stores a host file descriptor of the transaction host; the first central processing unit is further configured to generate, according to the host file descriptor, a host operation sequence for writing the data to be stored into the first external memory, and to write the host operation sequence into the first memory; the first DMA controller is provided with a first command sequence loader, which loads the host operation sequence so that the first external memory controller writes the data to be stored from the first memory into the first external memory. After the central processing unit of the transaction host generates the data to be stored, it can at the same time generate a host operation sequence for persisting the data locally; a first command sequence loader is provided in the first DMA controller, and the host operation sequence is loaded by the first command sequence loader so that the first external memory controller performs the write to the local first external memory. In this process the central processing unit of the transaction host only generates the host operation sequence and does not participate in the disk write itself, so it performs no context switch, the latency of the transaction host is reduced and the performance of the distributed system is improved.
In a particular embodiment, the transaction host further comprises a first DMA controller and a first external memory. The second external memory of the transaction slave stores data to be read that the transaction master needs to read. The slave operation sequence comprises an operation sequence for transferring the data to be read to the second memory, so that when the second DMA controller loads the slave operation sequence, the second external memory controller is controlled to transfer the data to be read from the second external memory to the second memory. The slave operation sequence also comprises an operation sequence for transmitting the data to be read to the first memory, so that the data to be read is read by the second network card of the transaction slave and transmitted to the first memory through the first network card. The first memory also stores a host file descriptor of the transaction host; the first central processing unit is further configured to generate, according to the host file descriptor, a host operation sequence for writing the data to be read into the first external memory, and to write the host operation sequence into the first memory. The first DMA controller is provided with a first command sequence loader, which loads the host operation sequence so that the data to be read is written into the first external memory. When the transaction master needs to remotely read data stored in the second external memory of the transaction slave, the slave operation sequence generated by the transaction master comprises an operation sequence for reading the disk of the transaction slave, and this sequence is loaded by the second DMA controller of the transaction slave to control the second external memory controller to execute it. In this process the central processing unit of the transaction slave does not need to participate and therefore performs no context switch, so the latency of the transaction slave is reduced and the performance of the distributed system is improved.
In one embodiment, the transaction host further includes a first root bridge device and a first interrupt extraction and redirection module. The first interrupt extraction and redirection module is configured to intercept, after the host operation sequence has been transmitted to the first memory, an interrupt signal carrying a set identifier submitted by the first root bridge device, and to redirect it to the first command sequence loader. This allows the host operation sequence to be intercepted in time and loaded and executed by the first command sequence loader, so that the central processing unit of the transaction host does not need to participate in the disk write and performs no context switch; the latency of the transaction host is reduced and the performance of the distributed system is improved.
In a second aspect, the present invention further provides a transaction slave based on a distributed system, where the distributed system further includes at least a transaction master. The transaction slave machine comprises a second central processing unit, a second memory, a second network card, a second external memory controller and a second DMA controller. Wherein the second memory is used for storing a slave file descriptor of the transaction slave. The second network card is used for reading the slave file descriptor and transmitting the slave file descriptor to the first memory of the transaction host; the second network card is also used for receiving the slave operation sequence transmitted by the transaction host and transmitting the slave operation sequence to the second memory. A second DMA controller is provided with a second command sequence loader, and the second command sequence loader is used for loading a slave operation sequence to enable the second external memory controller to execute the slave operation sequence.
In this scheme, the file descriptor of the transaction slave is transmitted to the transaction master in advance, and the transaction master generates, according to the slave file descriptor, the slave operation sequence to be executed by the transaction slave; the slave operation sequence is then transmitted to the transaction slave, a second command sequence loader is provided in the second DMA controller, and the second command sequence loader loads the slave operation sequence so that the second external memory controller of the transaction slave executes it, thereby bypassing the central processing unit of the transaction slave. Because the central processing unit of the transaction slave does not participate when data in the external memory of the transaction slave is migrated, it performs no context switches; the latency of the transaction slave is therefore reduced and the performance of the distributed system is improved. That is, in the present application, for transactions that involve no data processing, such as data destaging and data movement in memory, for example multi-copy transmission in a distributed system or data stream backup in a database read-write separation mode, the first central processing unit of the transaction master and controllers of the transaction slave other than its second central processing unit can complete the corresponding operations. The second central processing unit of the transaction slave does not participate, or participates as little as possible, in these transactions; latency is reduced, the overall performance of a latency-sensitive distributed system is improved, the requirement for low latency in high-speed communication is met, and the even more serious latency problem in a NUMA system is alleviated.
In a particular embodiment, the transaction slave further comprises a second external memory. The slave operation sequence comprises an operation sequence for writing the data to be stored transmitted by the transaction master into the second external memory, so that when the second command sequence loader loads the slave operation sequence, the second external memory controller can be controlled to write the data to be stored from the second memory into the second external memory. When the transaction master needs to remotely transmit data to be stored to the transaction slave and write it to disk in the transaction slave, the first central processing unit of the transaction master generates in advance a slave operation sequence containing the disk-write operation, and the slave operation sequence is loaded by the second command sequence loader to control the second external memory controller to perform the write. In this process the central processing unit of the transaction slave does not need to participate and therefore performs no context switch, so the latency of the transaction slave is reduced and the performance of the distributed system is improved.
In a specific embodiment, the transaction slave further includes a second external memory, and the second external memory stores data to be read that the transaction master needs to read. The slave operation sequence further comprises an operation sequence for transferring the data to be read to the second memory, so that when the second command sequence loader loads the slave operation sequence, the second external memory controller can be controlled to transfer the data to be read from the second external memory to the second memory. The slave operation sequence also comprises an operation sequence for transmitting the data to be read to the first memory of the transaction master, so that when the second command sequence loader loads the slave operation sequence, the second network card reads the data to be read from the second memory and transmits it to the first memory through the first network card of the transaction master. When the transaction master needs to remotely read data stored in the second external memory of the transaction slave, the slave operation sequence generated by the transaction master comprises an operation sequence for reading the disk of the transaction slave, and this sequence is loaded by the second command sequence loader to control the second external memory controller to execute it. In this process the central processing unit of the transaction slave does not need to participate and therefore performs no context switch, so the latency of the transaction slave is reduced and the performance of the distributed system is improved.
In one embodiment, the transaction slave further includes a second root bridge device and a second interrupt extraction and redirection module. The second interrupt extraction and redirection module is configured to intercept, after the second network card has transmitted the slave operation sequence to the second memory, an interrupt signal carrying a set identifier submitted by the second root bridge device, and to redirect it to the second command sequence loader. This allows the slave operation sequence to be intercepted in time and loaded and executed by the second command sequence loader, so that the central processing unit of the transaction slave does not need to participate in disk reads or disk writes and performs no context switch; the latency of the transaction slave is reduced and the performance of the distributed system is improved.
In a third aspect, the present invention further provides a distributed system, where the distributed system includes at least two hosts, and the at least two hosts include a transaction host and a transaction slave. The transaction host comprises a first central processing unit, a first memory, a first network card and a first DMA controller; the transaction slave machine comprises a second central processing unit, a second memory, a second network card, a second DMA controller and a second external memory controller. Wherein the second memory is used for storing a slave file descriptor of the transaction slave. The second network card is used for reading the slave file descriptor and transmitting the slave file descriptor to the first memory. The first central processing unit is used for generating a slave operation sequence according to the slave file descriptor and writing the slave operation sequence into the first memory. The first network card is used for reading the slave operation sequence from the first memory and transmitting the slave operation sequence to the second memory. The second network card is also used for receiving the slave operation sequence transmitted by the first network card and transmitting the slave operation sequence to the second memory. A second DMA controller is provided with a second command sequence loader, and the second command sequence loader is used for loading a slave operation sequence to enable the second external memory controller to execute the slave operation sequence.
In this scheme, the file descriptor of the transaction slave is transmitted to the transaction master in advance, and the transaction master generates, according to the slave file descriptor, the slave operation sequence to be executed by the transaction slave; the slave operation sequence is then transmitted to the transaction slave, a second command sequence loader is provided in the second DMA controller, and the second command sequence loader loads the slave operation sequence so that the second external memory controller of the transaction slave executes it, thereby bypassing the central processing unit of the transaction slave. Because the central processing unit of the transaction slave does not participate when data in the external memory of the transaction slave is migrated, it performs no context switches; the latency of the transaction slave is therefore reduced and the performance of the distributed system is improved. That is, in the present application, for transactions that involve no data processing, such as data destaging and data movement in memory, for example multi-copy transmission in a distributed system or data stream backup in a database read-write separation mode, the first central processing unit of the transaction master and controllers of the transaction slave other than its second central processing unit can complete the corresponding operations. The second central processing unit of the transaction slave does not participate, or participates as little as possible, in these transactions; latency is reduced, the overall performance of a latency-sensitive distributed system is improved, the requirement for low latency in high-speed communication is met, and the even more serious latency problem in a NUMA system is alleviated.
In a fourth aspect, the present invention further provides a transaction processing method based on the distributed system. The distributed system at least comprises a transaction host and a transaction slave; the transaction host comprises a first central processing unit, a first memory, a first network card and a first DMA controller; the transaction slave machine comprises a second central processing unit, a second memory, a second network card, a second DMA controller and a second external memory controller. The transaction processing method comprises the following steps: the second network card reads the slave file descriptor in the second memory and transmits the slave file descriptor to the first memory; the first central processing unit generates a slave operation sequence according to the slave file descriptor and writes the slave operation sequence into the first memory; the first network card reads the slave operation sequence from the first memory and transmits the slave operation sequence to the second memory; a second command sequence loader within the second DMA controller loads a sequence of slave operations causing the second external memory controller to execute the sequence of slave operations.
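Purely as an illustration of the four method steps, the following C sketch simulates them with in-memory buffers; every structure and function name in it (the fd passthrough, the OPSS build, the NIC transfer, the loader execution) is an assumption made for this example and is not an interface defined by the application.

    #include <stdint.h>
    #include <stdio.h>

    /* Illustrative skeleton of the four method steps; all names are hypothetical. */

    static uint32_t first_memory_slave_fd;     /* step 1 result: fd in first memory 12    */
    static uint32_t first_memory_opss[4];      /* step 2 result: OPSS in first memory 12  */
    static uint32_t second_memory_opss[4];     /* step 3 result: OPSS in second memory 22 */

    static void step1_passthrough_fd(uint32_t slave_fd)
    {   /* second network card 23 reads the fd from the second memory 22 and
           transmits it into the first memory 12                                    */
        first_memory_slave_fd = slave_fd;
    }

    static void step2_build_opss(void)
    {   /* first central processing unit 11 turns the fd into a slave operation
           sequence and writes it into the first memory 12                          */
        first_memory_opss[0] = first_memory_slave_fd;  /* e.g. "read fd, offset 0"  */
    }

    static void step3_transfer_opss(void)
    {   /* first network card 13 reads the OPSS from the first memory 12 and
           transmits it to the second memory 22 of the transaction slave            */
        for (int i = 0; i < 4; i++) second_memory_opss[i] = first_memory_opss[i];
    }

    static void step4_load_and_execute(void)
    {   /* second command sequence loader in the second DMA controller 24 loads the
           OPSS so that the second external memory controller 25 executes it        */
        printf("executing slave operation sequence for fd %u\n",
               (unsigned)second_memory_opss[0]);
    }

    int main(void)
    {
        step1_passthrough_fd(42);
        step2_build_opss();
        step3_transfer_opss();
        step4_load_and_execute();
        return 0;
    }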
In this scheme, the file descriptor of the transaction slave is transmitted to the transaction master in advance, and the transaction master generates, according to the slave file descriptor, the slave operation sequence to be executed by the transaction slave; the slave operation sequence is then transmitted to the transaction slave, a second command sequence loader is provided in the second DMA controller, and the second command sequence loader loads the slave operation sequence so that the second external memory controller of the transaction slave executes it, thereby bypassing the central processing unit of the transaction slave. Because the central processing unit of the transaction slave does not participate when data in the external memory of the transaction slave is migrated, it performs no context switches; the latency of the transaction slave is therefore reduced and the performance of the distributed system is improved. That is, in the present application, for transactions that involve no data processing, such as data destaging and data movement in memory, for example multi-copy transmission in a distributed system or data stream backup in a database read-write separation mode, the first central processing unit of the transaction master and controllers of the transaction slave other than its second central processing unit can complete the corresponding operations. The second central processing unit of the transaction slave does not participate, or participates as little as possible, in these transactions; latency is reduced, the overall performance of a latency-sensitive distributed system is improved, the requirement for low latency in high-speed communication is met, and the even more serious latency problem in a NUMA system is alleviated.
Drawings
Fig. 1 is a block diagram of a distributed system according to an embodiment of the present invention;
fig. 2 is a block diagram of a flow structure when a transaction master and a transaction slave perform transaction processing according to an embodiment of the present invention;
fig. 3 is a flowchart illustrating a process of remotely reading data to be read in a transaction slave by a transaction master according to an embodiment of the present invention;
fig. 4 is a flowchart of a transaction master remotely transmitting data to a transaction slave and writing data to a disk in the transaction slave according to an embodiment of the present invention;
fig. 5 is a flowchart of a transaction processing method based on a distributed system according to an embodiment of the present invention.
Reference numerals:
10 - transaction host; 11 - first central processing unit; 12 - first memory; 13 - first network card;
14 - first DMA controller; 15 - first external memory controller;
16 - first interrupt extraction and redirection module; 17 - first shared file parameter memory area;
20 - transaction slave; 21 - second central processing unit; 22 - second memory; 23 - second network card;
24 - second DMA controller; 25 - second external memory controller;
26 - second interrupt extraction and redirection module; 27 - second shared file parameter memory area
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
To aid understanding of the transaction host based on a distributed system provided by the embodiment of the present invention, its application scenario is first described. The transaction host is applied to a distributed system, and referring to fig. 1, the distributed system includes at least two hosts; specifically, the number of hosts in the distributed system may be any number not less than 2, such as 2, 3, 4, 10, 20 and the like. Each host comprises basic hardware such as a central processing unit, an internal memory, a network card, a DMA controller and an external memory controller. The external memory may be a magnetic disk, a hard disk or another type of storage, and the external memory controller is the controller for the corresponding type of external memory; for example, when the external memory is a disk, the external memory controller is a disk controller, and when the external memory is a solid-state disk, the external memory controller is a solid-state disk controller. The at least two hosts include a transaction master 10 and a transaction slave 20. It should be noted that the roles of transaction master 10 and transaction slave 20 are assigned per transaction: while a specific transaction is being processed, the host that issues the data request is the transaction master 10 and the host that serves the data is the transaction slave 20. Depending on the role a host assumes while different transactions are processed, it may act as the transaction master 10 in one transaction and as the transaction slave 20 in another. Of course, if a host in the distributed system does not participate in a transaction at all, it is simply one of the other hosts of the distributed system for that transaction. It should also be noted that the transactions considered here mainly involve data destaging, remote data reading, data movement in memory and the like, and do not involve data processing; for example, a multi-copy transmission transaction in a distributed system or a data stream backup transaction in a database read-write separation mode may be completed under the control of a controller other than the central processor. In addition, for convenience of description, hardware in the transaction master 10 is described as the "first" hardware and hardware in the transaction slave 20 as the "second" hardware; that is, "first" and "second" do not classify different hardware types within one host, but distinguish whether the hardware is located in the transaction master 10 or in the transaction slave 20. The terms "first" and "second" used below should therefore not be taken as limiting features restricting the protection scope of the claimed solution. The transaction host 10 based on the distributed system is described in detail below with reference to the accompanying drawings.
Referring to fig. 1 and fig. 2, a transaction host 10 based on a distributed system according to an embodiment of the present invention includes a first central processing unit 11, a first memory 12, and a first network card 13. The first Memory 12 may be a Random Access Memory (RAM) for temporarily storing the operation data in the first central processing unit 11, and is used as a bridge for data exchange with external memories such as a magnetic disk and a solid-state disk.
During transaction processing, referring to fig. 1 and fig. 2, the first memory 12 is configured to store a slave file descriptor passed through from the transaction slave 20. The slave file descriptor is the file descriptor of the transaction slave 20, and the files in the transaction slave 20 can be accessed through it. That is, at the beginning of the transaction, the transaction slave 20 passes its own slave file descriptor through to the transaction master 10. The passthrough can be implemented by using the first network card 13 in the transaction master 10 and the second network card 23 in the transaction slave 20 to transmit the slave file descriptor. Specifically, referring to fig. 2, the slave file descriptor is initially stored in the second memory 22 of the transaction slave 20; the second network card 23 of the transaction slave 20 reads the slave file descriptor from the second memory 22 and transmits it to the first network card 13, which receives it and stores it in the first memory 12. A minimal sketch of such a descriptor-passthrough message is given below.
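As an illustration only, the following C sketch shows what a slave file descriptor passthrough message might look like and how the master side could store it in the first memory 12; the structure layout and the names (slave_fd_msg, first_memory_store_fd) are hypothetical and are not defined by this application.

    #include <stdint.h>
    #include <stdio.h>

    /* Hypothetical passthrough message carried by the network cards (13/23). */
    struct slave_fd_msg {
        uint32_t slave_id;       /* which transaction slave sent the descriptor   */
        uint32_t slave_fd;       /* file descriptor valid inside the slave        */
        uint64_t file_length;    /* size of the file the descriptor refers to     */
        uint32_t block_size;     /* preferred IO granularity of the slave's disk  */
    };

    /* Region of the first memory 12 reserved for received descriptors
       (stands in for the first shared file parameter memory area 17). */
    static struct slave_fd_msg first_memory_fd_table[16];

    /* Called on the master when the first network card 13 delivers the message. */
    static void first_memory_store_fd(const struct slave_fd_msg *msg)
    {
        first_memory_fd_table[msg->slave_id % 16] = *msg;  /* kept for later OPSS generation */
        printf("stored fd %u of slave %u\n",
               (unsigned)msg->slave_fd, (unsigned)msg->slave_id);
    }

    int main(void)
    {
        /* Simulate the passthrough: the slave fills the message, the NIC carries it. */
        struct slave_fd_msg m = { .slave_id = 1, .slave_fd = 42,
                                  .file_length = 1 << 20, .block_size = 4096 };
        first_memory_store_fd(&m);
        return 0;
    }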
Next, referring to fig. 1 and fig. 2, the first central processing unit 11 is configured to generate a slave operation sequence according to the slave file descriptor. Specifically, the transaction master 10 may operate on a storage file of the transaction slave 20 (the storage file is represented virtually by the file descriptor) and form a slave Operation Sequence (Operation Sequence Slave, abbreviated OPSS; "Operation Sequence S" in the figures denotes a slave operation sequence). That is, when the first central processing unit 11 of the transaction master 10 remotely controls a transaction operation in the transaction slave 20, it can directly generate, according to the slave file descriptor passed through from the transaction slave 20, a slave operation sequence that can be loaded and executed by a controller of the transaction slave 20 other than its central processing unit; when the transaction slave 20 later receives the slave operation sequence, its second central processing unit 21 can be bypassed and a controller other than the second central processing unit 21 loads and executes the sequence, avoiding context switches of the second central processing unit 21. To transmit the slave operation sequence to the transaction slave 20, the sequence is first written into the first memory 12; the first network card 13 then reads it from the first memory 12 and transmits it to the second network card 23, which receives it and writes it into the second memory 22 of the transaction slave 20, completing the transfer. Specifically, the slave operation sequence is placed in the execution command sequence buffer in the second memory 22 of the transaction slave 20. A minimal sketch of one possible entry format for such a sequence follows.
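Purely for illustration, the sketch below models a slave operation sequence as a small array of command descriptors that the first central processing unit 11 could build from the received slave file descriptor; the entry layout, the opcodes and the function names are assumptions made for this example and are not defined by the application.

    #include <stdint.h>
    #include <stddef.h>

    /* Hypothetical opcodes for operations the slave-side controllers can execute. */
    enum ops_opcode {
        OPS_DISK_TO_MEM,   /* second external memory controller 25: disk -> second memory 22 */
        OPS_MEM_TO_DISK,   /* second external memory controller 25: second memory 22 -> disk */
        OPS_MEM_TO_REMOTE  /* second network card 23: second memory 22 -> first memory 12    */
    };

    /* One entry of an operation sequence (conceptually one scatter/gather element). */
    struct ops_entry {
        enum ops_opcode op;
        uint32_t slave_fd;     /* file descriptor received from the transaction slave */
        uint64_t file_offset;  /* offset inside the slave's file                      */
        uint64_t mem_addr;     /* buffer address in the second memory 22              */
        uint64_t length;
    };

    /* Example: the master CPU builds a two-entry OPSS for a remote read of `len`
       bytes at offset `off`, staged in a buffer at `buf` in the second memory.   */
    static size_t build_remote_read_opss(struct ops_entry *seq, uint32_t slave_fd,
                                         uint64_t off, uint64_t buf, uint64_t len)
    {
        seq[0] = (struct ops_entry){ OPS_DISK_TO_MEM,   slave_fd, off, buf, len };
        seq[1] = (struct ops_entry){ OPS_MEM_TO_REMOTE, slave_fd, 0,   buf, len };
        return 2;  /* number of entries written into the first memory 12 */
    }

    int main(void)
    {
        struct ops_entry opss[2];
        size_t n = build_remote_read_opss(opss, 42, 0, 0x1000, 4096);
        return n == 2 ? 0 : 1;
    }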
Referring to fig. 1 and fig. 2, after the transaction slave 20 receives the slave operation sequence, the second DMA controller 24 of the transaction slave 20 loads it so that the second external memory controller 25 of the transaction slave 20 executes it. That is, the second DMA controller 24 in the transaction slave 20 loads the slave operation sequence and cooperates with the second external memory controller 25 of the transaction slave 20 to complete operations such as data migration and backup in the external memory of the transaction slave 20 (DATA0 and DATA1 in the figures denote different data). By passing the file descriptor of the transaction slave 20 to the transaction master 10 in advance, the transaction master 10 generates, according to the slave file descriptor, the slave operation sequence to be executed by the transaction slave 20; the slave operation sequence is then transferred to the transaction slave 20 and loaded by the second DMA controller 24 so that the second external memory controller 25 executes it, thereby bypassing the central processing unit of the transaction slave 20. Because the central processing unit of the transaction slave 20 does not participate when data in the external memory of the transaction slave 20 is migrated, it performs no context switches; the latency of the transaction slave 20 is therefore reduced and the performance of the distributed system is improved. That is, in the present application, for transactions that involve no data processing, such as data destaging and data movement in memory, for example multi-copy transmission in a distributed system or data stream backup in a database read-write separation mode, the first central processing unit 11 of the transaction master 10 and controllers of the transaction slave 20 other than the second central processing unit 21 can complete the corresponding operations. The second central processing unit 21 of the transaction slave 20 does not participate, or participates as little as possible, in these transactions; latency is reduced, the overall performance of a latency-sensitive distributed system is improved, the requirement for low latency in high-speed communication is met, and the even more serious latency problem in the NUMA system is alleviated.
To make it easier to understand how a transaction is executed from start to finish, this section describes how the transaction slave 20 loads and executes a slave operation sequence using a controller other than the second central processing unit 21. It should be understood that the following explanation is not intended as a limitation on the transaction master 10. Referring to fig. 1 and fig. 2, a second command sequence loader is provided in the second DMA controller 24 and is configured to read the slave operation sequence in the second memory 22 and load it, so that the second external memory controller 25 executes it. The second DMA controller 24 is modified so that a second command sequence loader is added to it; the second command sequence loader can pick up the slave operation sequence in the second memory 22 and configure the corresponding registers for loading, so that the loader takes over the work, originally done by the second central processing unit 21, of loading the operation sequence that transfers data to the second memory 22 or to the internal buffer of the second network card 23. In a specific implementation, the second command sequence loader appends the slave operation sequence (a scatter/gather table) transmitted from the remote transaction master 10, taken from a fixed location in the second memory 22, to the local scatter/gather table, sets the corresponding register sequence, and then starts the transfer, so that independent disk read and write operations are carried out without the support of the second central processing unit 21 of the transaction slave 20. A minimal sketch of this append-and-start step is given below.
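The following C sketch illustrates, under assumed names and an assumed register layout (sg_entry, csl_regs, second_csl_append_and_start), how such a command sequence loader might append the received scatter/gather entries to a local table and kick off the transfer; it is a simulation of the idea, not the actual hardware interface.

    #include <stdint.h>
    #include <string.h>
    #include <stdio.h>

    /* One scatter/gather element, as assumed for this sketch. */
    struct sg_entry { uint64_t addr; uint64_t len; uint32_t op; };

    /* Simulated registers of the second DMA controller 24. */
    struct csl_regs {
        uint64_t sg_base;    /* base address of the local scatter/gather table */
        uint32_t sg_count;   /* number of valid entries                        */
        uint32_t doorbell;   /* writing 1 starts the transfer                  */
    };

    static struct sg_entry local_sg[64];        /* local scatter/gather table        */
    static uint32_t        local_sg_count;
    static struct csl_regs regs;                /* stands in for real MMIO registers */

    /* Append the OPSS found at a fixed location of the second memory 22 and start. */
    static void second_csl_append_and_start(const struct sg_entry *opss, uint32_t n)
    {
        memcpy(&local_sg[local_sg_count], opss, n * sizeof(*opss));
        local_sg_count += n;

        regs.sg_base  = (uint64_t)(uintptr_t)local_sg;  /* point the engines at the table */
        regs.sg_count = local_sg_count;
        regs.doorbell = 1;                              /* second external memory controller 25
                                                           / second network card 23 now execute
                                                           the entries                          */
        printf("started transfer of %u entries\n", (unsigned)regs.sg_count);
    }

    int main(void)
    {
        struct sg_entry opss[2] = { { 0x1000, 4096, 0 /* disk -> memory   */ },
                                    { 0x1000, 4096, 2 /* memory -> remote */ } };
        second_csl_append_and_start(opss, 2);
        return 0;
    }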
The way the transaction master 10 remotely reads data from the transaction slave 20 is described below with reference to fig. 3. In this case the data to be read, which the transaction master 10 needs, is stored in the second external memory. The slave operation sequence generated by the transaction master 10 includes an operation sequence for transferring the data to be read to the second memory 22, so that when the second DMA controller 24 loads the slave operation sequence, the second external memory controller 25 is controlled to transfer the data to be read from the second external memory to the second memory 22. That is, the data to be read is moved from the second external memory into the second memory 22 so that the second network card 23 can subsequently read it and transmit it to the transaction master 10. In a specific implementation, the slave operation sequence further includes an operation sequence for transmitting the data to be read to the first memory 12, so that when the second DMA controller 24 loads the slave operation sequence, the second network card 23 of the transaction slave 20 reads the data to be read and transmits it through the first network card 13 into the first memory 12, from which the first central processing unit 11 of the transaction master 10 can read it directly. When the transaction master 10 needs to remotely read data stored in the second external memory of the transaction slave 20, the slave operation sequence generated by the transaction master 10 therefore includes an operation sequence for reading the disk of the transaction slave 20, and this sequence is loaded by the second DMA controller 24 of the transaction slave 20 to control the second external memory controller to execute it. In this process the central processing unit of the transaction slave 20 does not need to participate and therefore performs no context switch, so the latency of the transaction slave 20 is reduced and the performance of the distributed system is improved. A sketch of how the slave-side controllers could dispatch such a sequence is given after this paragraph.
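As a rough illustration of the remote-read path only, the sketch below shows how the second DMA controller 24 might walk the loaded entries and hand each one to the second external memory controller 25 or the second network card 23; the dispatch loop, the opcodes and the helper names are assumptions of this example.

    #include <stdint.h>
    #include <stdio.h>

    enum ops_opcode { OPS_DISK_TO_MEM, OPS_MEM_TO_DISK, OPS_MEM_TO_REMOTE };

    struct ops_entry { enum ops_opcode op; uint64_t mem_addr; uint64_t length; };

    /* Simulated engines of the transaction slave 20. */
    static void external_memory_ctrl_exec(const struct ops_entry *e)
    {
        printf("disk controller: %s %llu bytes at 0x%llx\n",
               e->op == OPS_DISK_TO_MEM ? "read into memory" : "write from memory",
               (unsigned long long)e->length, (unsigned long long)e->mem_addr);
    }

    static void network_card_exec(const struct ops_entry *e)
    {
        printf("network card: send %llu bytes at 0x%llx to the first memory 12\n",
               (unsigned long long)e->length, (unsigned long long)e->mem_addr);
    }

    /* The second DMA controller 24 dispatches each loaded entry in order. */
    static void second_dma_dispatch(const struct ops_entry *seq, unsigned n)
    {
        for (unsigned i = 0; i < n; i++) {
            if (seq[i].op == OPS_MEM_TO_REMOTE)
                network_card_exec(&seq[i]);          /* memory -> remote master   */
            else
                external_memory_ctrl_exec(&seq[i]);  /* disk <-> second memory 22 */
        }
    }

    int main(void)
    {
        /* Remote read: disk -> second memory 22, then second memory 22 -> first memory 12. */
        struct ops_entry opss[2] = { { OPS_DISK_TO_MEM,   0x2000, 8192 },
                                     { OPS_MEM_TO_REMOTE, 0x2000, 8192 } };
        second_dma_dispatch(opss, 2);
        return 0;
    }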
In addition, the data to be read can also be persisted locally, so that when the first central processing unit 11 uses it again it can be fetched directly from the local external memory instead of remotely, which reduces latency. For this purpose, the transaction host 10 further includes a first DMA controller 14, a first external memory and a first external memory controller 15. A host file descriptor of the transaction host 10 is also stored in the first memory 12. According to the host file descriptor, the first central processing unit 11 can generate a host Operation Sequence (Operation Sequence Master, abbreviated OPSM; "Operation Sequence M" in the figures denotes a host operation sequence and, correspondingly, "Operation Sequence M & S" denotes a host operation sequence together with a slave operation sequence) for writing the data to be read into the first external memory, and write the host operation sequence into the first memory 12. The host operation sequence may be generated together with the slave operation sequence. A first command sequence loader may be provided in the first DMA controller 14 for loading the host operation sequence so that the data to be read is written into the first external memory. In this way, after the transaction host 10 has read data from the remote side, the local persisting operation can be completed by having the first DMA controller 14 load the host operation sequence, without the central processing unit of the transaction host 10 participating; the central processing unit of the transaction host 10 therefore performs no context switch, the latency of the transaction host 10 is reduced and the performance of the distributed system is improved. In a specific implementation, the first command sequence loader appends the host operation sequence (a scatter/gather table) to the local scatter/gather table, sets the corresponding register sequence and then initiates the transfer, so that independent disk read and write operations are carried out without the support of the first central processing unit 11 of the transaction host 10.
In practice, the transaction master 10 often needs to remotely store data it has generated in the second external memory of the transaction slave 20. The following describes, with reference to fig. 4, a common transaction in which the transaction master 10 remotely transmits data to be stored that it has generated to the transaction slave 20 and writes it to disk in the transaction slave 20. Specifically, referring to fig. 1 and fig. 2, the first central processing unit 11 generates data to be stored in the course of its data processing and writes it into the first memory 12. The first network card 13 then reads the data to be stored from the first memory 12 and transmits it to the second memory 22; the transmission works in the same way as the transmission of the slave operation sequence from the first network card 13 to the second network card 23. Accordingly, when the first central processing unit 11 generates the data to be stored it also generates the slave operation sequence, and the slave operation sequence and the data to be stored are transmitted together to the second network card 23, which stores both in the second memory 22. In this case the slave operation sequence includes an operation sequence for writing the data to be stored into the second external memory of the transaction slave 20, so that when the second DMA controller 24 loads it, the second external memory controller 25 can be controlled to write the data to be stored from the second memory 22 into the second external memory. Specifically, the data to be stored is first transferred from the second memory 22 to the second external memory controller 25 and then written by it into the second external memory, completing the destaging of the data. In this way, when the transaction master 10 needs to remotely transmit data to be stored to the transaction slave 20 and write it to disk there, the first central processing unit 11 of the transaction master 10 generates in advance a slave operation sequence containing the disk-write operation, and this sequence is loaded by the second DMA controller 24 of the transaction slave 20 to control the second external memory controller 25 to perform the write. In this process the central processing unit of the transaction slave 20 does not need to participate and therefore performs no context switch, so the latency of the transaction slave 20 is reduced and the performance of the distributed system is improved. A sketch of how the master could combine the payload and the operation sequence in one transfer follows this paragraph.
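The sketch below is only an illustration of the idea of sending the data to be stored and the slave operation sequence in a single network transfer; the frame layout and the names (bypass_frame_hdr, pack_remote_write) are invented for this example and are not prescribed by the application.

    #include <stdint.h>
    #include <string.h>
    #include <stdlib.h>

    enum ops_opcode { OPS_DISK_TO_MEM, OPS_MEM_TO_DISK, OPS_MEM_TO_REMOTE };
    struct ops_entry { enum ops_opcode op; uint64_t mem_addr; uint64_t length; };

    /* Hypothetical frame carried by the first network card 13: header, OPSS, payload. */
    struct bypass_frame_hdr {
        uint32_t opss_entries;   /* number of ops_entry records following the header */
        uint64_t payload_len;    /* bytes of data to be stored following the OPSS    */
    };

    /* Pack a remote-write request: one OPS_MEM_TO_DISK entry plus the payload. */
    static uint8_t *pack_remote_write(const void *data, uint64_t len, uint64_t slave_buf,
                                      size_t *frame_len)
    {
        struct bypass_frame_hdr hdr = { .opss_entries = 1, .payload_len = len };
        struct ops_entry entry = { OPS_MEM_TO_DISK, slave_buf, len };
        size_t total = sizeof(hdr) + sizeof(entry) + (size_t)len;

        uint8_t *frame = malloc(total);
        if (!frame) { *frame_len = 0; return NULL; }

        memcpy(frame, &hdr, sizeof(hdr));
        memcpy(frame + sizeof(hdr), &entry, sizeof(entry));
        memcpy(frame + sizeof(hdr) + sizeof(entry), data, (size_t)len);  /* data to be stored */
        *frame_len = total;
        return frame;   /* handed to the first network card 13 for transmission */
    }

    int main(void)
    {
        const char record[] = "transaction log record";
        size_t n = 0;
        uint8_t *frame = pack_remote_write(record, sizeof(record), 0x3000, &n);
        free(frame);
        return n > 0 ? 0 : 1;
    }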
In addition, while the transaction master 10 remotely transmits the data to be stored to the transaction slave 20, the data can also be written into the local external memory, completing a local destaging. In a specific implementation, referring to fig. 1 and fig. 2, the host operation sequence generated by the first central processing unit 11 includes an operation sequence for writing the data to be stored into the first external memory, so that when the first command sequence loader loads the host operation sequence, the first external memory controller 15 can write the data to be stored from the first memory 12 into the first external memory. Specifically, the data to be stored in the first memory 12 is first transferred to the first external memory controller 15, which then writes it into the first external memory, completing the destaging. The host operation sequence may be generated together with the slave operation sequence; that is, after the central processing unit of the transaction master 10 generates the data to be stored, it can at the same time generate the host operation sequence for persisting the data locally. When the first command sequence loader loads the host operation sequence, the first external memory controller 15 performs the write to the local first external memory. In this process the central processing unit of the transaction host 10 only generates the host operation sequence and does not participate in the destaging itself, so it performs no context switch, the latency of the transaction host 10 is reduced and the performance of the distributed system is improved.
How the first DMA controller 14 loads the host operation sequence while bypassing the first central processing unit 11 can, referring to fig. 1 and fig. 2, be implemented by providing a first interrupt extraction and redirection module 16. After the host operation sequence has been transmitted to the first memory 12, the first interrupt extraction and redirection module 16 intercepts an interrupt signal carrying a set identifier submitted by the first root bridge device and redirects it to the first command sequence loader. The host operation sequence can thus be intercepted in time and loaded and executed by the first command sequence loader, so that the central processing unit of the transaction host 10 does not need to participate in the destaging and performs no context switch; the latency of the transaction host 10 is reduced and the performance of the distributed system is improved. Specifically, after the first network card 13 completes a data transfer it issues a completion interrupt. For PCIe, this type of interrupt is normally handled at the root bridge device (RC), which extracts the interrupt signal from the MSI/MSI-X message submitted by the device and submits it to the bus. According to the current transfer mode, the first interrupt extraction and redirection module 16 is responsible for separating out the interrupt signal carrying the set identifier after the first network card 13 completes the data transfer, and for redirecting it to the first command sequence loader. Specifically, the MSI/MSI-X interrupt signal is extracted at the first root bridge device; when it is extracted there, it is determined whether the DMA operation currently performed by the first network card 13 is in the multi-machine bypass mode. If so, the interrupt signal is directed to the first interrupt extraction and redirection module 16, under the control of the first DMA controller 14. The interrupt information carries the interrupt number of the first network card 13, and from this number the first DMA controller 14 can locate, in the first memory 12, the control information and the data DMA operation descriptors sent by the other devices. The first DMA controller 14 then starts the first command sequence loader, which automatically completes the operations indicated by the host operation sequence. A sharable remote or independent disk write identifier (SA; SA0 in the figures denotes the SA set on the transaction host) may be set in the first DMA controller 14 to recognize the interrupt signal carrying the set identifier. Since, from the viewpoint of the storage operation, the transaction master 10 and the transaction slave 20 present equivalent underlying interfaces in the data copy mode, the SA is set to inform the operating systems of the transaction master 10 and the transaction slave 20 of the current state. A rough sketch of this interception step appears below.
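For illustration only, the following sketch models how an interrupt extraction and redirection module might decide, at the root bridge, whether a completion interrupt should be redirected to the command sequence loader or delivered to the CPU as usual; the MSI message layout, the SA flag and the function names are assumptions of this example, not the real interface of any device.

    #include <stdint.h>
    #include <stdbool.h>
    #include <stdio.h>

    /* Simplified view of an MSI/MSI-X message seen at the root bridge device. */
    struct msi_msg {
        uint32_t vector;    /* interrupt number of the reporting device (e.g. NIC 13)   */
        bool     sa_flag;   /* set identifier: sharable remote / independent disk write */
    };

    static bool multi_machine_bypass_mode = true;   /* current transfer mode */

    static void deliver_to_cpu(const struct msi_msg *m)
    {
        printf("normal path: CPU handles interrupt vector %u\n", (unsigned)m->vector);
    }

    static void command_sequence_loader_run(uint32_t vector)
    {
        /* The DMA controller locates the operation sequence in memory from the
           vector and lets the loader execute it without waking the CPU.        */
        printf("bypass path: loader executes operation sequence for vector %u\n",
               (unsigned)vector);
    }

    /* Interrupt extraction and redirection at the root bridge. */
    static void root_bridge_handle_msi(const struct msi_msg *m)
    {
        if (multi_machine_bypass_mode && m->sa_flag)
            command_sequence_loader_run(m->vector);   /* redirect, CPU untouched */
        else
            deliver_to_cpu(m);                        /* ordinary interrupt path */
    }

    int main(void)
    {
        struct msi_msg done = { .vector = 13, .sa_flag = true };
        root_bridge_handle_msi(&done);
        return 0;
    }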
It should be additionally noted that the foregoing embodiment in which the transaction master 10 remotely reads data to be read from the transaction slave 20 and the embodiment in which the transaction master 10 remotely stores data to be stored generated by itself to the transaction slave 20 are two independent embodiments; the two may be carried out simultaneously or separately, or only one of them may be carried out.
Referring to fig. 2, a first shared file parameter memory area 17 (SFDM for short) may also be defined in the first memory 12, where the first shared file parameter memory area 17 is used to store file descriptors and the various data required during transaction operation. The first central processing unit 11, the first network card 13, the first DMA controller 14, and the first external memory controller 15 can all access the first shared file parameter memory area 17, which reduces data duplication, increases operation speed, and reduces time delay.
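As a rough illustration of what such a shared area might hold, the following C sketch declares a hypothetical SFDM layout visible to all four agents; the concrete fields are assumptions, since the embodiment only states that file descriptors and transaction data are stored there.

#include <stdint.h>
#include <stdio.h>

#define SFDM_MAX_DESCRIPTORS 16

/* Hypothetical file descriptor entry kept in the shared area. */
typedef struct {
    uint64_t ext_lba;   /* block address of the file data in the external memory */
    uint32_t length;    /* length of the region described                         */
    uint32_t flags;     /* access mode, ownership, and similar bookkeeping        */
} file_descriptor;

/* Hypothetical layout of the first shared file parameter memory area 17. */
typedef struct {
    uint32_t        descriptor_count;
    file_descriptor descriptors[SFDM_MAX_DESCRIPTORS];
    uint8_t         staging[4096];   /* data exchanged during the transaction     */
} sfdm_area;

int main(void)
{
    static sfdm_area sfdm;   /* one shared region placed in the first memory 12   */
    sfdm.descriptors[0] = (file_descriptor){ .ext_lba = 0x200, .length = 4096, .flags = 1 };
    sfdm.descriptor_count = 1;
    printf("SFDM holds %u descriptor(s) and a staging buffer of %zu bytes\n",
           (unsigned)sfdm.descriptor_count, sizeof sfdm.staging);
    return 0;
}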
Because the file descriptor of the transaction slave 20 is transmitted to the transaction master 10 in advance, the transaction master 10 generates, according to the slave file descriptor, the slave operation sequence to be executed by the transaction slave 20; the slave operation sequence is then transferred to the transaction slave 20, where the second DMA controller 24 of the transaction slave 20 loads it and causes the second external memory controller 25 of the transaction slave 20 to execute it, thereby bypassing the central processing unit of the transaction slave 20. When data in the external memory of the transaction slave 20 is migrated, the central processing unit of the transaction slave 20 does not need to participate and therefore does not need to perform context switching, so the time delay of the transaction slave 20 is reduced and the performance of the distributed system is improved. That is, in the present application, transactions that involve no data processing, such as data landing and data movement in memory, for example a multi-copy transmission transaction of a distributed system or a data stream backup transaction in a database read-write separation mode, can be completed by the first central processing unit 11 of the transaction master 10 together with controllers of the transaction slave 20 other than the second central processing unit 21. The second central processing unit 21 of the transaction slave 20 does not participate, or participates as little as possible, in such transactions, so the time delay is reduced, the overall performance of a latency-sensitive distributed system is improved, the low-latency requirement of high-speed communication is met, and the more serious latency problem in a NUMA system is alleviated.
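The following C sketch illustrates, under assumed data layouts, how the transaction master could translate a slave file descriptor received in advance into a slave operation sequence entry. The descriptor fields and operation codes are invented for the example and are not specified by the embodiment.

#include <stdint.h>
#include <stdio.h>

typedef struct {
    uint64_t ext_lba;   /* location of the target file in the second external memory */
    uint32_t length;
} slave_file_descriptor;

typedef enum {
    SLAVE_OP_EXT_TO_MEM,    /* second external memory -> second memory                */
    SLAVE_OP_MEM_TO_EXT,    /* second memory -> second external memory (disk landing) */
    SLAVE_OP_MEM_TO_HOST    /* second memory -> first memory via the network cards     */
} slave_op_kind;

typedef struct {
    slave_op_kind kind;
    uint64_t      ext_lba;
    uint32_t      length;
} slave_op;

/* Generated by the first central processing unit on the master; executed later
 * on the slave by the second command sequence loader. */
static int build_remote_write_sequence(const slave_file_descriptor *fd,
                                       slave_op *seq, int capacity)
{
    if (capacity < 1)
        return 0;
    seq[0] = (slave_op){ SLAVE_OP_MEM_TO_EXT, fd->ext_lba, fd->length };
    return 1;
}

int main(void)
{
    slave_file_descriptor fd = { .ext_lba = 0x400, .length = 8192 };
    slave_op seq[4];
    int n = build_remote_write_sequence(&fd, seq, 4);
    printf("slave operation sequence with %d entr%s generated on the master\n",
           n, n == 1 ? "y" : "ies");
    return 0;
}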
In addition, an embodiment of the present invention further provides a transaction slave 20 based on a distributed system. Referring to fig. 1 and fig. 2, the transaction slave 20 includes a second central processing unit 21, a second memory 22, a second network card 23, a second external memory controller 25, and a second DMA controller 24. The second memory 22 may also be a random access memory (RAM) that temporarily stores the operation data of the second central processing unit 21 and serves as a bridge for data exchange with external memories such as a magnetic disk and a solid-state disk. The second memory 22 is used for storing the slave file descriptor of the transaction slave 20. Referring to fig. 1 and 2, the second network card 23 is configured to read the slave file descriptor in the second memory 22 and pass it through to the first memory 12 of the transaction host 10. For the manner of implementing this transparent transmission, reference is made to the foregoing description of the transaction host 10, which is not repeated here. The second network card 23 is further configured to receive the slave operation sequence transmitted by the transaction master 10 and transmit the slave operation sequence to the second memory 22. A second command sequence loader is provided within the second DMA controller 24 and is used to load the slave operation sequence so that the second external memory controller 25 executes the slave operation sequence.
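A minimal C sketch of the descriptor pass-through is given below: the slave file descriptor is copied from a stand-in for the second memory 22 into a stand-in for the first memory 12, with neither central processing unit on the data path. The buffers and the copy routine are assumptions made purely for illustration.

#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef struct {
    uint64_t ext_lba;   /* where the file lives in the second external memory */
    uint32_t length;
} slave_file_descriptor;

/* Stand-ins for the two memories involved in the pass-through. */
static slave_file_descriptor second_memory_fd = { .ext_lba = 0x400, .length = 8192 };
static slave_file_descriptor first_memory_fd;

/* Models the network-card-to-network-card transfer: a straight copy with no
 * processing by either central processing unit. */
static void pass_through_descriptor(void)
{
    memcpy(&first_memory_fd, &second_memory_fd, sizeof first_memory_fd);
}

int main(void)
{
    pass_through_descriptor();
    printf("first memory now holds descriptor: lba 0x%llx, %u bytes\n",
           (unsigned long long)first_memory_fd.ext_lba,
           (unsigned)first_memory_fd.length);
    return 0;
}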
Because the file descriptor of the transaction slave 20 is transmitted to the transaction master 10 in advance, the transaction master 10 generates, according to the slave file descriptor, the slave operation sequence to be executed by the transaction slave 20; the slave operation sequence is then transmitted to the transaction slave 20, where the second command sequence loader provided in the second DMA controller 24 loads it and the second external memory controller 25 of the transaction slave 20 executes it, thereby bypassing the central processing unit of the transaction slave 20. When data in the external memory of the transaction slave 20 is migrated, the central processing unit of the transaction slave 20 does not need to participate and therefore does not need to perform context switching, so the time delay of the transaction slave 20 is reduced and the performance of the distributed system is improved. That is, in the present application, transactions that involve no data processing, such as data landing and data movement in memory, for example a multi-copy transmission transaction of a distributed system or a data stream backup transaction in a database read-write separation mode, can be completed by the first central processing unit 11 of the transaction master 10 together with controllers of the transaction slave 20 other than the second central processing unit 21. The second central processing unit 21 of the transaction slave 20 does not participate, or participates as little as possible, in such transactions, so the time delay is reduced, the overall performance of a latency-sensitive distributed system is improved, the low-latency requirement of high-speed communication is met, and the more serious latency problem in a NUMA system is alleviated.
When the second DMA controller 24 loads the slave operation sequence while bypassing the second central processing unit 21, referring to fig. 1 and fig. 2, this can be implemented by providing the second interrupt extraction and redirection module 26. After the second network card 23 transmits the slave operation sequence to the second memory 22, the second interrupt extraction and redirection module 26 intercepts an interrupt signal with a set identifier submitted by the second root bridge device and redirects the interrupt signal to the second command sequence loader. The interrupt is thus intercepted in time so that the slave operation sequence can be loaded and executed by the second command sequence loader; the central processing unit of the transaction slave 20 does not need to participate in disk reading, disk writing, data migration, and the like, so the second central processing unit 21 of the transaction slave 20 does not need to perform context switching, the time delay of the transaction slave 20 is reduced, and the performance of the distributed system is improved. Specifically, after the second network card 23 completes data transmission, it may issue an event completion interrupt. For PCIe, this type of interrupt is typically handled at the root bridge device (RC), which extracts the interrupt signal based on the MSI/MSI-X submitted by the device and submits it to the bus. The second interrupt extraction and redirection module 26 is responsible for separating, according to the current transmission mode, the interrupt signal with the set identifier from the interrupt signals issued after the second network card 23 completes data transmission, and redirecting it to the second command sequence loader. Specifically, the MSI/MSI-X interrupt signal is extracted at the second root bridge device, and when the interrupt signal is extracted there, it is determined whether the current DMA operation performed by the second network card 23 is in the multi-machine bypass mode. If so, the interrupt signal is directed to the second interrupt extraction and redirection module 26, which is controlled by the second DMA controller 24. The interrupt information carries the interrupt number of the second network card 23, and the second DMA controller 24 can locate, according to the interrupt number, the control information and the data DMA operation descriptors sent by other devices in the second memory 22. The second DMA controller 24 then starts the second command sequence loader to automatically complete the operations indicated by the slave operation sequence. A sharable remote or independent disk-write identifier (SA; in the figure, SA1 denotes the SA set on the transaction slave) may be set in the second DMA controller 24 to recognize the interrupt signal with the set identifier. Since, from the viewpoint of the storage operation, the transaction master 10 and the transaction slave 20 present equivalent underlying interfaces in the data copy mode between the transaction master 10 and the transaction slave 20, the SA is set to inform the operating systems of the transaction master 10 and the transaction slave 20 of the current state.
When the transaction master 10 remotely reads the data to be read, the slave operation sequence further includes an operation sequence for transmitting the data to be read to the second memory 22, so that when the second command sequence loader loads the slave operation sequence, the second external memory controller 25 can be controlled to transmit the data to be read from the second external memory to the second memory 22. That is, the data to be read in the second external memory is first transmitted to the second external memory controller 25, and the second external memory controller 25 transmits the data to be read to the second memory 22. In this case, the slave operation sequence further includes an operation sequence for transmitting the data to be read to the first memory 12 of the transaction host 10, so that when the second command sequence loader loads the slave operation sequence, the second network card 23 reads the data to be read in the second memory 22 and transmits it to the first memory 12 through the first network card 13 of the transaction host 10. For the generation and transmission of the slave operation sequence, reference is made to the foregoing description of the transaction master 10, which is not repeated here. In this way, when the transaction master 10 needs to remotely read data to be read stored in the second external memory of the transaction slave 20, the slave operation sequence generated by the transaction master 10 includes an operation sequence for reading the disk of the transaction slave 20; the slave operation sequence is loaded by the second command sequence loader, which controls the second external memory controller to execute it. In this process, the central processing unit of the transaction slave 20 does not need to participate and therefore does not need to perform context switching, so the time delay of the transaction slave 20 is reduced and the performance of the distributed system is improved.
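The two-step remote read described above can be sketched in C as follows: one descriptor moves the data to be read from the second external memory into the second memory 22, and a second descriptor forwards it to the first memory 12 through the network cards. Operation names and addresses are assumptions made for the example.

#include <stdint.h>
#include <stdio.h>

typedef enum { OP_EXT_TO_MEM, OP_MEM_TO_REMOTE } op_kind;

typedef struct {
    op_kind  kind;
    uint64_t src;
    uint64_t dst;
    uint32_t length;
} slave_op;

static void second_ext_controller(const slave_op *op)
{
    printf("ext-ctrl: read %u bytes, lba 0x%llx -> second memory 0x%llx\n",
           (unsigned)op->length, (unsigned long long)op->src, (unsigned long long)op->dst);
}

static void second_network_card(const slave_op *op)
{
    printf("nic: send %u bytes, second memory 0x%llx -> first memory 0x%llx\n",
           (unsigned)op->length, (unsigned long long)op->src, (unsigned long long)op->dst);
}

int main(void)
{
    /* Loaded by the second command sequence loader; the second CPU is bypassed. */
    slave_op seq[] = {
        { OP_EXT_TO_MEM,    0x400,  0x2000, 4096 },   /* disk -> second memory   */
        { OP_MEM_TO_REMOTE, 0x2000, 0x1000, 4096 },   /* second memory -> master */
    };
    for (int i = 0; i < 2; ++i)
        (seq[i].kind == OP_EXT_TO_MEM ? second_ext_controller : second_network_card)(&seq[i]);
    return 0;
}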
In addition, in practical applications there are also cases in which the transaction host 10 stores the data to be stored that it generates remotely into the second external memory. This corresponds to the case described above in which the transaction master 10 remotely transmits the data to be stored generated by itself to the transaction slave 20, and the transaction slave 20 performs the disk-writing transaction operation. Referring to fig. 1 and 2, the transaction slave 20 further includes a second external memory. In this case, the slave operation sequence includes an operation sequence for writing the data to be stored transmitted from the transaction master 10 into the second external memory, so that when the second command sequence loader loads the slave operation sequence, the second external memory controller 25 can be controlled to write the data to be stored from the second memory 22 into the second external memory. For how the transaction master 10 generates the slave operation sequence and how the slave operation sequence is transmitted to the second memory 22 of the transaction slave 20, reference is made to the foregoing description of the transaction master 10, which is not repeated here. When the data to be stored is written into the second external memory, the slave operation sequence is loaded by the second command sequence loader, and the second external memory controller 25 is controlled to write the data to be stored from the second memory 22 into the second external memory. Specifically, the data to be stored is first transmitted from the second memory 22 to the second external memory controller 25 and then written into the second external memory by the second external memory controller 25, thereby completing the data landing operation. In this way, when the transaction master 10 needs to remotely transmit data to be stored to the transaction slave 20 and have the transaction slave 20 write it to disk, the first central processing unit 11 of the transaction master 10 generates in advance a slave operation sequence that includes the disk-writing operation, and the slave operation sequence is loaded by the second command sequence loader to control the second external memory controller 25 to perform the disk-writing operation. In this process, the central processing unit of the transaction slave 20 does not need to participate and therefore does not need to perform context switching, so the time delay of the transaction slave 20 is reduced and the performance of the distributed system is improved.
It should be additionally noted that the foregoing embodiment in which the transaction master 10 remotely reads data to be read from the transaction slave 20 and the embodiment in which the transaction master 10 remotely stores data to be stored generated by itself to the transaction slave 20 are two independent embodiments; the two may be carried out simultaneously or separately, or only one of them may be carried out.
Referring to fig. 2, a second shared file parameter memory area 27 (SFDM for short) may also be defined in the second memory 22, where the second shared file parameter memory area 27 is used to store the slave operation sequence and the various data required during transaction operation. The second central processing unit 21, the second network card 23, the second DMA controller 24, and the second external memory controller 25 can all access the second shared file parameter memory area 27, which reduces data duplication, increases operation speed, and reduces time delay.
In addition, an embodiment of the present invention further provides a distributed system. Referring to fig. 1 and fig. 2, the distributed system includes at least two hosts, and the at least two hosts include a transaction host 10 and a transaction slave 20. The transaction host 10 includes a first central processing unit 11, a first memory 12, a first network card 13, and a first DMA controller 14; the transaction slave 20 includes a second central processing unit 21, a second memory 22, a second network card 23, a second DMA controller 24, and a second external memory controller 25. The second memory 22 is used for storing the slave file descriptor of the transaction slave 20. The second network card 23 is used for reading the slave file descriptor and transmitting the slave file descriptor to the first memory 12. The first central processing unit 11 is configured to generate a slave operation sequence according to the slave file descriptor and write the slave operation sequence into the first memory 12. The first network card 13 is configured to read the slave operation sequence from the first memory 12 and transmit it to the second memory 22. The second network card 23 is further configured to receive the slave operation sequence transmitted by the first network card 13 and transmit it to the second memory 22. A second command sequence loader is provided within the second DMA controller 24 and is used to load the slave operation sequence so that the second external memory controller 25 executes the slave operation sequence.
Because the file descriptor of the transaction slave 20 is transmitted to the transaction master 10 in advance, the transaction master 10 generates, according to the slave file descriptor, the slave operation sequence to be executed by the transaction slave 20; the slave operation sequence is then transmitted to the transaction slave 20, where the second command sequence loader provided in the second DMA controller 24 loads it and the second external memory controller 25 of the transaction slave 20 executes it, thereby bypassing the central processing unit of the transaction slave 20. When data in the external memory of the transaction slave 20 is migrated, the central processing unit of the transaction slave 20 does not need to participate and therefore does not need to perform context switching, so the time delay of the transaction slave 20 is reduced and the performance of the distributed system is improved. That is, in the present application, transactions that involve no data processing, such as data landing and data movement in memory, for example a multi-copy transmission transaction of a distributed system or a data stream backup transaction in a database read-write separation mode, can be completed by the first central processing unit 11 of the transaction master 10 together with controllers of the transaction slave 20 other than the second central processing unit 21. The second central processing unit 21 of the transaction slave 20 does not participate, or participates as little as possible, in such transactions, so the time delay is reduced, the overall performance of a latency-sensitive distributed system is improved, the low-latency requirement of high-speed communication is met, and the more serious latency problem in a NUMA system is alleviated.
For the specific configuration and operation of each piece of hardware, reference is made to the foregoing description of the transaction master 10 and the transaction slave 20, which is not repeated here. It should be noted that each of the operation and function modules described above with respect to the transaction master 10 and the transaction slave 20 can be applied to the distributed system claimed in the embodiment of the present invention.
Furthermore, an embodiment of the present invention further provides a transaction processing method based on the distributed system. Referring to fig. 1 and 2, the distributed system at least includes a transaction master 10 and a transaction slave 20; the transaction master 10 includes a first central processing unit 11, a first memory 12, a first network card 13 and a first DMA controller 14; the transaction slave 20 includes a second central processing unit 21, a second memory 22, a second network card 23, a second DMA controller 24, and a second external memory controller 25. Referring to fig. 5, the transaction processing method includes:
step 10: the second network card 23 reads the slave file descriptor in the second memory 22 and transmits the slave file descriptor to the first memory 12;
step 20: the first central processing unit 11 generates a slave operation sequence according to the slave file descriptor, and writes the slave operation sequence into the first memory 12;
step 30: the first network card 13 reads the slave operation sequence from the first memory 12 and transmits the slave operation sequence to the second memory 22;
step 40: the second command sequence loader within the second DMA controller 24 loads the slave operation sequence, so that the second external memory controller 25 executes the slave operation sequence; an illustrative end-to-end sketch of these steps is given below.
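The following C sketch strings steps 10 to 40 together as an end-to-end simulation. Every structure, field, and address in it is an assumption introduced for illustration, since the embodiment does not define concrete data layouts.

#include <stdint.h>
#include <stdio.h>

typedef struct { uint64_t ext_lba; uint32_t length; } descriptor;
typedef struct { uint64_t ext_lba; uint64_t mem_addr; uint32_t length; } slave_op;

static descriptor first_memory_fd;     /* filled in step 10 */
static slave_op   second_memory_seq;   /* filled in step 30 */

/* step 10: second network card passes the slave file descriptor to the first memory */
static void step10_pass_descriptor(const descriptor *slave_fd)
{
    first_memory_fd = *slave_fd;
}

/* step 20: first central processing unit builds the slave operation sequence */
static slave_op step20_generate_sequence(void)
{
    return (slave_op){ first_memory_fd.ext_lba, 0x2000, first_memory_fd.length };
}

/* step 30: first network card transmits the sequence to the second memory */
static void step30_transmit_sequence(slave_op op)
{
    second_memory_seq = op;
}

/* step 40: second command sequence loader drives the second external memory controller */
static void step40_load_and_execute(void)
{
    printf("ext-ctrl: move %u bytes between lba 0x%llx and second memory 0x%llx\n",
           (unsigned)second_memory_seq.length,
           (unsigned long long)second_memory_seq.ext_lba,
           (unsigned long long)second_memory_seq.mem_addr);
}

int main(void)
{
    descriptor slave_fd = { .ext_lba = 0x400, .length = 4096 };
    step10_pass_descriptor(&slave_fd);
    step30_transmit_sequence(step20_generate_sequence());
    step40_load_and_execute();
    return 0;
}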
In the above solution, because the file descriptor of the transaction slave 20 is passed to the transaction master 10 in advance, the transaction master 10 generates, according to the slave file descriptor, the slave operation sequence to be executed by the transaction slave 20; the slave operation sequence is then transmitted to the transaction slave 20, where the second command sequence loader provided in the second DMA controller 24 loads it and the second external memory controller 25 of the transaction slave 20 executes it, thereby bypassing the central processing unit of the transaction slave 20. When data in the external memory of the transaction slave 20 is migrated, the central processing unit of the transaction slave 20 does not need to participate and therefore does not need to perform context switching, so the time delay of the transaction slave 20 is reduced and the performance of the distributed system is improved. That is, in the present application, transactions that involve no data processing, such as data landing and data movement in memory, for example a multi-copy transmission transaction of a distributed system or a data stream backup transaction in a database read-write separation mode, can be completed by the first central processing unit 11 of the transaction master 10 together with controllers of the transaction slave 20 other than the second central processing unit 21. The second central processing unit 21 of the transaction slave 20 does not participate, or participates as little as possible, in such transactions, so the time delay is reduced, the overall performance of a latency-sensitive distributed system is improved, the low-latency requirement of high-speed communication is met, and the more serious latency problem in a NUMA system is alleviated.
For the detailed operation of each step, reference is made to the foregoing description of the transaction master 10 and the transaction slave 20, which is not repeated here. It should be noted that each of the operations described above with respect to the transaction master 10 and the transaction slave 20 can be applied to the transaction processing method based on the distributed system as claimed in the embodiment of the present invention.
The above description is only for the specific embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (11)

1. A transaction master based on a distributed system, wherein the distributed system further comprises at least a transaction slave, the transaction master comprising:
the first memory is used for storing the slave file descriptor transparently transmitted by the transaction slave;
the first central processing unit is used for generating a slave operation sequence according to the slave file descriptor and writing the slave operation sequence into the first memory;
and the first network card is used for reading the slave operation sequence from the first memory and transmitting the slave operation sequence to a second memory of the transaction slave, so that the slave operation sequence is loaded by a second DMA controller of the transaction slave to cause a second external memory controller of the transaction slave to execute the slave operation sequence.
2. The transaction host of claim 1, wherein the first central processing unit is further configured to generate data to be stored and write the data to be stored into the first memory;
the first network card is also used for reading the data to be stored from the first memory and transmitting the data to be stored to the second memory;
the slave operation sequence comprises an operation sequence for writing the data to be stored into a second external memory of the transaction slave, so that when the second DMA controller loads the slave operation sequence, the second external memory controller can be controlled to write the data to be stored into the second external memory from the second internal memory.
3. The transaction host of claim 2, further comprising a first DMA controller, a first external memory, and a first external memory controller;
wherein the first memory further stores a host file descriptor of the transaction host;
the first central processing unit is further configured to generate a host operation sequence for writing the data to be stored into the first external storage according to the host file descriptor, and write the host operation sequence into the first internal storage;
the first DMA controller is provided with a first command sequence loader, and the first command sequence loader is used for loading the host operation sequence so that the first external memory controller writes the data to be stored into the first external memory from the first internal memory.
4. The transaction host of claim 1, further comprising a first DMA controller and a first external memory;
a second external memory of the transaction slave stores the data to be read, which needs to be read by the transaction master;
the slave operation sequence comprises an operation sequence for transmitting the data to be read to the second memory, so that when the second DMA controller loads the slave operation sequence, the second external memory controller is controlled to transmit the data to be read from the second external memory to the second memory;
the slave operation sequence also comprises an operation sequence for transmitting the data to be read to the first memory, so that when the second DMA controller loads the slave operation sequence, a second network card of the transaction slave is controlled to read the data to be read, and the data to be read is transmitted to the first memory through the first network card;
the first memory also stores a host file descriptor of the transaction host; the first central processing unit is further configured to generate a host operation sequence for writing the data to be read into the first external storage according to the host file descriptor, and write the host operation sequence into the first internal storage;
the first DMA controller is provided with a first command sequence loader, and the first command sequence loader is used for loading the host operation sequence, so that when the host operation sequence is loaded by the first command sequence loader, the data to be read is written into the first external memory from the first internal memory.
5. The transaction host of claim 3 or 4, further comprising a first root bridge device, and a first interrupt extraction and redirection module;
the first interrupt extraction and redirection module is configured to intercept an interrupt signal with a set identifier submitted by the first root bridge device after transmitting the host operation sequence to the first memory, and redirect the interrupt signal to the first command sequence loader.
6. A transaction slave based on a distributed system, wherein the distributed system further comprises at least a transaction master, characterized in that the transaction slave comprises:
a second central processing unit;
the second memory is used for storing a slave file descriptor of the transaction slave;
the second network card is used for reading the slave file descriptor and transmitting the slave file descriptor to a first memory of the transaction host, and is further used for receiving a slave operation sequence transmitted by the transaction host and transmitting the slave operation sequence to the second memory;
a second external memory controller;
and a second DMA controller, in which a second command sequence loader is arranged, and the second command sequence loader is used for loading the slave operation sequence, so that the second external memory controller executes the slave operation sequence.
7. The transaction slave of claim 6, further comprising a second external memory;
the slave operation sequence comprises an operation sequence for writing the data to be stored transmitted by the transaction host into the second external memory, so that when the second command sequence loader loads the slave operation sequence, the second external memory controller can be controlled to write the data to be stored into the second external memory from the second memory.
8. The transaction slave of claim 6, further comprising a second external memory, wherein the second external memory stores data to be read, which is required to be read by the transaction master;
the slave operation sequence comprises an operation sequence for transmitting the data to be read to the second memory, so that when the second command sequence loader loads the slave operation sequence, the second external memory controller can be controlled to transmit the data to be read from the second external memory to the second memory;
the slave operation sequence further comprises an operation sequence for transmitting the data to be read to a first memory of the transaction host, so that when the second command sequence loader loads the slave operation sequence, the second network card reads the data to be read in the second memory and transmits the data to be read to the first memory through a first network card of the transaction host.
9. The transaction slave of claim 6, further comprising a second root bridge device and a second interrupt extraction and redirection module;
the second interrupt extraction and redirection module is configured to intercept an interrupt signal with a set identifier submitted by the second root bridge device and redirect the interrupt signal to the second command sequence loader after the second network card transmits the slave operation sequence to the second memory.
10. A distributed system is characterized by comprising at least two hosts, wherein the at least two hosts comprise a transaction host and a transaction slave; the transaction host comprises a first central processing unit, a first memory, a first network card and a first DMA controller; the transaction slave machine comprises a second central processing unit, a second memory, a second network card, a second DMA controller and a second external memory controller;
wherein the second memory is used for storing a slave file descriptor of the transaction slave;
the second network card is used for reading the slave file descriptor and transmitting the slave file descriptor to the first memory;
the first central processing unit is used for generating a slave machine operation sequence according to the slave machine file descriptor and writing the slave machine operation sequence into the first memory;
the first network card is used for reading the slave machine operation sequence from the first memory and transmitting the slave machine operation sequence to the second memory;
the second network card is also used for receiving a slave machine operation sequence transmitted by the first network card and transmitting the slave machine operation sequence to the second memory;
and a second command sequence loader is arranged in the second DMA controller and used for loading the slave operation sequence so as to enable the second external memory controller to execute the slave operation sequence.
11. A transaction processing method based on a distributed system, wherein the distributed system at least comprises a transaction master and a transaction slave; the transaction master comprises a first central processing unit, a first memory, a first network card and a first DMA controller; the transaction slave comprises a second central processing unit, a second memory, a second network card, a second DMA controller and a second external memory controller; the transaction processing method is characterized by comprising the following steps:
the second network card reads the slave file descriptor in the second memory and transmits the slave file descriptor to the first memory;
the first central processing unit generates a slave machine operation sequence according to the slave machine file descriptor and writes the slave machine operation sequence into the first memory;
the first network card reads the slave operation sequence from the first memory and transmits the slave operation sequence to the second memory;
a second command sequence loader within the second DMA controller loads the sequence of slave operations causing the second external memory controller to execute the sequence of slave operations.
CN202110625851.5A 2021-06-04 2021-06-04 Transaction master machine, transaction slave machine and transaction processing method based on distributed system Pending CN113296899A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110625851.5A CN113296899A (en) 2021-06-04 2021-06-04 Transaction master machine, transaction slave machine and transaction processing method based on distributed system

Publications (1)

Publication Number Publication Date
CN113296899A true CN113296899A (en) 2021-08-24

Family

ID=77327197

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110625851.5A Pending CN113296899A (en) 2021-06-04 2021-06-04 Transaction master machine, transaction slave machine and transaction processing method based on distributed system

Country Status (1)

Country Link
CN (1) CN113296899A (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060095604A1 (en) * 2004-10-28 2006-05-04 Edirisooriya Samantha J Implementing bufferless DMA controllers using split transactions
CN1713164A (en) * 2005-07-21 2005-12-28 复旦大学 DMA controller and data transmission with multi-transaction discretionary process
US20080140878A1 (en) * 2006-12-08 2008-06-12 Ofer Bar-Shalom System and method for peripheral device communications
CN110045912A (en) * 2018-01-16 2019-07-23 华为技术有限公司 Data processing method and device
CN109213631A (en) * 2018-08-22 2019-01-15 郑州云海信息技术有限公司 A kind of transaction methods, device, equipment and readable storage medium storing program for executing

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114401218A (en) * 2021-12-28 2022-04-26 绿盟科技集团股份有限公司 Bypass forwarding method and device for data message
CN114401218B (en) * 2021-12-28 2023-07-21 绿盟科技集团股份有限公司 Bypass forwarding method and device for data message

Similar Documents

Publication Publication Date Title
JP3694273B2 (en) Data processing system having multipath I / O request mechanism
US10922135B2 (en) Dynamic multitasking for distributed storage systems by detecting events for triggering a context switch
EP0130349B1 (en) A method for the replacement of blocks of information and its use in a data processing system
US6636908B1 (en) I/O system supporting extended functions and method therefor
US7664909B2 (en) Method and apparatus for a shared I/O serial ATA controller
US6735636B1 (en) Device, system, and method of intelligently splitting information in an I/O system
EP0254960B1 (en) A multiprocessor system
EP1466255B1 (en) Supercharge message exchanger
US5335327A (en) External memory control techniques with multiprocessors improving the throughput of data between a hierarchically upper processing unit and an external memory with efficient use of a cache memory
US5201053A (en) Dynamic polling of devices for nonsynchronous channel connection
CN106066890B (en) Distributed high-performance database all-in-one machine system
CN104541244A (en) Methods and systems for performing a replay execution
US20060206663A1 (en) Disk array device and shared memory device thereof, and control program and control method of disk array device
CN102207886A (en) Virtual machine fast emulation assist
CN110119304B (en) Interrupt processing method and device and server
CN105681402A (en) Distributed high speed database integration system based on PCIe flash memory card
US20130212336A1 (en) Method and Apparatus for Memory Write Performance Optimization in Architectures with Out-of-Order Read/Request-for-Ownership Response
US7007126B2 (en) Accessing a primary bus messaging unit from a secondary bus through a PCI bridge
CN113296899A (en) Transaction master machine, transaction slave machine and transaction processing method based on distributed system
CN114153634B (en) Inter-process communication system and operation platform based on domestic Loongson processor
US20060277326A1 (en) Data transfer system and method
CN111611104B (en) InfluxDB data backup method, system and terminal equipment
CN114661239A (en) Data interaction system and method based on NVME hard disk
CN104391763B (en) Many-core processor fault-tolerance approach based on device view redundancy
US6907454B1 (en) Data processing system with master and slave processors

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination