WO2024060228A1 - Method, apparatus, system and storage medium for obtaining data - Google Patents


Info

Publication number
WO2024060228A1
Authority
WO
WIPO (PCT)
Prior art keywords
node
execution unit
memory
memory page
data
Prior art date
Application number
PCT/CN2022/120986
Other languages
English (en)
French (fr)
Inventor
魏星达
陈榕
王天下
陈海波
张旭
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority to PCT/CN2022/120986 priority Critical patent/WO2024060228A1/zh
Publication of WO2024060228A1 publication Critical patent/WO2024060228A1/zh

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor

Definitions

  • the present application relates to the field of communication technology, and in particular to a method, device, system and storage medium for obtaining data.
  • a distributed workflow includes multiple basic execution units that can be executed by multiple nodes. There are dependency relationships among the multiple basic execution units, and the multiple nodes execute the multiple basic execution units based on those dependency relationships. For example, assume that the distributed workflow includes a first basic execution unit and a second basic execution unit, the second basic execution unit is the predecessor dependent unit of the first basic execution unit, the first node is used to execute the first basic execution unit, and the second node is used to execute the second basic execution unit. Before executing the first basic execution unit, the first node needs to obtain the running data generated when the second basic execution unit is executed, and then execute the first basic execution unit based on that running data.
  • the process of the first node obtaining the running data is as follows: the memory of the second node stores the running data generated when the second basic execution unit is executed, the second node reads the running data from the memory and saves the running data to a file included in the file system, and the first node reads the file from the file system and obtains the running data from the file.
  • the distributed workflow can be an application program, and the first basic execution unit and the second basic execution unit are two functions of the application program.
  • the second basic execution unit is the application's function def A(): return "hello"
  • the first basic execution unit is the application's function def B(input): print(input) # hello.
  • the memory of the second node includes the running data "hello" generated when the second basic execution unit is executed, and the running data "hello" is saved to a file included in the file system.
  • the first node reads the file from the file system, obtains the running data "hello" from the file, and executes the first basic execution unit based on the running data "hello".
  • the second node reads the running data from memory and saves it to a file, which incurs substantial memory-copy overhead.
  • the first node reads the file from the file system, which incurs substantial time overhead.
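The file-system baseline criticized above can be sketched in Python (the function bodies come from the patent's own example; the file plumbing and names are illustrative, not from the patent): every hand-off pays one memory-to-file copy on the producing node and one file-to-memory copy on the consuming node.

```python
import os
import tempfile

def A():
    # second basic execution unit: produces the running data
    return "hello"

def save_to_file_system(data, path):
    # memory -> file copy on the second node
    with open(path, "w") as f:
        f.write(data)

def load_from_file_system(path):
    # file -> memory copy on the first node
    with open(path, "r") as f:
        return f.read()

def B(input):
    # first basic execution unit: consumes the running data
    return input

# Simulate the two-node exchange through a shared file.
path = os.path.join(tempfile.mkdtemp(), "running_data")
save_to_file_system(A(), path)
result = B(load_from_file_system(path))
# result == "hello", at the cost of two full copies of the payload
```

The two copy steps are exactly the memory-copy and time overheads the application aims to eliminate.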
  • This application provides a method, device, system and storage medium for obtaining data to save memory copy overhead and time overhead.
  • the technical solutions are as follows:
  • this application provides a method for obtaining data.
  • a first node receives a first identifier and a node identifier of a second node.
  • the first node is used to execute the first basic execution unit
  • the second node is used to execute the second basic execution unit.
  • the second basic execution unit is the predecessor dependent unit of the first basic execution unit.
  • the memory of the second node is used to save the operation data of the second basic execution unit.
  • the first identifier is used to indicate the running data.
  • the first node obtains the storage address of the running data based on the first identifier and the node identifier. Based on the storage address, the first node reads the operating data stored in the memory of the second node.
  • since the first node receives the first identifier and the node identifier of the second node, and the first identifier indicates the running data stored in the memory of the second node, the first node can obtain the storage address of the running data based on the first identifier and the node identifier of the second node, and, based on the storage address, directly read the running data saved in the memory of the second node. Because the running data is read directly from the memory of the second node, there is no need to copy it to the file system, eliminating the memory-copy overhead.
  • the first node directly reads the running data saved in the memory of the second node at a higher rate, saving time overhead.
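The first-aspect flow can be sketched in process-local Python (all names, the identifier table, and the cluster registry are illustrative stand-ins, not the patent's implementation): the first node resolves a (first identifier, node identifier) pair to a storage address, then reads the running data straight out of the second node's memory with no file system in between.

```python
class Node:
    def __init__(self, node_id):
        self.node_id = node_id
        self.memory = {}       # address -> bytes (stand-in for RAM)
        self.id_table = {}     # first identifier -> storage address

CLUSTER = {}  # node identifier -> Node

def register(node):
    CLUSTER[node.node_id] = node

def resolve_address(first_id, node_id):
    # step 1: obtain the storage address of the running data
    return CLUSTER[node_id].id_table[first_id]

def remote_read(node_id, address):
    # step 2: read directly from the remote node's memory
    return CLUSTER[node_id].memory[address]

# Second node saves the running data and publishes an identifier.
second = Node("node-2")
register(second)
second.memory[0x1000] = b"hello"
second.id_table["run-data-1"] = 0x1000

# First node: identifier + node identifier -> address -> data.
addr = resolve_address("run-data-1", "node-2")
data = remote_read("node-2", addr)
# data == b"hello", with no intermediate file copy
```

In a real deployment the `remote_read` step would be a one-sided network read (e.g. RDMA, as the embodiments below note) rather than a dictionary lookup.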
  • the memory of the second node includes at least one first memory page, the at least one first memory page is used to save the running data, and the first identifier is used to indicate the at least one first memory page. Since the at least one first memory page is used to save the running data, the first identifier is used to indicate the running data.
  • the first node obtains the address of at least one first memory page based on the first identifier and the node identifier.
  • the first node allocates at least one second memory page to the first basic execution unit.
  • the first node includes at least one second memory page, and the at least one second memory page corresponds one-to-one with the at least one first memory page.
  • the first node obtains the storage address, which is the address of the first memory page corresponding to the second memory page read by the first basic execution unit for the first time.
  • when the first basic execution unit reads a second memory page for the first time, since the data of that second memory page is stored in the corresponding first memory page, the address of the first memory page is obtained, and the running data saved in the first memory page is read directly based on that address, thereby realizing on-demand data reading and saving network resources.
  • the first node reads the operating data stored in the acquired first memory page based on the address of the acquired first memory page, thereby realizing on-demand data reading and saving network resources.
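The on-demand page scheme can be sketched as follows (page contents, addresses, and the transfer log are illustrative): each local second memory page maps one-to-one to a remote first memory page, and the remote page is fetched only on the first read of the local page, so pages the unit never touches never cross the network.

```python
# Remote first memory pages on the second node (illustrative data).
REMOTE_PAGES = {0xA000: b"hel", 0xA001: b"lo!", 0xA002: b"???"}
transfers = []  # records which remote pages were actually fetched

class LazyPage:
    """A second memory page backed by a remote first memory page."""
    def __init__(self, remote_addr):
        self.remote_addr = remote_addr
        self.data = None  # not yet populated

    def read(self):
        if self.data is None:          # first access triggers the fetch
            transfers.append(self.remote_addr)
            self.data = REMOTE_PAGES[self.remote_addr]
        return self.data

# Allocate second memory pages, one per first memory page.
local = {addr: LazyPage(addr) for addr in REMOTE_PAGES}

# The first basic execution unit only touches two of the three pages.
out = local[0xA000].read() + local[0xA001].read()
# out == b"hello!" and only two pages were transferred
```

Only the pages actually read are transferred, which is the network-resource saving the paragraph above describes.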
  • the memory of the second node also includes a target area, the target area corresponds to the first identifier, and the target area stores the address of at least one first memory page.
  • the first node sends a get request to the second node based on the node identifier, and the get request includes the first identifier.
  • the first node receives a get response sent by the second node, and the get response includes the address and size of the target area corresponding to the first identifier.
  • the first node reads the address of the at least one first memory page stored in the target area based on the address and size of the target area.
  • through the get request and get response, the first node can obtain the address and size of the target area. Based on the address and size of the target area, reading the addresses of the at least one first memory page saved in the target area improves read efficiency and reduces read time.
  • the second node includes at least one first page table entry, the at least one first page table entry corresponds to at least one first memory page, and the first node includes at least one second page table entry.
  • the first node allocates at least one second page table entry to the first basic execution unit.
  • the at least one second page table entry corresponds to the at least one first page table entry.
  • the at least one second memory page includes the memory page corresponding to each second page table entry, such that the at least one first memory page corresponds to the at least one second memory page.
  • the target area stores the central processing unit (CPU) running state of the second node
  • the first node reads the CPU running status of the second node saved in the target area.
  • the first node sets the CPU running state of the first node to the CPU running state of the second node. In this way, the CPU operating state of the second node is restored on the first node without initializing the CPU operating state of the first node, thereby improving the efficiency of the first node in executing the first basic execution unit.
  • the CPU running status of the second node includes the status of at least one first CPU register
  • the at least one first CPU register is a register used by the second basic execution unit in the CPU of the second node
  • the CPU running state of the first node includes the state of at least one second CPU register.
  • the at least one first CPU register corresponds to the at least one second CPU register.
  • the at least one second CPU register is a register used by the first basic execution unit in the CPU of the first node.
  • the first node reads the state of at least one first CPU register saved in the target area.
  • the first node sets the state of each second CPU register to the state of the first CPU register corresponding to each second CPU register. In this way, the first node sets the CPU running state of the first node to the CPU running state of the second node, and restores the CPU running state of the second node from the first node.
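The register-restore step can be sketched as a direct copy between corresponding registers (the register names and values are illustrative; a real implementation would operate on hardware register state, not a dictionary):

```python
# State of the first CPU registers, as saved in the target area.
saved_state = {"pc": 0x4004, "sp": 0x7FFF0, "r0": 42}

# The first node's corresponding second CPU registers, pre-restore.
cpu = {"pc": 0, "sp": 0, "r0": 0}

def restore_cpu_state(cpu, saved_state):
    # One-to-one correspondence: each second CPU register is set to
    # the saved state of its corresponding first CPU register.
    for reg, value in saved_state.items():
        cpu[reg] = value
    return cpu

restore_cpu_state(cpu, saved_state)
# cpu now mirrors the second node's saved CPU running state
```

After the copy the first node resumes from the second node's CPU running state instead of re-initializing, which is the efficiency gain the paragraph describes.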
  • the target area also stores a mapping relationship between the address of the third memory page and the storage location of the first data.
  • the memory of the second node also includes the third memory page. The first data is stored in the disk of the second node, and the first data is data that the second basic execution unit needs to write to the third memory page but has not yet written to the third memory page.
  • the first node includes a fourth memory page corresponding to the third memory page.
  • the first node reads the mapping relationship saved in the target area.
  • the first basic execution unit reads the fourth memory page for the first time
  • the first node obtains the storage location of the first data based on the address of the third memory page corresponding to the fourth memory page and the mapping relationship.
  • the first node obtains the first data based on the storage location and the node identifier, and the first data is data used by the first node to execute the first basic execution unit. This ensures that the first node can successfully run the first basic execution unit.
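The unwritten-data case can be sketched as follows (the disk layout, path, and page address are illustrative assumptions): data destined for a third memory page may still sit on the second node's disk, so the target area maps the page address to its on-disk storage location, and the first node resolves through that mapping on the first read of the corresponding fourth memory page.

```python
# Second node's disk and the mapping saved in the target area
# (both illustrative).
DISK = {"/spill/blk7": b"pending-bytes"}   # storage location -> data
mapping = {0xB000: "/spill/blk7"}          # third page addr -> location

def read_fourth_page(page_addr, mapping, disk):
    # First read of the fourth memory page: the data is not in the
    # third memory page yet, so resolve its on-disk location through
    # the mapping relationship and fetch the first data from there.
    location = mapping[page_addr]
    return disk[location]

first_data = read_fourth_page(0xB000, mapping, DISK)
# first_data == b"pending-bytes"
```

This is what guarantees the first node still obtains the first data even though the second basic execution unit never finished writing it to memory.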
  • the target area also stores a file descriptor of at least one file opened by the second basic execution unit during execution, and the second node includes the at least one file.
  • the first node reads the file descriptor of the at least one file stored in the target area, and the file descriptor of the at least one file is data used by the first node to execute the first basic execution unit. This ensures that the first node can successfully run the first basic execution unit.
  • the first basic execution unit and the second basic execution unit are two functions in the distributed workflow.
  • this application provides a method for obtaining data.
  • a second node obtains a first identifier
  • the second node is used to execute the second basic execution unit
  • the first identifier is used to indicate the running data of the second basic execution unit stored in the memory of the second node.
  • the second node sends a first identifier.
  • the first identifier is used to trigger the first node to obtain the storage address of the running data and read the running data stored in the memory of the second node based on the storage address.
  • the first node is used to execute the first basic execution unit, and the second basic execution unit is the predecessor dependent unit of the first basic execution unit.
  • since the first identifier obtained by the second node indicates the running data stored in the memory of the second node, the second node sends the first identifier so that the first node can obtain the storage address of the running data based on the first identifier and directly read the running data saved in the memory of the second node based on that storage address. Because the first node reads the running data directly from the memory of the second node, the second node does not need to copy it to the file system, eliminating memory-copy overhead; the first node also reads the running data at a higher rate, saving time overhead.
  • the memory of the second node includes at least one first memory page, the at least one first memory page is used to save the running data, and the first identifier is used to indicate the at least one first memory page. Since the at least one first memory page is used to save the running data, the first identifier is used to indicate the running data.
  • the memory of the second node includes a target area, and the target area corresponds to the first identification.
  • the second node saves the address of the at least one first memory page to the target area.
  • the second node receives the acquisition request sent by the first node, and the acquisition request includes the first identification.
  • the second node sends an acquisition response to the first node.
  • the acquisition response includes the address and size of the target area corresponding to the first identifier.
  • the acquisition response is used to trigger the first node to read, based on the address and size of the target area, the addresses of the at least one first memory page stored in the target area. In this way, through the acquisition request and acquisition response, the first node obtains the address and size of the target area, which improves read efficiency and reduces read time.
  • the second node saves the CPU running status of the second node to the target area, so that the first node reads the CPU running status saved in the target area.
  • the first node restores the CPU operating state on the first node based on the read CPU operating state without initializing the CPU operating state of the first node, thereby improving the efficiency of the first node in executing the first basic execution unit.
  • the CPU of the second node includes at least one first central processing unit (CPU) register used by the second basic execution unit, and the CPU running state includes the state of the at least one first CPU register.
  • the second node saves the mapping relationship between the address of the third memory page and the storage location of the first data to the target area.
  • the memory of the second node also includes the third memory page.
  • the first data is stored in the disk of the second node.
  • the first data is the data that the second basic execution unit needs to write to the third memory page but has not yet written, so that the first node can read the mapping relationship saved in the target area.
  • the first node obtains the first data based on the mapping relationship, and the first data is used to execute the first basic execution unit, thus ensuring that the first node can successfully run the first basic execution unit.
  • the second node saves to the target area a file descriptor of at least one file opened by the second basic execution unit during execution, and the second node includes the at least one file, so that the first The node reads the file descriptor of the at least one file saved in the target area.
  • the file descriptor of the at least one file is used to execute the first basic execution unit, thus ensuring that the first node can successfully run the first basic execution unit.
  • the first basic execution unit and the second basic execution unit are two functions in the distributed workflow.
  • this application provides a method for obtaining data.
  • the scheduling node receives the first identifier sent by the second node; the second node is used to execute the second basic execution unit, the memory of the second node is used to save the running data of the second basic execution unit, and the first identifier is used to indicate the running data.
  • the scheduling node sends the first identification and the node identification of the second node to the first node.
  • the first node is used to execute the first basic execution unit.
  • the second basic execution unit is the predecessor dependent unit of the first basic execution unit.
  • the first identifier and the node identifier are used to trigger the first node to obtain the storage address of the running data, and to read the running data stored in the memory of the second node based on the storage address.
  • because the scheduling node sends the first identifier and the node identifier of the second node to the first node, the first node can obtain the storage address of the running data based on them and directly read the running data saved in the memory of the second node based on that storage address. Since the running data is read directly from the memory of the second node, there is no need to copy it to the file system, eliminating memory-copy overhead, and the first node reads the running data at a higher rate, saving time overhead.
  • the memory of the second node includes at least one first memory page, the at least one first memory page is used to save the running data, and the first identifier is used to indicate at least one first memory page. Since the at least one first memory page is used to save the running data, the first identifier is used to indicate the running data.
  • the first basic execution unit and the second basic execution unit are two functions in the distributed workflow.
  • this application provides a device for obtaining data, used to perform the method in the first aspect or any possible implementation of the first aspect.
  • the apparatus includes a unit for performing the method in the first aspect or any possible implementation of the first aspect.
  • the present application provides a device for acquiring data, which is used to execute the method in the second aspect or any possible implementation of the second aspect.
  • the device includes a unit for executing the method in the second aspect or any possible implementation of the second aspect.
  • this application provides a device for obtaining data, used to perform the method in the third aspect or any possible implementation of the third aspect.
  • the device includes a unit for performing the method in the third aspect or any possible implementation of the third aspect.
  • the present application provides a first node, including at least one processor and a memory, the at least one processor being coupled to the memory and reading and executing instructions in the memory, to implement the method in the first aspect or any possible implementation of the first aspect.
  • the present application provides a second node, comprising at least one processor and a memory, wherein the at least one processor is used to couple with the memory, read and execute instructions in the memory, so as to implement the method in the second aspect or any possible implementation manner of the second aspect.
  • the present application provides a scheduling node, including at least one processor and a memory.
  • the at least one processor is configured to be coupled with the memory and to read and execute instructions in the memory, to implement the method in the third aspect or any possible implementation of the third aspect.
  • the present application provides a computer program product.
  • the computer program product includes a computer program stored in a computer-readable storage medium, and the computer program is loaded by a processor to implement the method in the first aspect, the second aspect, or the third aspect, or any possible implementation thereof.
  • the present application provides a computer-readable storage medium for storing a computer program, which is loaded by a processor to execute the above-mentioned first aspect, second aspect, or third aspect, or any possible implementation of the first aspect, the second aspect, or the third aspect.
  • the present application provides a chip, including a memory and a processor.
  • the memory is used to store computer instructions
  • the processor is used to call and run the computer instructions from the memory to execute the above-mentioned first aspect, second aspect, or third aspect.
  • the present application provides a system for obtaining data.
  • the system includes the device described in the fourth aspect and the device described in the fifth aspect, or the system includes the first node described in the seventh aspect and the second node described in the eighth aspect.
  • the system further includes the device described in the sixth aspect or the scheduling node described in the ninth aspect.
  • Figure 1 is a schematic diagram of a network architecture provided by an embodiment of the present application.
  • Figure 2 is a schematic structural diagram of a computing node provided by an embodiment of the present application.
  • Figure 3 is a flow chart of a method for obtaining data provided by an embodiment of the present application.
  • Figure 4 is a schematic diagram of a directed acyclic graph provided by an embodiment of the present application.
  • Figure 5 is a flow chart for saving data to the first target area provided by an embodiment of the present application.
  • Figure 6 is a flow chart of obtaining a storage address provided by an embodiment of the present application.
  • Figure 7 is a flow chart for reading operating data stored in the memory of the second node provided by an embodiment of the present application.
  • Figure 8 is a relationship diagram between data volume and transmission delay provided by an embodiment of the present application.
  • Figure 9 is a relationship diagram between the number of distributed workflows and throughput provided by the embodiment of the present application.
  • Figure 10 is a relationship diagram between throughput and delay provided by an embodiment of the present application.
  • Figure 11 is a relationship diagram between the number and delay of a distributed workflow provided by an embodiment of the present application.
  • Figure 12 is a schematic structural diagram of a device for acquiring data provided by an embodiment of the present application.
  • Figure 13 is a schematic structural diagram of another device for obtaining data provided by an embodiment of the present application.
  • Figure 14 is a schematic diagram of the structure of another device for acquiring data provided in an embodiment of the present application.
  • Figure 15 is a schematic structural diagram of a device provided by an embodiment of the present application.
  • Figure 16 is a schematic structural diagram of a system for obtaining data provided by an embodiment of the present application.
  • the distributed workflow includes multiple basic execution units.
  • the user provides the basic execution unit of the distributed workflow to the scheduling node without deploying the computing resources required for each basic execution unit of the distributed workflow.
  • the scheduling node calls the basic execution unit in the distributed workflow. Each time a basic execution unit is called, at least one computing node is automatically started to execute the basic execution unit. Since computing nodes are started on demand, computing resource utilization can be better improved.
  • a distributed workflow may be an application.
  • the basic execution unit of the application is a function.
  • the user provides the application to the scheduling node.
  • when the scheduling node calls a function of the application, it starts a computing node to execute the function.
  • the application program includes methods.
  • the basic execution unit of the application program is the method in the application program.
  • when the scheduling node calls a method of the application program, it starts a computing node to execute the method.
  • the computing node may be a computing device or a virtual instance running on the computing device.
  • the virtual instance can be a virtual machine or container, etc.
  • the scheduling node calls the basic execution units of the distributed workflow based on the dependencies between the multiple basic execution units.
  • the two basic execution units are respectively referred to as the first basic execution unit and the second basic execution unit.
  • the so-called dependency relationship between the first basic execution unit and the second basic execution unit means that the first basic execution unit needs the operation data generated by the second basic execution unit when executing.
  • the running data is the execution result of the second basic execution unit, and/or part or all of the intermediate results generated by the second basic execution unit during execution.
  • the first basic execution unit may be referred to as a successor-dependent unit of the second basic execution unit, or the second basic execution unit may be referred to as a predecessor-dependent unit of the first basic execution unit.
  • the predecessor dependent unit is a unit that generates operating data required when the first basic execution unit is executed.
  • the execution of the above-mentioned first basic execution unit requires the running data generated when the second basic execution unit is executed, that is, the first basic execution unit is executed based on that running data, so the second basic execution unit is called the predecessor dependent unit of the first basic execution unit.
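The predecessor/successor relationship above amounts to a small dependency graph, and a scheduler may only start a unit once all of its predecessor dependent units have run. A minimal sketch (the graph shape and scheduling loop are illustrative, not the patent's scheduler):

```python
# Dependency graph: B depends on A, so A is B's predecessor
# dependent unit and B is A's successor dependent unit.
deps = {"B": ["A"], "A": []}

def schedule(deps):
    # Repeatedly pick units whose predecessor dependent units have
    # all finished, until every unit has been scheduled.
    done, order = set(), []
    while len(done) < len(deps):
        for unit, preds in deps.items():
            if unit not in done and all(p in done for p in preds):
                done.add(unit)
                order.append(unit)
    return order

order = schedule(deps)
# order == ["A", "B"]: the predecessor dependent unit runs first
```

This is the ordering constraint the scheduling node enforces when it calls the basic execution units of a distributed workflow.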
  • the computing node that executes the first basic execution unit is called the first node
  • the computing node that executes the second basic execution unit is called the second node.
  • the scheduling node first calls the second basic execution unit and starts the second node to execute the second basic execution unit.
  • the target operation data is the operation data required to execute the first basic execution unit
  • the scheduling node calls the first basic execution unit, and start the first node.
  • the first node obtains the target operation data and executes the first basic execution unit based on the target operation data.
  • the second basic execution unit included in the distributed workflow is: def A(): return "hello"
  • the first basic execution unit is: def B(input): print(input) # hello.
  • the target operation data "hello" is the operation data generated when the second basic execution unit is executed
  • the target operation data "hello" is the operation data required to execute the first basic execution unit. Therefore, the second basic execution unit can be called the predecessor dependent unit of the first basic execution unit, or the first basic execution unit can be called the successor dependent unit of the second basic execution unit.
  • after receiving the distributed workflow, the scheduling node first calls the second basic execution unit, starts the second node, and uses the second node to execute the second basic execution unit. After the second node executes the second basic execution unit, the second node holds the target operation data "hello" generated by the second basic execution unit.
  • the scheduling node calls the successor dependent unit of the second basic execution unit, that is, calls the first basic execution unit, and starts the first node.
  • the first node obtains the target operation data "hello" and executes the first basic execution unit based on the target operation data "hello".
  • At least one memory page is allocated to the second basic execution unit from the memory of the second node.
  • the second basic execution unit accesses the at least one memory page, that is, the second basic execution unit writes data to the at least one memory page and/or reads data from the at least one memory page; the data stored in the at least one memory page includes the operating data of the second basic execution unit.
  • the target operation data is saved in the at least one memory page.
  • the second node can share the storage address of the target operating data in the memory of the second node, and the first node directly reads part or all of the target operating data saved in the memory of the second node based on the storage address.
  • the storage address includes the address of the at least one memory page.
  • the first node can directly read part or all of the operating data saved in the at least one memory page based on the address of the at least one memory page. In this way, the first node reads part or all of the target operating data from the memory of the second node.
  • the first node may use remote direct memory access (RDMA) technology to directly read the running data stored in some or all of the memory pages in the second node.
  • because the first node directly reads part or all of the target running data stored in the memory of the second node, the second node is spared from saving the target running data to the file system.
  • the network architecture 100 includes a scheduling node 101 and a computing node 102.
  • the number of computing nodes 102 included in the network architecture 100 may be multiple.
  • Each computing node 102 is able to communicate with the scheduling node 101.
  • the scheduling node 101 is used to obtain a distributed workflow, which includes multiple basic execution units. Based on the dependencies between the basic execution units included in the distributed workflow, the scheduling node 101 calls the basic execution units included in the distributed workflow. For convenience of explanation, the called basic execution unit is called the first basic execution unit. The scheduling node 101 schedules at least one computing node 102 and sends the first basic execution unit to the at least one computing node 102.
  • the first basic execution unit is the first basic execution unit of the distributed workflow.
  • the first basic execution unit does not have a predecessor dependent unit, but may have a successor dependent unit.
• the first basic execution unit is not the first basic execution unit of the distributed workflow, nor is it the last basic execution unit of the distributed workflow.
• the first basic execution unit may have a predecessor dependent unit or a successor dependent unit.
  • the first basic execution unit is the last basic execution unit of the distributed workflow.
  • the first basic execution unit may have predecessor dependent units but no successor dependent units.
• when the scheduling node 101 sends the first basic execution unit to the at least one computing node 102, if the first basic execution unit has a predecessor dependent unit and the predecessor dependent unit has generated the first target operating data (for convenience of explanation, the predecessor dependent unit is called a second basic execution unit), the scheduling node 101 sends the first basic execution unit, the first identifier and the node identifier of the second node to the at least one computing node 102.
  • the first target operation data is the operation data required to execute the first basic execution unit.
  • the first target operation data is the operation data generated by the second basic execution unit and the first target operation data is stored in the memory of the second node.
• the second node is the computing node 102 running the predecessor dependent unit, and the first identifier is used to indicate the first target operating data stored in the memory of the second node.
  • the computing node 102 is called a first node.
  • the first node is used to receive the first basic execution unit and execute the first basic execution unit.
• the first node receives the first basic execution unit and executes the first basic execution unit. If the first basic execution unit has a predecessor dependent unit and the predecessor dependent unit has generated the first target operating data, the first node receives the first basic execution unit, the first identifier and the node identifier of the second node, reads part or all of the first target operating data stored in the memory of the second node based on the first identifier and the node identifier of the second node, and executes the first basic execution unit based on the read part or all of the first target operating data.
  • the first node is also configured to send a second identification to the scheduling node 101 when the first basic execution unit generates the second target operating data.
• the second target operating data is the operating data required to execute the successor dependent unit, and is stored in the memory of the first node.
  • the second identifier is used to indicate the second target operating data stored in the memory of the first node.
  • the scheduling node 101 is also configured to receive the second identification sent by the first node, and if there are unexecuted basic execution units in the distributed workflow, call the unexecuted basic execution units.
• For convenience of explanation, the called basic execution unit is called the third basic execution unit. If the third basic execution unit has a predecessor dependent unit and the predecessor dependent unit is the first basic execution unit, the scheduling node 101 schedules at least one computing node 102 and sends the third basic execution unit, the second identifier and the node identifier of the first node to the at least one computing node 102. If there are no unexecuted basic execution units in the distributed workflow, the operation ends.
  • the computing node 102 performs the same operation as that performed by the first node, which will not be described in detail here.
  • the computing node 102 includes a central processing unit (CPU) 1021, a page table 1022 and a memory 1023.
  • the CPU 1021 includes a CPU register.
  • the page table 1022 includes a plurality of page table entries.
  • the memory 1023 includes a plurality of memory pages.
  • the plurality of page table entries correspond to the plurality of memory pages.
• Each page table entry includes the address of the memory page corresponding to that page table entry.
  • the page table entry includes the virtual address and physical address of the memory page corresponding to the page table entry.
  • page table 1022 includes page table entry 1, page table entry 2, page table entry 3, etc.
  • memory 1023 includes memory page 1, memory page 2, memory page 3, etc.
  • Page table entry 1 corresponds to memory page 1
  • page table entry 1 includes the virtual address and physical address of memory page 1.
  • Page table entry 2 corresponds to memory page 2
  • page table entry 2 includes the virtual address and physical address of memory page 2.
  • Page table entry 3 corresponds to memory page 3 and page table entry 3 includes the virtual address and physical address of memory page 3.
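The page-table example above can be sketched as follows. This is a minimal model for illustration only: `PageTableEntry`, `build_page_table` and `translate` are hypothetical names, and the addresses are made-up example values, not the ones a real computing node would use.

```python
# Minimal model of the page table described above: each page table entry
# holds the virtual and physical address of its memory page, and entries
# are numbered from 0 in order, matching the sequence-number scheme in
# the text.

class PageTableEntry:
    def __init__(self, seq, virtual_addr, physical_addr):
        self.seq = seq                      # position in the page table
        self.virtual_addr = virtual_addr
        self.physical_addr = physical_addr

def build_page_table(pairs):
    """pairs: list of (virtual_addr, physical_addr), one per memory page."""
    return [PageTableEntry(i, v, p) for i, (v, p) in enumerate(pairs)]

def translate(page_table, virtual_addr):
    """Find the entry including the virtual address; return the physical one."""
    for entry in page_table:
        if entry.virtual_addr == virtual_addr:
            return entry.physical_addr
    raise KeyError(virtual_addr)

# Page table entries 1..3 of the example, as entries with sequence 0..2.
page_table = build_page_table([(0x1000, 0xA000), (0x2000, 0xB000), (0x3000, 0xC000)])
```

Reading or writing a memory page then amounts to translating its virtual address to the physical address through the matching entry, as the text describes.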
• After receiving the basic execution unit sent by the scheduling node 101, the computing node 102 allocates at least one CPU register to the basic execution unit from the CPU registers of the CPU 1021, and allocates at least one page table entry to the basic execution unit from the page table 1022.
  • the at least one page table entry corresponds to at least one memory page in the memory 1023. That is to say, allocating at least one page table entry to the basic execution unit can be regarded as: allocating the at least one memory page to the basic execution unit.
  • the CPU 1021 of the computing node 102 is used to execute the basic execution unit.
• When the basic execution unit is executed, it obtains the virtual address of the memory page that needs to be read or written (the at least one memory page includes this memory page), obtains the page table entry including the virtual address from the page table 1022, and, based on the physical address of the memory page included in the page table entry, reads data stored in the memory page and/or writes data to the memory page.
  • the data stored in the at least one memory page is the running data of the basic execution unit during execution.
• the at least one CPU register is used to save the operating state of the CPU 1021 of the computing node 102.
• the operating state of the CPU 1021 includes the state of the at least one CPU register, and when the operating state of the CPU 1021 changes, the basic execution unit modifies the state of the at least one CPU register.
  • the multiple computing nodes 102 in the network architecture 100 may include computing devices and/or virtual instances, etc.
  • the virtual instances include containers and/or virtual machines, etc.
  • an embodiment of the present application provides a method 300 for obtaining data.
  • the method 300 is applied to the network architecture 100 shown in Figure 1.
  • the method 300 includes the following processes from steps 301 to 310.
  • Step 301 The scheduling node receives the distributed workflow and calls the first basic execution unit of the distributed workflow.
  • the distributed workflow is an application
  • the application includes functions Func0, Func1, Func2, Func3, Func4,.... Func0, Func1, Func2, Func3, and Func4 are all basic execution units of the application.
• the scheduling node receives the application, calls the first basic execution unit Func0 of the application, and schedules at least one computing node for the first basic execution unit Func0.
  • the scheduling node may also convert the distributed workflow into a directed acyclic graph.
  • the directed acyclic graph is a tree representation of the distributed workflow.
• Each node in the directed acyclic graph is a basic execution unit of the distributed workflow.
• If one of two basic execution units is the predecessor dependent unit of the other basic execution unit, the two basic execution units are connected by an edge, and that basic execution unit is the parent node of the other basic execution unit.
  • the root node of the directed acyclic graph is the first basic execution unit of the distributed workflow.
  • the distributed workflow is an application, and the application includes basic execution units Func0, Func1, Func2, Func3, Func4, ....
  • the distributed workflow is converted into a directed acyclic graph as shown in Figure 5.
  • Func0 is the first basic execution unit of the distributed workflow and is the root node of the directed acyclic graph.
• Func0 is the predecessor dependent unit of Func1 and Func2, and Func0 is the parent node of Func1 and Func2.
• Func1 is the predecessor dependent unit of Func3, and Func1 is the parent node of Func3.
• Func2 is the predecessor dependent unit of Func4, and Func2 is the parent node of Func4.
  • Step 302 The scheduling node sends the first basic execution unit to the second node.
  • the second node is a computing node scheduled by the scheduling node.
  • the scheduling node may schedule at least one computing node such that the scheduling node sends the first basic execution unit to each scheduled computing node.
• After the scheduling node sends the first basic execution unit to the second node, it waits for the second node to obtain the first target operating data during the execution of the first basic execution unit.
• the first target operating data is the operating data generated when the first basic execution unit is executed, and is the data required to execute the successor dependent unit of the first basic execution unit.
  • each computing node performs the following operations from steps 303 to 305 in the same manner as the second node.
  • Step 303 The second node receives the first basic execution unit and executes the first basic execution unit.
• In step 303, the second node works in user mode, receives the first basic execution unit in user mode, and executes the first basic execution unit.
  • the CPU of the second node includes one or more CPU registers.
  • the second node also includes a page table including one or more page table entries.
• In step 303, after the second node receives the first basic execution unit, it allocates at least one first CPU register to the first basic execution unit from the CPU registers included in the CPU of the second node, and allocates at least one page table entry to the first basic execution unit from the page table included in the second node.
  • the at least one page table entry corresponds to at least one memory page in the memory of the second node, and each page table entry respectively includes the address of the memory page corresponding to each page table entry.
• The second node initializes the initial state of the at least one first CPU register so that the second node can execute the first basic execution unit using the at least one first CPU register and the at least one page table entry.
  • the page table entry includes the virtual address and physical address of the memory page corresponding to the page table entry. That is, the address of the memory page includes the virtual address and physical address of the memory page.
  • the at least one page table entry is a page table entry used by the first basic execution unit, and the at least one memory page corresponding to the at least one page table entry is a memory page used by the first basic execution unit.
  • the page table of the second node includes a plurality of page table entries, and each page table entry in the page table has a sequence number.
  • the serial number of each page table entry in the page table may be numbered starting from 0 based on the order of each page table entry in the page table. For example, the first page table entry in the page table has a sequence number of 0, the second page table entry has a sequence number of 1, the third page table entry has a sequence number of 2,...
  • the CPU running status of the second node includes the status of the at least one first CPU register.
  • Initializing the initial state of the at least one first CPU register may be regarded as initializing the initial operating state of the CPU of the second node. That is to say, the initial operating state of the CPU of the second node includes the initial state of the at least one first CPU register.
• the first basic execution unit may modify the state of part or all of the first CPU registers in the at least one first CPU register.
• During the process of executing the first basic execution unit on the second node, the first basic execution unit reads and writes the at least one memory page corresponding to the at least one page table entry based on the at least one page table entry.
• Part or all of the at least one memory page is used to save the operating data generated by the first basic execution unit.
  • the memory page is the memory page corresponding to a certain page table entry in the at least one page table entry.
  • the first basic execution unit obtains the virtual address of the memory page, and obtains the page table entry including the virtual address from the page table of the second node.
  • the obtained page table entry is the page table entry corresponding to the memory page.
  • the operating data of the first basic execution unit includes the read data and/or the written data.
  • Step 304 When the first basic execution unit generates the first target operation data, the second node obtains the first identification.
  • the first identification is used to indicate the first target operation data.
• the first target operating data is the operating data required to execute the successor dependent unit of the first basic execution unit.
• the first target operating data may be the operating result generated by the first basic execution unit. After the second node finishes executing the first basic execution unit, it can determine that the first basic execution unit has generated the first target operating data and obtain a first identifier indicating the first target operating data. Alternatively, the first target operating data may be intermediate data generated by the first basic execution unit while the second node executes it; after the second node finishes executing the first basic execution unit, it can likewise determine that the first basic execution unit has generated the first target operating data and obtain the first identifier indicating it.
  • the at least one memory page includes at least one first memory page, and the at least one first memory page stores the first target operating data.
  • the at least one page table entry includes at least one first page table entry, and the at least one first page table entry corresponds to the at least one first memory page.
  • the storage address of the first target running data in the memory of the second node includes the address of the at least one first memory page.
• the above-mentioned at least one memory page also includes a third memory page. The third memory page corresponds to first data, which is data stored in the disk of the second node. The first data is data that the first basic execution unit needs to write to the third memory page but has not yet written.
  • the at least one page table entry also includes a third page table entry, and the third page table entry corresponds to the third memory page.
  • the second node includes a mapping relationship between the address of the third memory page in the third page table entry and the storage location of the first data.
  • the storage location of the first data is the location of the first data in the disk of the second node.
• the first data is data that the first basic execution unit needs to read from the disk of the second node and write into the third memory page, but the first basic execution unit has not yet read the first data from the disk of the second node and written it into the third memory page.
• In this case, the second node records the mapping relationship between the address of the third memory page (the virtual address and/or physical address of the third memory page) and the storage location of the first data.
  • the first data is data saved in the first file on the disk of the second node.
• the storage location of the first data may include the file identifier of the first file, and the offset and size of the first data in the first file, etc.
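The mapping from a not-yet-written memory page to its on-disk location can be sketched as a small table. This is an illustrative sketch only: `lazy_page_map` and `record_lazy_page` are hypothetical names, and the file identifier, offset and size are made-up example values.

```python
# Sketch of the mapping described above: the address of the third memory
# page maps to where the first data lives on disk (file identifier,
# offset, size), so the data can be fetched and written into the page
# on demand later.

lazy_page_map = {}

def record_lazy_page(page_addr, file_id, offset, size):
    """Remember that page_addr should later be filled from this location."""
    lazy_page_map[page_addr] = {"file": file_id, "offset": offset, "size": size}

# Example: the third memory page at 0x3000 will be filled from "file1",
# starting at byte 512, 4096 bytes long.
record_lazy_page(0x3000, file_id="file1", offset=512, size=4096)
```

When the page is finally accessed, the recorded location tells the node exactly which bytes of which file to read, without having copied the data eagerly.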
  • the first basic execution unit may also open at least one file and perform execution based on data in the at least one file.
  • the at least one file may be a file saved on the disk of the second node.
  • the second node records the file descriptor of at least one file opened by the first basic execution unit. For any file, the file descriptor is used to identify the file.
• When the first basic execution unit generates the first target operating data, the second node enters the kernel state from the user state, allocates a contiguous first target area in the memory of the second node, obtains the at least one first page table entry, and saves the at least one first page table entry into the first target area.
  • the second node generates a first identification corresponding to the first target area.
  • the first identifier may include the address and size of the first target area, or the first identifier is an identity number (identity, ID) generated by the second node for identifying the first target area.
  • saving the at least one first page table entry into the first target area can be considered as: saving the address of at least one first memory page corresponding to the at least one first page table entry into the first target area.
  • the at least one first page table entry corresponds to the at least one first memory page
  • the at least one first memory page is used to save the first target operating data
• Because the first identifier corresponds to the first target area, the first identifier indicates the first target operating data.
• the second node saves the mapping relationship between the first identifier, the address of the first target area, and the size of the first target area, to realize the correspondence between the first identifier and the first target area.
  • the second node may also generate first verification information corresponding to the first identification, that is, the first identification corresponds to the first target area and the first verification information.
• the second node stores a mapping relationship between the first identifier, the first verification information, the address of the first target area, and the size of the first target area, so as to realize the correspondence between the first identifier, the first target area, and the first verification information.
• the second node uses a specified data structure to save the mapping relationship between the first identifier, the address of the first target area, and the size of the first target area, or uses a specified data structure to save the mapping relationship between the first identifier, the first verification information, the address of the first target area, and the size of the first target area.
  • the specified data structure includes a hashmap (hashmap), etc.
  • the second node may also obtain the mapping relationship between the address of the third memory page and the storage location of the first data, and obtain the third page table entry corresponding to the third memory page, Save the mapping relationship between the address of the third memory page and the storage location of the first data, and the third page table entry into the first target area.
  • saving the third page table entry into the first target area can be considered as: saving the address of the third memory page into the first target area.
  • the second node may also obtain the CPU running status of the second node and save the CPU running status of the second node to the first target area.
  • the second node obtains the state of at least one first CPU register used by the first basic execution unit, the CPU running state of the second node includes the state of the at least one first CPU register, and saves the state of the at least one first CPU register into the first target area.
• The second node may also obtain the file descriptor of at least one file opened by the first basic execution unit during execution, and save the file descriptor of the at least one file into the first target area.
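The packing and registration described in step 304 can be sketched as follows. This is an illustration only: `publish_target_area` and `registry` are hypothetical names, a Python dict plays the role of both the "first target area" and the hash map, and `secrets.token_hex` merely stands in for however identifiers and verification information are actually generated.

```python
# Sketch of step 304: the second node gathers the first page table
# entries, its CPU register state and the open file descriptors into a
# "target area", generates an identifier plus verification information,
# and registers identifier -> (verification, area) in a hash map.
import secrets

registry = {}  # the "specified data structure", e.g. a hashmap

def publish_target_area(first_page_table_entries, cpu_registers, open_fds):
    area = {
        "page_table_entries": list(first_page_table_entries),
        "cpu_registers": dict(cpu_registers),
        "file_descriptors": list(open_fds),
    }
    ident = secrets.token_hex(8)         # the first identifier (an ID)
    verification = secrets.token_hex(8)  # the first verification information
    registry[ident] = {"verify": verification, "area": area}
    return ident, verification

# Example values only: one page table entry, two register values, two fds.
ident, verification = publish_target_area(
    [("0x1000", "0xA000")], {"rip": "0x401000", "rsp": "0x7ffc0"}, [3, 4])
```

The identifier and verification information are then what the second node sends to the scheduling node in the first information, while the area itself stays in local memory until a reader asks for it.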
  • Step 305 The second node sends the first information to the scheduling node, where the first information includes the first identifier.
  • the first information may also include the first verification information.
  • the second node also allocates a dynamically connected transport (DCT) object, which is used to establish a connection with a node other than the second node.
  • Step 306 The scheduling node receives the first information and calls the i-th basic execution unit.
  • the first basic execution unit and the i-th basic execution unit are the two basic execution units in the distributed workflow.
• the scheduling node receives the first information and calls a successor dependent unit of the first basic execution unit from the distributed workflow. This successor dependent unit is the i-th basic execution unit of the distributed workflow.
• the scheduling node determines whether the distributed workflow has an unexecuted basic execution unit. If the distributed workflow does not have an unexecuted basic execution unit, the operation ends. If it does, the scheduling node obtains the child nodes of the first basic execution unit from the directed acyclic graph corresponding to the distributed workflow. If an obtained child node is a basic execution unit that has not been executed, the obtained child node is used as the successor dependent unit of the first basic execution unit; that is, the i-th basic execution unit is obtained.
• For example, the first basic execution unit of the distributed workflow is Func0, and the scheduling node obtains the two child nodes of the first basic execution unit Func0 from the directed acyclic graph, which are the second basic execution unit Func1 and the third basic execution unit Func2 of the distributed workflow.
• the second basic execution unit Func1 and the third basic execution unit Func2 are both successor dependent units of the first basic execution unit Func0.
  • the i-th basic execution unit may be the second basic execution unit Func1, or it may be the third basic execution unit Func2.
  • Step 307 The scheduling node sends the node identifier of the second node, the first information and the i-th basic execution unit to the first node.
  • the first node is a computing node scheduled by the scheduling node.
• the scheduling node may schedule at least one computing node, so that the scheduling node sends the i-th basic execution unit to each scheduled computing node.
• After the scheduling node sends the node identifier of the second node, the first information and the i-th basic execution unit to the first node, it waits for the first node to obtain the second target operating data during the execution of the i-th basic execution unit.
• the second target operating data is the operating data generated when the i-th basic execution unit is executed, and is the data required to execute the successor dependent unit of the i-th basic execution unit.
  • each computing node performs the following operations from steps 308 to 310 in the same manner as the first node.
• Step 308 The first node receives the node identifier of the second node, the first information and the i-th basic execution unit, and obtains the storage address of the first target operating data based on the first identifier and the node identifier.
  • the first target operation data is stored in the memory of the second node, and the storage address of the first target operation data is the address of the first target operation data in the memory of the second node.
  • the memory of the second node includes at least one first memory page, and the at least one first memory page stores the first target operating data.
  • the address of the first target running data includes the address of the at least one first memory page.
  • step 308 the first node obtains the address of the at least one first memory page through the following process.
  • the first node sends a first acquisition request to the second node based on the node identifier of the second node, where the first acquisition request includes the first identifier.
  • the first acquisition request further includes the first verification information.
• the first acquisition request is a first remote procedure call (RPC) request. That is, the first node sends the first RPC request to the second node based on the node identifier of the second node, where the first RPC request includes the first identifier, or includes the first identifier and the first verification information.
• After receiving the node identifier of the second node, the first information and the i-th basic execution unit, the first node first enters the kernel state from the user state. After the first node enters the kernel state, the first node sends a first acquisition request to the second node based on the node identifier of the second node.
  • the second node receives the first acquisition request and sends a first acquisition response to the first node.
  • the first acquisition response includes the address and size of the first target area corresponding to the first identification.
  • the first acquisition request includes a first identifier
  • the second node includes a mapping relationship between the first identifier, an address of the first target area, and a size of the first target area.
• the second node receives the first acquisition request, obtains the address and size of the first target area corresponding to the first identifier from the mapping relationship based on the first identifier included in the first acquisition request, and sends a first acquisition response to the first node.
  • the first acquisition response includes the address and size of the first target area corresponding to the first identification.
  • the first acquisition request includes a first identification and first verification information
• the second node includes a mapping relationship between the first identifier, the first verification information, the address of the first target area, and the size of the first target area.
  • the second node receives the first acquisition request and, based on the first identifier included in the first acquisition request, acquires the first verification information corresponding to the first identifier and the address and size of the first target area from the mapping relationship. If the first verification information included in the first acquisition request is the same as the obtained first verification information, a first acquisition response is sent to the first node, where the first acquisition response includes the first target area corresponding to the first identification. address and size.
  • the first acquisition response is a first RPC response, that is, the second node sends a first RPC response to the first node, and the first RPC response includes the address and size of the first target area corresponding to the first identifier.
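The second node's handling of the first acquisition request (operations 3081-3082) can be sketched as a lookup plus a verification check. This is an illustration only: `handle_acquisition_request` and `mapping` are hypothetical names, and the identifier, verification string, address and size are made-up example values.

```python
# Sketch of the second node's side of the exchange: look the identifier
# up in the saved mapping, compare the verification information, and
# answer with the address and size of the first target area only when
# both match.

mapping = {
    "id-42": {"verify": "v-7", "addr": 0x9000, "size": 8192},
}

def handle_acquisition_request(request):
    entry = mapping.get(request["id"])
    if entry is None or entry["verify"] != request.get("verify"):
        return {"ok": False}  # unknown identifier or verification mismatch
    # First acquisition response: address and size of the target area.
    return {"ok": True, "addr": entry["addr"], "size": entry["size"]}

response = handle_acquisition_request({"id": "id-42", "verify": "v-7"})
```

Only identifier holders with matching verification information learn where the target area lives, which is the access-control role the verification information plays in the text.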
  • the first node receives the first acquisition response, and reads the content stored in the first target area based on the address and size of the first target area included in the first acquisition response.
  • the content includes at least one first page table entry, and reading the content includes: reading the address of at least one first memory page corresponding to the at least one first page table entry from the first target area.
  • the above 3081-3082 are optional operations.
• If the first identifier includes the address and size of the first target area, the first node does not need to perform operations 3081-3082, and directly reads the content stored in the first target area based on the address and size of the first target area included in the first identifier.
• If the first identifier is an ID, the first node performs operations 3081-3082.
• the first node receives the first acquisition response, establishes an RDMA connection with the second node, and, based on the address and size of the first target area, reads the content saved in the first target area of the second node through the RDMA connection.
  • the second node includes a DCT object through which the second node establishes an RDMA connection with the first node.
• the read content may also include one or more of the following: the CPU running state of the second node; the third page table entry; the mapping relationship between the address of the third memory page and the storage location of the first data; or the file descriptor of at least one file opened by the first basic execution unit.
  • the CPU operating status of the second node includes the status of at least one first CPU register of the second node.
• After the first node reads the content, it also needs to allocate at least one second memory page from the memory of the first node to the i-th basic execution unit. The at least one second memory page has a one-to-one correspondence with the at least one first memory page. This can be implemented through the following operation 3084.
  • the first node allocates at least one second page table entry to the i-th basic execution unit.
  • the at least one second page table entry corresponds to the at least one first page table entry.
• the at least one second memory page includes the memory page corresponding to each second page table entry.
  • the first node includes a page table
  • the page table includes a plurality of page table entries
  • each page table entry in the page table has a sequence number.
  • the serial number of each page table entry in the page table may be numbered starting from 0 based on the order of each page table entry in the page table. For example, the first page table entry in the page table has a sequence number of 0, the second page table entry has a sequence number of 1, the third page table entry has a sequence number of 2,...
  • the memory of the first node includes multiple memory pages, and the multiple page table entries included in the page table correspond to the multiple memory pages included in the memory of the first node.
• The first node refers to the sequence numbers of the at least one first page table entry and allocates at least one second page table entry to the i-th basic execution unit in the page table of the first node. The at least one second page table entry has a one-to-one correspondence with the at least one first page table entry. A second page table entry corresponding to a first page table entry may mean that the sequence number of the second page table entry is the same as the sequence number of the first page table entry.
• the at least one second page table entry has a one-to-one correspondence with at least one second memory page in the memory of the first node. Allocating the at least one second page table entry to the i-th basic execution unit can be considered as: allocating the at least one second memory page to the i-th basic execution unit.
  • the at least one first page table entry has a one-to-one correspondence with the at least one second page table entry
  • the at least one first memory page has a one-to-one correspondence with the at least one second memory page.
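As a concrete illustration of the sequence-number correspondence described above, the following is a minimal Python sketch of allocating second page table entries in the first node's page table at the same sequence numbers as the first page table entries read from the second node. All names and data structures are hypothetical stand-ins, not part of the claimed method.

```python
# Sketch: allocate local ("second") page table entries whose sequence
# numbers match the ("first") page table entries read from the second node.

def allocate_matching_entries(first_entry_seqs, local_page_table):
    """Return a one-to-one mapping: first-entry sequence number ->
    local page table entry with the same sequence number."""
    mapping = {}
    for seq in first_entry_seqs:
        entry = local_page_table[seq]   # same serial number, per the text
        entry["allocated"] = True
        mapping[seq] = entry
    return mapping

# A page table whose entries are numbered starting from 0, as described.
page_table = [{"seq": i, "allocated": False} for i in range(8)]
m = allocate_matching_entries([1, 3, 5], page_table)
```

Because the mapping is keyed purely by sequence number, each allocated second memory page lines up with exactly one first memory page on the second node.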
• the first node may also set the local mark of each second page table entry in the at least one second page table entry to the first mark. For any second page table entry, the first mark is used to indicate that the i-th basic execution unit has not yet accessed the second memory page corresponding to the second page table entry.
  • the local mark of the page table entry is used to indicate whether the i-th basic execution unit has accessed the memory page corresponding to the page table entry.
  • the local mark of the page table entry may be a first mark or a second mark.
  • the second mark is used to indicate that the i-th basic execution unit has accessed the memory page corresponding to the page table entry.
  • the read content also includes a third page table entry, the first node refers to the serial number of the third page table entry, and allocates a fourth page table entry for the i-th basic execution unit in the page table of the first node, and the third page table entry corresponds to the fourth page table entry.
  • the third page table entry corresponds to the fourth page table entry, which may mean that the serial number of the third page table entry and the serial number of the fourth page table entry may be the same.
  • the fourth page table entry corresponds to the fourth memory page in the memory of the first node.
• Allocating the fourth page table entry to the i-th basic execution unit can be considered as allocating the fourth memory page to the i-th basic execution unit.
  • the third page table entry corresponds to the fourth page table entry
  • the third memory page and the fourth memory page also correspond to each other.
• the first node may also set the local mark of the fourth page table entry to the first mark.
  • the first mark is used to indicate that the i-th basic execution unit has not yet accessed the fourth memory page corresponding to the fourth page table entry.
  • the first mark is 0 and the second mark is 1; or the first mark is 1 and the second mark is 0.
  • the value of the first mark may also be other values, and the value of the second mark may also be other values, which will not be listed one by one here.
• the first node may also set the remote mark of each second page table entry in the at least one second page table entry to a third mark. For any second page table entry, the third mark is used to indicate that the data of the second memory page corresponding to the second page table entry is stored in the second node. And/or, the first node may also set the remote mark of the fourth page table entry to the third mark, where the third mark of the fourth page table entry is used to indicate that the data of the fourth memory page corresponding to the fourth page table entry is saved in the second node.
  • the third flag is equal to 0 or 1.
  • the value of the third mark may also be other values, which will not be listed one by one here.
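The local and remote marks described above can be illustrated with a small sketch. The 0/1 encoding follows one of the alternatives the text allows, and all identifiers are illustrative only.

```python
# Illustrative encoding of the marks described above.
FIRST_MARK, SECOND_MARK = 0, 1   # local: not yet accessed / accessed
THIRD_MARK = 1                   # remote: page data held by the second node

def init_entry_marks(entry):
    """Newly allocated second/fourth page table entries start out locally
    unaccessed, with their data still resident on the second node."""
    entry["local_mark"] = FIRST_MARK
    entry["remote_mark"] = THIRD_MARK
    return entry

def mark_accessed(entry):
    """Flip the local mark once the i-th basic execution unit touches
    the corresponding memory page."""
    entry["local_mark"] = SECOND_MARK

e = init_entry_marks({"seq": 2})
first_access = e["local_mark"] == FIRST_MARK
mark_accessed(e)
```

The first mark therefore acts as a "fetch on first touch" trigger, while the third mark records where the authoritative copy of the page lives.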
  • the above-mentioned at least one second memory page and/or fourth memory page is a memory page allocated by the first node to be used by the i-th basic execution unit. That is to say, the memory page used by the i-th basic execution unit includes the at least one second memory page, and may also include the fourth memory page.
  • the page table entry used by the i-th basic execution unit includes the at least one second page table entry, and may also include a fourth page table entry.
  • the read content also includes the CPU running status of the second node.
  • the first node may also use the following operation 3085 to restore the CPU running status of the second node on the first node.
  • the first node also sets the CPU running status of the first node to the CPU running status of the second node.
  • the CPU operating status of the second node includes the status of at least one first CPU register.
  • the first node allocates at least one second CPU register to the i-th basic execution unit.
  • the CPU of the first node includes the at least one second CPU register.
• the at least one first CPU register has a one-to-one correspondence with the at least one second CPU register, and the at least one second CPU register is a register in the first node.
  • the first node sets the state of each second CPU register to the state of the first CPU register corresponding to each second CPU register.
  • the CPU operating status of the first node includes the status of the at least one second CPU register.
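Operation 3085 above amounts to copying each saved first-CPU-register state into the corresponding second CPU register. A minimal sketch follows; the register names and dictionary representation are hypothetical, standing in for whatever register-state format the nodes actually use.

```python
# Sketch of operation 3085: restore the second node's CPU running state
# on the first node by copying each first CPU register's saved state into
# the corresponding second CPU register.

def restore_cpu_state(first_regs):
    """first_regs maps register names to saved states; produce the
    matching second-register states via the one-to-one correspondence."""
    second_regs = {}
    for name, state in first_regs.items():
        second_regs[name] = state   # one-to-one, same register name
    return second_regs

saved = {"rip": 0x401000, "rsp": 0x7FFD0000, "rax": 42}
restored = restore_cpu_state(saved)
```

After the copy, the first node's CPU running state equals the second node's, which is what lets it skip the cold-start initialization described below.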
  • the first node saves the mapping relationship between the address of the third memory page and the storage location of the first data.
  • the first node saves the file descriptor of the at least one file.
  • the memory page used by the i-th basic execution unit corresponds to the memory page used by the first basic execution unit.
• the first node sets the CPU running state of the first node to the CPU running state of the second node, so that the first node does not need to initialize the memory pages used by the i-th basic execution unit, nor does it need to initialize the CPU running state, that is, it does not need to perform a cold start.
  • the first node can quickly execute the i-th basic execution unit and improve execution efficiency.
  • the above 3085 is an optional operation, that is, the operation of 3085 may not be performed.
• the first node allocates at least one second CPU register to the i-th basic execution unit and initializes the state of the at least one second CPU register.
  • the execution order between operations 3085 and 3084 is in no particular order.
  • 3084 can be executed first and then 3085, or 3085 can be executed first and then 3084, or 3084 and 3085 can be executed at the same time.
• after executing operations 3084 and 3085, the first node returns from the kernel state to the user state, runs in the user state, and executes the i-th basic execution unit.
  • Step 309 The first node reads part or all of the first target operation data stored in the memory of the second node based on the storage address of the first target operation data.
  • the first node executes the i-th basic execution unit.
  • the i-th basic execution unit may need to access the memory page used by the i-th basic execution unit.
  • Accessing a memory page includes writing a memory page and/or reading a memory page.
• when the i-th basic execution unit needs to write data to a certain memory page, the i-th basic execution unit obtains the virtual address of the memory page, obtains the page table entry including the virtual address from the page table of the first node, and writes the data to the memory page based on the physical address of the memory page included in the page table entry.
  • the memory page may be a second memory page or a fourth memory page
  • the page table entry may be a second page table entry corresponding to the second memory page or a fourth page table entry corresponding to the fourth memory page.
• if the local mark of the page table entry is the first mark, it means that the i-th basic execution unit is accessing the memory page corresponding to the page table entry for the first time.
• the local mark of the page table entry is then set to the second mark.
• when the i-th basic execution unit reads a certain memory page, the memory page may be a second memory page that the i-th basic execution unit is reading for the first time and to which no data has been written before the read. In this case, when the i-th basic execution unit reads the second memory page for the first time, the first node obtains the address of the first memory page corresponding to the second memory page (the storage address of the first target operating data includes the obtained address of the first memory page) and, based on the address of the first memory page, directly reads the first target operating data stored in the first memory page in the memory of the second node.
• if the memory page is a fourth memory page, the first node obtains the address of the third memory page corresponding to the fourth memory page, obtains the storage location of the first data from the mapping relationship between the address of the third memory page and the storage location of the first data, and, based on the storage location of the first data, obtains the first data stored in the disk of the second node.
  • the first node can obtain the first target operating data saved in the first memory page through the following operations 3091-3097, or obtain the first data.
  • the first node obtains the virtual address of the memory page to be read by the i-th basic execution unit.
  • the memory page to be read may be the second memory page or the fourth memory page.
• the memory page to be read is the memory page that the i-th basic execution unit needs to read.
  • the first node obtains the page table entry to be read corresponding to the memory page to be read based on the virtual address of the memory page to be read. If the local mark of the page table entry to be read is the first mark, perform operation 3093.
• if the local mark of the page table entry to be read is the first mark, it indicates that the memory page to be read is being read for the first time by the i-th basic execution unit and that, before this read, the i-th basic execution unit has not written data to the memory page to be read; that is to say, the i-th basic execution unit is accessing the memory page to be read for the first time. If the local mark of the page table entry to be read is the second mark, the first node directly reads the data in the memory page to be read.
  • the first node finds the page table entry including the virtual address of the memory page to be read from the page table of the first node, and the page table entry is the page table entry to be read. If the local mark of the page table entry to be read is the first mark, the first node enters the kernel state from the user state. When the first node is running in the kernel state, the data is read through the following operations 3093-3097.
  • the first node enters kernel mode through a page fault handling function.
• the first node determines whether the memory page to be read is the second memory page or the fourth memory page. If the memory page to be read is the second memory page, 3094 is executed. If the memory page to be read is the fourth memory page, 3095 is executed.
• the first node may also determine whether the remote mark of the page table entry to be read is the third mark. If the remote mark of the page table entry to be read is the third mark, it indicates that the data of the memory page to be read corresponding to the page table entry to be read is saved in the second node, and operation 3093 is then performed. If the remote mark of the page table entry to be read is not the third mark, the first node uses the page fault processing function to obtain the first target running data saved in the memory page to be read.
• the first node obtains another page table entry corresponding to the page table entry to be read, and the other page table entry is a page table entry in the page table of the second node. If the other page table entry is the first page table entry, the memory page to be read is determined to be the second memory page. If the other page table entry is the third page table entry, the memory page to be read is determined to be the fourth memory page.
• based on the address of the first memory page corresponding to the second memory page, the first node reads the first target operating data stored in the first memory page in the memory of the second node, and executes 3097.
  • the other page table entry includes the address (physical address) of the first memory page corresponding to the second memory page.
  • the first node directly reads the first target operation data stored in the first memory page in the memory of the second node, compared with the traditional method of reading the first target operation data from the file system, the transmission delay required for reading the data is greatly reduced.
• the larger the amount of data read, the greater the difference between the transmission delay required for the traditional method to read data from the file system and the transmission delay required for the embodiment of the present application to read data directly from the memory of the second node; therefore, the first node directly reading the first target operation data in the memory of the second node can reduce time overhead.
  • the first node obtains the storage location of the first data based on the address of the third memory page corresponding to the fourth memory page and the mapping relationship between the address of the third memory page and the storage location of the first data.
  • the storage location of the first data includes the file identification of the first file to which the first data belongs, the offset and size of the first data in the first file.
  • the first node obtains the first data based on the storage location of the first data and the node identifier of the second node.
  • the first node sends a second acquisition request to the second node based on the node identifier of the second node, and the second acquisition request includes the file identifier of the first file, and the offset and size of the first data in the first file.
  • the second node receives the second acquisition request, and based on the file identifier of the first file, acquires the first file stored in the disk of the second node. Based on the offset and size of the first data in the first file, the first data is acquired from the first file.
  • the second node sends a second acquisition response to the first node, and the second acquisition response includes the first data.
  • the first node receives the second acquisition response and reads the first data in the second acquisition response.
  • the second acquisition request is a second RPC request, that is, the first node sends a second RPC request to the second node based on the node identifier of the second node, and the second RPC request includes the file identifier of the first file, The offset and size of the first data in the first file.
  • the second acquisition response is a second RPC response, that is, the second node sends a second RPC response to the first node, and the second RPC response includes the first data.
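The second acquisition request above carries a file identifier together with the offset and size of the first data within the first file. A minimal sketch of how the second node could serve such a request follows; the in-memory file table and field names are hypothetical, and an in-memory buffer stands in for the second node's disk.

```python
import io

# Sketch: the second node resolves the file identifier, seeks to the
# requested offset, and returns exactly `size` bytes of the first data.
def handle_second_acquisition(request, files):
    f = files[request["file_id"]]
    f.seek(request["offset"])          # offset of the first data in the file
    return {"data": f.read(request["size"])}

# An in-memory stand-in for the first file on the second node's disk.
files = {"f1": io.BytesIO(b"abcdefgh")}
resp = handle_second_acquisition(
    {"file_id": "f1", "offset": 2, "size": 3}, files)
```

Carrying the offset and size in the request lets the second node return only the bytes the i-th basic execution unit actually needs, rather than the whole first file.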
  • the first node sets the local mark of the page table entry to be read as the second mark.
  • the second mark is used to indicate that the i-th basic execution unit has accessed the memory page to be read corresponding to the page table entry to be read.
  • the first node also switches from the kernel mode to the user mode.
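Operations 3091-3097 above can be condensed into a single read path: on the first read of a page whose local mark is still the first mark, the first node either reads the corresponding first memory page directly from the second node's memory (second memory page) or fetches the first data from the second node's disk (fourth memory page), then flips the local mark. The sketch below uses hypothetical names, and the two fetch callbacks stand in for the remote-memory and RPC mechanisms described in the text.

```python
FIRST_MARK, SECOND_MARK = 0, 1

def read_page(entry, read_remote_memory, read_remote_disk):
    """Condensed sketch of operations 3091-3097 for one page table entry."""
    if entry["local_mark"] == SECOND_MARK:
        return entry["data"]                        # already local (3092)
    if entry["kind"] == "second":                   # 3093/3094
        data = read_remote_memory(entry["remote_addr"])
    else:                                           # fourth page: 3095/3096
        data = read_remote_disk(entry["storage_location"])
    entry["data"] = data
    entry["local_mark"] = SECOND_MARK               # 3097
    return data

entry = {"local_mark": FIRST_MARK, "kind": "second", "remote_addr": 0x1000}
out = read_page(entry, lambda addr: b"mem", lambda loc: b"disk")
```

Because the local mark is flipped after the first fetch, every subsequent read of the same page is served locally without touching the second node, which is the on-demand behavior the text describes.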
  • the following step 310 is performed.
  • Step 310 The first node executes the i-th basic execution unit based on the part or all of the first target operation data.
• when the memory page to be read is the second memory page, the first node reads the first target operation data stored in the first memory page corresponding to the second memory page, and executes the i-th basic execution unit based on the first target operation data.
  • the first node obtains the first data and executes the i-th basic execution unit based on the first data.
  • the file descriptor of the above-mentioned at least one file may also be required. That is, the first node may also execute the i-th basic execution unit based on the file descriptor of the above-mentioned at least one file.
  • the i-th basic execution unit may generate second target operating data.
  • the second target operation data is the operation data generated when the i-th basic execution unit is executed, and the second target operation data is the data required to execute the back-driven dependent unit of the i-th basic execution unit.
• after the i-th basic execution unit generates the second target operating data, the first node sends second information to the scheduling node. The second information includes the second identifier, or the second information includes the second identifier and the second verification information, where the second identifier is used to indicate the second target operating data.
  • the memory page used by the i-th basic execution unit includes at least one fifth memory page, and the at least one fifth memory page is used to save the second target operation data.
  • the at least one fifth memory page may include the above-mentioned second memory page and/or fourth memory page.
  • the page table entry used by the i-th basic execution unit includes at least one fifth page table entry, and the at least one fifth page table entry corresponds to the at least one fifth memory page.
  • the memory page used by the i-th basic execution unit may also include a sixth memory page.
  • the sixth memory page corresponds to the second data.
  • the second data is the data stored in the disk of the first node.
• the second data is data that the i-th basic execution unit needs to write to the sixth memory page but has not yet written to the sixth memory page.
  • the sixth memory page may be one of the above-mentioned second memory pages or fourth memory pages.
  • the page table entry used by the i-th basic execution unit may also include a sixth page table entry, and the sixth page table entry corresponds to the sixth memory page.
• the first node includes a mapping relationship between the address of the sixth memory page in the sixth page table entry and the storage location of the second data.
  • the storage location of the second data is the location of the second data in the disk of the first node.
• the second data is data that the i-th basic execution unit needs to read from the disk of the first node and write to the sixth memory page, but the i-th basic execution unit has not yet read the second data from the disk of the first node and written it into the sixth memory page.
  • the first node will record the mapping relationship between the address of the sixth memory page (the virtual address and/or physical address of the sixth memory page) and the storage location of the second data.
• after the i-th basic execution unit generates the second target operating data, the first node enters the kernel state. When running in the kernel state, the first node allocates a continuous second target area in its memory, obtains the at least one fifth page table entry, and saves the at least one fifth page table entry into the second target area. The first node then generates the second identifier corresponding to the second target area.
  • the first node stores a mapping relationship between the second identifier, the address of the second target area, and the size of the second target area, so as to realize that the second identifier corresponds to the second target area.
  • the first node may also generate second verification information corresponding to the second identification, that is, the second identification corresponds to the second target area and the second verification information.
• the first node stores a mapping relationship between the second identifier, the second verification information, the address of the second target area and the size of the second target area, so that the second identifier corresponds to both the second target area and the second verification information.
• the first node may also obtain the mapping relationship between the address of the sixth memory page and the storage location of the second data, obtain the sixth page table entry corresponding to the sixth memory page, and save the mapping relationship between the address of the sixth memory page and the storage location of the second data, together with the sixth page table entry, to the second target area.
  • the first node may also obtain the CPU running status of the first node and save the CPU running status of the first node to the second target area.
• the first node obtains the state of the at least one second CPU register used by the i-th basic execution unit (the CPU operating state of the first node includes the state of the at least one second CPU register) and saves the state of the at least one second CPU register into the second target area.
  • the first node may also obtain the file descriptor of at least one file opened by the i-th basic execution unit during execution, and save the file descriptor of the at least one file to the second target area.
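The save path above, in which the first node gathers the fifth page table entries, the sixth-page mapping, the CPU running state and the open file descriptors into the second target area and generates a second identifier for it, can be sketched as follows. The record layout, field names and use of a random hex identifier are all illustrative assumptions.

```python
import uuid

# Sketch: package everything the i-th basic execution unit's successor
# will need into a "second target area" and name it with an identifier.
def save_target_area(fifth_entries, sixth_mapping, cpu_state, fds):
    area = {
        "page_table_entries": list(fifth_entries),
        "disk_mapping": dict(sixth_mapping),   # sixth-page addr -> location
        "cpu_state": dict(cpu_state),
        "file_descriptors": list(fds),
    }
    second_id = uuid.uuid4().hex               # identifier for this area
    return second_id, area

sid, area = save_target_area(
    fifth_entries=[{"seq": 0}],
    sixth_mapping={0x2000: ("f1", 0, 64)},     # (file id, offset, size)
    cpu_state={"rip": 0x400000},
    fds=[3, 4])
```

The first node would then store the mapping from the identifier to the area's address and size, so that a later node holding only the identifier can locate the area.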
  • the scheduling node receives the second information and calls the j-th basic execution unit.
  • the i-th basic execution unit is the predecessor dependent unit of the j-th basic execution unit, and j is an integer greater than i.
  • the i-th basic execution unit and the j-th basic execution unit are the two basic execution units in the distributed workflow.
• the scheduling node sends the node identifier of the first node, the second information and the j-th basic execution unit to the third node.
  • the third node is a computing node scheduled by the scheduling node.
• the third node receives the node identifier of the first node, the second information and the j-th basic execution unit, and performs the operations of steps 308-310 above in the same way as the first node, so as to execute the j-th basic execution unit.
• multiple distributed workflows are input to the scheduling node; the scheduling node calls basic execution units in the multiple distributed workflows and schedules computing nodes to process the called basic execution units. With the embodiment of the present application, the computing nodes save time overhead and memory copying when acquiring running data.
• the more distributed workflows the embodiment of the present application processes, the greater the throughput generated, and this throughput is greater than that of the traditional method.
• the abscissa represents the number of distributed workflows processed, and the ordinate represents throughput. From Figure 9, it can be concluded that when processing the same number of distributed workflows, the throughput generated by the embodiment of the present application is greater than the throughput generated by the traditional method.
  • the abscissa represents throughput, which is the number of distributed workflows that can be processed per second
  • the ordinate represents delay, which shows the relationship between throughput and delay under different concurrency configurations.
  • the abscissa represents the number of distributed workflows processed
  • the ordinate represents the delay in processing distributed workflows.
• the delay is expressed in a logarithmic manner. For example, when the embodiment of the present application processes 10,000 distributed workflows, the corresponding logarithmic delay is 2, which means that the delay required to process 10,000 distributed workflows is 10² ms.
• after obtaining the first target operation data generated by the first basic execution unit, the second node sends the first information to the scheduling node, and the first information includes the first identifier.
• the first identifier corresponds to the first target area of the second node, the first target area stores at least one first page table entry, and the at least one first memory page corresponding to the at least one first page table entry is used to store the first target operating data.
  • the scheduling node receives the first information, calls the i-th basic execution unit, the predecessor dependent unit of the i-th basic execution unit is the first basic execution unit, and sends the first information and the i-th basic execution unit to the first node.
  • the first node obtains at least one first page table entry in the first target area based on the first identification, and allocates at least one second page table entry to the i-th basic execution unit.
• the at least one second page table entry has a one-to-one correspondence with the at least one first page table entry.
  • the memory of the first node includes at least one second memory page, and the at least one second memory page has a one-to-one correspondence with the at least one second page table entry.
• when the i-th basic execution unit reads a second memory page, the first target operation data saved in the corresponding first memory page of the second node is read, and the i-th basic execution unit is executed based on the first target operation data. Since the first node directly reads the data stored in the first memory page of the second node, memory copy overhead and time overhead are saved. Since the data stored in the first memory page corresponding to a second memory page is read only when that second memory page is read, reading on demand is achieved and the overhead of network resources is reduced.
  • an embodiment of the present application provides a device 1200 for obtaining data.
• the device 1200 is deployed on a computing node in the network architecture 100 shown in Figure 1, or on the first node of the method 300.
  • the device 1200 includes:
  • the receiving unit 1201 is used to receive the first identification and the node identification of the second node.
  • the device 1200 is used to execute the first basic execution unit.
  • the second node is used to execute the second basic execution unit.
• the second basic execution unit is a predecessor dependent unit of the first basic execution unit, the memory of the second node is used to save the operation data of the second basic execution unit, and the first identifier is used to indicate the operation data;
  • the processing unit 1202 is configured to obtain the storage address of the operating data based on the first identification and the node identification;
  • the processing unit 1202 is also configured to read the operating data stored in the memory of the second node based on the storage address.
• please refer to the relevant content of step 308 of the method 300 shown in Figure 3, which will not be described in detail here.
• please refer to the relevant content of step 309 of the method 300 shown in Figure 3, which will not be described in detail here.
  • the memory of the second node includes at least one first memory page, the at least one first memory page is used to save the running data, and the first identifier is used to indicate at least one first memory page.
  • processing unit 1202 is configured to:
  • Obtain the storage address which is the address of the first memory page corresponding to the second memory page read by the first basic execution unit for the first time.
• for the implementation process in which the processing unit 1202 acquires the address of at least one first memory page, refer to the relevant contents of 3081-3083 of the method 300 shown in Figure 3, which will not be described in detail here.
  • the processing unit 1202 is used to read the operating data stored in the acquired first memory page based on the address of the acquired first memory page.
  • the memory of the second node includes a target area, the target area corresponds to the first identifier, and the target area stores the address of at least one first memory page;
  • the device 1200 also includes a sending unit 1203,
  • Sending unit 1203, configured to send an acquisition request to the second node based on the node identifier, where the acquisition request includes the first identifier;
  • the receiving unit 1201 is also configured to receive an acquisition response sent by the second node, where the acquisition response includes the address and size of the target area corresponding to the first identification;
  • the processing unit 1202 is configured to read an address of at least one first memory page stored in the target area based on the address and size of the target area.
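The exchange described above, in which device 1200 sends an acquisition request carrying the first identifier, receives the target area's address and size, and then reads the first-memory-page addresses stored in the area, can be sketched as follows. The lookup tables and the flat-list memory model are hypothetical simplifications.

```python
# Sketch of the acquisition request/response exchange.

def handle_acquisition_request(first_id, target_areas):
    """Second-node side: map the first identifier to its target area."""
    addr, size = target_areas[first_id]
    return {"address": addr, "size": size}

def read_target_area(response, memory):
    """Device-1200 side: read the first-memory-page addresses stored in
    the target area, given its address and size."""
    addr, size = response["address"], response["size"]
    return memory[addr:addr + size]

# Second node's state: identifier -> (target area address, size), plus a
# flat stand-in for its memory holding three first-memory-page addresses.
areas = {"id-1": (4, 3)}
mem = [0, 0, 0, 0, 0x1000, 0x2000, 0x3000]

resp = handle_acquisition_request("id-1", areas)
pages = read_target_area(resp, mem)
```

Only the identifier travels in the request; the addresses themselves are read from the target area afterwards, matching the two-step flow in 3081-3083.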
• for the implementation process of the receiving unit 1201, refer to the relevant content of 3082 of the method 300 shown in Figure 3, which will not be described in detail here.
  • the second node includes at least one first page table entry, and the at least one first page table entry corresponds to at least one first memory page, and the device 1200 includes at least one second page table entry,
• the processing unit 1202 is configured to allocate at least one second page table entry to the first basic execution unit, the at least one second page table entry corresponds to the at least one first page table entry, and the at least one second memory page includes the memory page corresponding to each second page table entry.
  • the target area stores the central processing unit CPU running status of the second node, and the processing unit 1202 is also used to:
  • the CPU running state of the device 1200 is set to the CPU running state of the second node.
  • the processing unit 1202 reads the CPU running status of the second node and sets the CPU running status of the device 1200.
• the CPU running status of the second node includes the status of at least one first CPU register, and the at least one first CPU register is a register used by the second basic execution unit in the CPU of the second node.
• the CPU running state of the device 1200 includes the state of at least one second CPU register.
• the at least one first CPU register corresponds to the at least one second CPU register.
• the at least one second CPU register is a register used by the first basic execution unit in the CPU of the device 1200,
  • Processing unit 1202 used for:
• set the state of each second CPU register to the state of the first CPU register corresponding to that second CPU register.
  • the target area also stores a mapping relationship between the address of the third memory page and the storage location of the first data.
  • the memory of the second node also includes the third memory page.
  • the first data is stored on the disk of the second node.
  • the first data is data that the second basic execution unit needs to write to the third memory page but has not yet been written to the third memory page.
• the device 1200 includes a fourth memory page corresponding to the third memory page, and the processing unit 1202 is also used for:
  • the storage location of the first data is obtained based on the address of the third memory page corresponding to the fourth memory page and the mapping relationship;
  • the first data is obtained, and the first data is data used by the device 1200 to execute the first basic execution unit.
• for the implementation process in which the processing unit 1202 reads the mapping relationship saved in the target area, please refer to the relevant content of 3083 of the method 300 shown in Figure 3, which will not be described in detail here.
  • the target area also stores a file descriptor of at least one file opened by the second basic execution unit during execution, and the second node includes at least one file.
  • the processing unit 1202 is also configured to read the file descriptor of at least one file stored in the target area.
  • the file descriptor of the at least one file is data used by the device 1200 to execute the first basic execution unit.
  • the first basic execution unit and the second basic execution unit are two functions in the distributed workflow.
  • since the receiving unit receives the first identifier and the node identifier of the second node, and the first identifier indicates the running data saved in the memory of the second node, the processing unit can obtain the storage address of the running data based on the first identifier and the node identifier of the second node, and directly read the running data saved in the memory of the second node based on the storage address. Because the processing unit reads the running data directly from the memory of the second node based on the storage address, the running data does not need to be copied to a file system, eliminating the memory-copy overhead. The processing unit also reads the running data saved in the memory of the second node at a higher rate, saving time overhead.
  • the embodiment of the present application provides a device 1300 for obtaining data.
  • the device 1300 is deployed on the computing node in the network architecture 100 shown in Figure 1, or is deployed on the second node of the method 300.
  • the device 1300 includes:
  • the processing unit 1301 is configured to obtain a first identifier, the device 1300 is configured to execute a second basic execution unit, and the first identifier is used to indicate the operating data of the second basic execution unit saved in the memory of the device 1300;
  • the sending unit 1302 is used to send a first identifier,
  • the first identifier is used to trigger the first node to obtain the storage address of the running data, and to read the running data stored in the memory of the device 1300 based on the storage address,
  • the first node is used to execute the first basic execution unit, and the second basic execution unit is the predecessor dependent unit of the first basic execution unit.
  • the memory of the device 1300 includes at least one first memory page, the at least one first memory page is used to save operating data, and the first identifier is used to indicate at least one first memory page.
  • the memory of the device 1300 includes a target area, and the target area corresponds to the first identifier.
  • the device 1300 further includes a receiving unit 1303,
  • the processing unit 1301 is configured to save the address of at least one first memory page to the target area
  • the receiving unit 1303 is configured to receive an acquisition request sent by the first node, where the acquisition request includes a first identifier
  • the sending unit 1302 is also used to send an acquisition response to the first node, which includes the address and size of the target area corresponding to the first identifier.
  • the acquisition response is used to trigger the first node to read the address of at least one first memory page stored in the target area based on the address and size of the target area.
  • for the implementation process by which the sending unit 1302 sends the acquisition response, refer to the relevant content of 3082 of the method 300 shown in Figure 3, which will not be described in detail here.
  • the processing unit 1301 is also configured to save the CPU running status of the device 1300 to the target area, so that the first node reads the CPU running status saved in the target area.
  • the CPU of the device 1300 includes at least one first central processing unit (CPU) register used by the second basic execution unit, and the CPU running state includes the state of the at least one first CPU register.
  • the processing unit 1301 is also configured to: save, to the target area, the mapping relationship between the address of the third memory page and the storage location of the first data,
  • the memory of the device 1300 also includes the third memory page, and the first data is stored in the disk of the device 1300,
  • the first data is data that the second basic execution unit needs to write to the third memory page but has not yet written to the third memory page, so that the first node reads the mapping relationship saved in the target area.
  • the processing unit 1301 is also configured to save, to the target area, the file descriptor of at least one file opened by the second basic execution unit during execution, the device 1300 including the at least one file, so that the first node reads the file descriptor of the at least one file saved in the target area.
  • the first basic execution unit and the second basic execution unit are two functions in a distributed workflow.
  • the sending unit sends the first identifier, so that the first node can obtain the storage address of the running data based on the first identifier and directly read the running data stored in the memory of the device 1300 based on the storage address. Since the first node reads the running data directly from the memory of the device 1300 based on the storage address, the device 1300 does not need to copy the running data to a file system, eliminating the memory-copy overhead. The first node also reads the running data stored in the memory of the device 1300 at a higher rate, saving time overhead.
  • the embodiment of the present application provides a device 1400 for obtaining data.
  • the device 1400 is deployed on the scheduling node in the network architecture 100 shown in Figure 1, or is deployed on the scheduling node of the method 300.
  • the device 1400 includes:
  • the receiving unit 1401 is used to receive the first identifier sent by the second node,
  • the second node is used to execute the second basic execution unit,
  • the memory of the second node is used to save the running data of the second basic execution unit,
  • the first identifier is used to indicate this running data;
  • the sending unit 1402 is used to send the first identifier and the node identifier of the second node to the first node, the first node is used to execute the first basic execution unit, the second basic execution unit is the predecessor dependent unit of the first basic execution unit, and the first identifier and the node identifier are used to trigger the first node to obtain the storage address of the running data and to read the running data stored in the memory of the second node based on the storage address.
  • for the implementation process by which the receiving unit 1401 receives the first identifier, refer to the relevant content of 306 of the method 300 shown in FIG. 3, which will not be described in detail here.
  • the memory of the second node includes at least one first memory page, the at least one first memory page is used to save the running data, and the first identifier is used to indicate at least one first memory page.
  • the first basic execution unit and the second basic execution unit are two functions in the distributed workflow.
  • the sending unit sends the first identifier and the node identifier of the second node to the first node, so the first node can obtain the storage address of the running data based on the first identifier and the node identifier of the second node, and directly read the running data stored in the memory of the second node based on the storage address. Since the first node reads the running data directly from the memory of the second node based on the storage address, the second node does not need to copy the running data to a file system, eliminating the memory-copy overhead.
  • the first node also reads the running data saved in the memory of the second node at a higher rate, saving time overhead.
  • an embodiment of the present application provides a schematic diagram of a device 1500.
  • the device 1500 may be the first node, the second node or the scheduling node in any of the above embodiments.
  • the device 1500 may be a scheduling node or a computing node in the network architecture 100 shown in FIG. 1 , or the first node, the second node or the scheduling node in the method 300 shown in FIG. 3 .
  • the device 1500 includes at least one processor 1501, internal connections 1502, memory 1503 and at least one transceiver 1504.
  • the device 1500 is a hardware structure device.
  • the processing unit 1202 in the device 1200 shown in FIG. 12 can be implemented by the at least one processor 1501 calling the code in the memory 1503.
  • the receiving unit 1201 and the sending unit 1203 in the device 1200 shown in FIG. 12 can be implemented by the at least one transceiver 1504.
  • the device 1500 can also be used to implement the functions of the first node in any of the above embodiments.
  • the processing unit 1301 in the device 1300 shown in FIG. 13 can be implemented by the at least one processor 1501 calling the code in the memory 1503.
  • the sending unit 1302 and the receiving unit 1303 in the device 1300 shown in FIG. 13 can be implemented by the at least one transceiver 1504.
  • the device 1500 can also be used to implement the functions of the second node in any of the above embodiments.
  • the receiving unit 1401 and the sending unit 1402 in the device 1400 shown in FIG. 14 can be implemented by the at least one transceiver 1504.
  • the device 1500 can also be used to implement the function of the scheduling node in any of the above embodiments.
  • the above-mentioned processor 1501 can be a general central processing unit (CPU), a network processor (NP), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits used to control the execution of the program of this application.
  • the internal connection 1502 may include a path for transmitting information between the components.
  • the internal connection 1502 is a single board or bus, etc.
  • the above-mentioned transceiver 1504 is used to communicate with other devices or communication networks.
  • the above-mentioned memory 1503 can be a read-only memory (ROM) or another type of static storage device that can store static information and instructions, or a random access memory (RAM) or another type of dynamic storage device that can store information and instructions; it can also be an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including compressed optical discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, etc.), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto.
  • the memory can exist independently and be connected to the processor through a bus. Memory can also be integrated with the processor.
  • the memory 1503 is used to store the application program code for executing the solution of the present application, and the processor 1501 controls the execution.
  • the processor 1501 is used to execute the application code stored in the memory 1503 and to cooperate with the at least one transceiver 1504, so that the device 1500 implements the functions in the method of the present application.
  • the processor 1501 may include one or more CPUs, such as CPU0 and CPU1 in Figure 15.
  • the device 1500 may include multiple processors, such as the processor 1501 and the processor 1507 in Figure 15 .
  • processors may be a single-CPU processor or a multi-CPU processor.
  • a processor here may refer to one or more devices, circuits, and/or processing cores for processing data (eg, computer program instructions).
  • an embodiment of the present application provides a system 1600 for obtaining data.
  • the system 1600 includes a device 1200 shown in Figure 12, a device 1300 shown in Figure 13, and a device 1400 shown in Figure 14.
  • the device 1200 shown in FIG. 12 may be the first node 1601, the device 1300 shown in FIG. 13 may be the second node 1602, and the device 1400 shown in FIG. 14 may be the scheduling node 1603.

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present application discloses a method, apparatus, system and storage medium for obtaining data, belonging to the field of communications. The method includes: a first node receives a first identifier and a node identifier of a second node, where the first node is used to execute a first basic execution unit, the second node is used to execute a second basic execution unit, the second basic execution unit is the predecessor dependent unit of the first basic execution unit, the memory of the second node is used to save the running data of the second basic execution unit, and the first identifier is used to indicate the running data; the first node obtains the storage address of the running data based on the first identifier and the node identifier; and the first node reads, based on the storage address, the running data saved in the memory of the second node. The present application can save memory-copy overhead and time overhead.

Description

Method, apparatus, system and storage medium for obtaining data

Technical Field

The present application relates to the field of communication technology, and in particular to a method, apparatus, system and storage medium for obtaining data.

Background

A distributed workflow includes multiple basic execution units, which can be executed using multiple nodes. Dependency relationships exist among the multiple basic execution units, and the multiple nodes execute the multiple basic execution units based on those dependency relationships. For example, assume the distributed workflow includes a first basic execution unit and a second basic execution unit, the second basic execution unit is the predecessor dependent unit of the first basic execution unit, a first node is used to execute the first basic execution unit, and a second node is used to execute the second basic execution unit. Before executing the first basic execution unit, the first node needs to obtain the running data produced when the second basic execution unit was executed, and then execute the first basic execution unit based on that running data.

At present, the first node obtains the running data as follows: the memory of the second node saves the running data produced when the second basic execution unit was executed; the second node reads the running data from that memory and saves it into a file in a file system; the first node reads the file from the file system and obtains the running data from the file.

For example, in the field of serverless computing, the distributed workflow may be an application, and the first basic execution unit and the second basic execution unit are two functions of the application. Assume the second basic execution unit is the application's function def A():return "hello", and the first basic execution unit is the application's function def B(input):print(input)#hello. The memory of the second node contains the running data "hello" produced when the second basic execution unit was executed, and the running data "hello" is saved into a file in the file system. The first node reads the file from the file system, obtains the running data "hello" from the file, and executes the first basic execution unit based on the running data "hello".
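The file-system-based passing described in this background can be sketched as follows; the helper names and the temporary file path are illustrative only and are not part of the present application:

```python
import os
import tempfile

# The application's two functions from the example above.
def func_a():
    return "hello"

def func_b(value):
    print(value)
    return value

# Second node: copy func A's running data from memory into a file in the
# file system (this memory-to-file copy is the overhead at issue).
path = os.path.join(tempfile.mkdtemp(), "a_output.txt")
with open(path, "w") as f:
    f.write(func_a())

# First node: read the file back from the file system before executing func B.
with open(path) as f:
    received = f.read()
func_b(received)  # prints "hello"
```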
However, the second node reading the running data from that memory and saving it into a file incurs substantial memory-copy overhead, and the first node reading the file from the file system incurs a long time overhead.
Summary

The present application provides a method, apparatus, system and storage medium for obtaining data, so as to save memory-copy overhead and time overhead. The technical solutions are as follows:

In a first aspect, the present application provides a method for obtaining data. In the method, a first node receives a first identifier and a node identifier of a second node, the first node is used to execute a first basic execution unit, the second node is used to execute a second basic execution unit, the second basic execution unit is the predecessor dependent unit of the first basic execution unit, the memory of the second node is used to save the running data of the second basic execution unit, and the first identifier is used to indicate the running data. The first node obtains the storage address of the running data based on the first identifier and the node identifier. The first node reads, based on the storage address, the running data saved in the memory of the second node.

Since the first node receives the first identifier and the node identifier of the second node, and the first identifier indicates the running data saved in the memory of the second node, the first node can obtain the storage address of the running data based on the first identifier and the node identifier of the second node, and directly read the running data saved in the memory of the second node based on the storage address. Because the running data saved in the memory of the second node is read directly based on the storage address, the running data does not need to be copied to a file system, eliminating the memory-copy overhead. The first node also reads the running data saved in the memory of the second node at a higher rate, saving time overhead.

In a possible implementation, the memory of the second node includes at least one first memory page, the at least one first memory page is used to save the running data, and the first identifier is used to indicate the at least one first memory page. Since the at least one first memory page is used to save the running data, the first identifier thereby indicates the running data.

In another possible implementation, the first node obtains the address of the at least one first memory page based on the first identifier and the node identifier. The first node allocates at least one second memory page to the first basic execution unit, the first node includes the at least one second memory page, and the at least one second memory page corresponds one-to-one to the at least one first memory page. The first node obtains the storage address, which is the address of the first memory page corresponding to the second memory page first read by the first basic execution unit.

When the first basic execution unit reads a second memory page for the first time, the data of that second memory page is stored in the first memory page corresponding to it, so the address of that first memory page is obtained. Based on the address of the first memory page, the running data saved in the first memory page is read directly, achieving on-demand data reading and saving network resources.

In another possible implementation, the first node reads, based on the obtained address of the first memory page, the running data saved in the obtained first memory page, thereby achieving on-demand data reading and saving network resources.

In another possible implementation, the memory of the second node further includes a target area, the target area corresponds to the first identifier, and the target area saves the address of the at least one first memory page. The first node sends an acquisition request to the second node based on the node identifier, the acquisition request including the first identifier. The first node receives an acquisition response sent by the second node, the acquisition response including the address and size of the target area corresponding to the first identifier. The first node reads, based on the address and size of the target area, the address of the at least one first memory page saved in the target area.

In this way, through the acquisition request and acquisition response, the first node obtains the address and size of the target area and, based on them, reads the address of the at least one first memory page saved in the target area, which can improve reading efficiency and reduce reading time.

In another possible implementation, the second node includes at least one first page table entry, the at least one first page table entry corresponds to the at least one first memory page, and the first node includes at least one second page table entry. The first node allocates the at least one second page table entry to the first basic execution unit, the at least one second page table entry corresponds one-to-one to the at least one first page table entry, and the at least one second memory page includes the memory page corresponding to each second page table entry, so that the at least one first memory page corresponds to the at least one second memory page.

In another possible implementation, the target area saves the central processing unit (CPU) running state of the second node, and the first node reads the CPU running state of the second node saved in the target area. The first node sets the CPU running state of the first node to the CPU running state of the second node. In this way, the CPU running state of the second node is restored on the first node without initializing the CPU running state of the first node, improving the efficiency with which the first node executes the first basic execution unit.

In another possible implementation, the CPU running state of the second node includes the state of at least one first CPU register, the at least one first CPU register being a register in the CPU of the second node used by the second basic execution unit; the CPU running state of the first node includes the state of at least one second CPU register, the at least one first CPU register corresponds to the at least one second CPU register, and the at least one second CPU register is a register in the CPU of the first node used by the first basic execution unit.

The first node reads the state of the at least one first CPU register saved in the target area. The first node sets the state of each second CPU register to the state of the first CPU register corresponding to that second CPU register. In this way, the first node sets its CPU running state to the CPU running state of the second node, thereby restoring the CPU running state of the second node on the first node.

In another possible implementation, the target area further saves a mapping relationship between the address of a third memory page and the storage location of first data, the memory of the second node further includes the third memory page, the first data is stored in the disk of the second node, the first data is data that the second basic execution unit needs to write to the third memory page but has not yet written to the third memory page, and the first node includes a fourth memory page corresponding to the third memory page. The first node reads the mapping relationship saved in the target area. When the first basic execution unit reads the fourth memory page for the first time, the first node obtains the storage location of the first data based on the address of the third memory page corresponding to the fourth memory page and the mapping relationship. The first node obtains the first data based on the storage location and the node identifier; the first data is data used by the first node to execute the first basic execution unit. This ensures that the first node can successfully run the first basic execution unit.

In another possible implementation, the target area further saves the file descriptor of at least one file opened by the second basic execution unit during execution, and the second node includes the at least one file. The first node reads the file descriptor of the at least one file saved in the target area; the file descriptor of the at least one file is data used by the first node to execute the first basic execution unit. This ensures that the first node can successfully run the first basic execution unit.

In another possible implementation, the first basic execution unit and the second basic execution unit are two functions in the distributed workflow.
In a second aspect, the present application provides a method for obtaining data. In the method, a second node obtains a first identifier, the second node is used to execute a second basic execution unit, and the first identifier is used to indicate the running data of the second basic execution unit saved in the memory of the second node. The second node sends the first identifier, which is used to trigger a first node to obtain the storage address of the running data and to read, based on the storage address, the running data saved in the memory of the second node; the first node is used to execute a first basic execution unit, and the second basic execution unit is the predecessor dependent unit of the first basic execution unit.

Since the first identifier obtained by the second node indicates the running data saved in the memory of the second node, and the second node sends the first identifier, the first node can obtain the storage address of the running data based on the first identifier and directly read the running data saved in the memory of the second node based on the storage address. Because the first node reads the running data directly from the memory of the second node based on the storage address, the second node does not need to copy the running data to a file system, eliminating the memory-copy overhead. The first node also reads the running data saved in the memory of the second node at a higher rate, saving time overhead.

In a possible implementation, the memory of the second node includes at least one first memory page, the at least one first memory page is used to save the running data, and the first identifier is used to indicate the at least one first memory page. Since the at least one first memory page is used to save the running data, the first identifier thereby indicates the running data.

In another possible implementation, the memory of the second node includes a target area, and the target area corresponds to the first identifier. The second node saves the address of the at least one first memory page to the target area. The second node receives an acquisition request sent by the first node, the acquisition request including the first identifier. The second node sends an acquisition response to the first node, the acquisition response including the address and size of the target area corresponding to the first identifier; the acquisition response is used to trigger the first node to read, based on the address and size of the target area, the address of the at least one first memory page saved in the target area. In this way, through the acquisition request and acquisition response, the first node obtains the address and size of the target area and reads the address of the at least one first memory page saved in the target area based on them, which can improve reading efficiency and reduce reading time.

In another possible implementation, the second node saves the CPU running state of the second node to the target area, so that the first node reads the CPU running state saved in the target area. Based on the read CPU running state, the first node restores that CPU running state on itself without initializing its own CPU running state, improving the efficiency with which the first node executes the first basic execution unit.

In another possible implementation, the CPU of the second node includes at least one first central processing unit (CPU) register used by the second basic execution unit, and the CPU running state includes the state of the at least one first CPU register.

In another possible implementation, the second node saves, to the target area, a mapping relationship between the address of a third memory page and the storage location of first data, the memory of the second node further includes the third memory page, the first data is stored in the disk of the second node, and the first data is data that the second basic execution unit needs to write to the third memory page but has not yet written to the third memory page, so that the first node reads the mapping relationship saved in the target area. The first node obtains the first data based on the mapping relationship, and the first data is used to execute the first basic execution unit, ensuring that the first node can successfully run the first basic execution unit.

In another possible implementation, the second node saves, to the target area, the file descriptor of at least one file opened by the second basic execution unit during execution, and the second node includes the at least one file, so as to trigger the first node to read the file descriptor of the at least one file saved in the target area. The file descriptor of the at least one file is used to execute the first basic execution unit, ensuring that the first node can successfully run the first basic execution unit.

In another possible implementation, the first basic execution unit and the second basic execution unit are two functions in the distributed workflow.
In a third aspect, the present application provides a method for obtaining data. In the method, a scheduling node receives a first identifier sent by a second node, the second node is used to execute a second basic execution unit, the memory of the second node is used to save the running data of the second basic execution unit, and the first identifier is used to indicate the running data. The scheduling node sends the first identifier and the node identifier of the second node to a first node, the first node is used to execute a first basic execution unit, the second basic execution unit is the predecessor dependent unit of the first basic execution unit, and the first identifier and the node identifier are used to trigger the first node to obtain the storage address of the running data and to read, based on the storage address, the running data saved in the memory of the second node.

Since the first identifier indicates the running data saved in the memory of the second node, and the scheduling node sends the first identifier and the node identifier of the second node to the first node, the first node can obtain the storage address of the running data based on the first identifier and the node identifier of the second node, and directly read the running data saved in the memory of the second node based on the storage address. Because the running data saved in the memory of the second node is read directly based on the storage address, the running data does not need to be copied to a file system, eliminating the memory-copy overhead. The first node also reads the running data saved in the memory of the second node at a higher rate, saving time overhead.

In a possible implementation, the memory of the second node includes at least one first memory page, the at least one first memory page is used to save the running data, and the first identifier is used to indicate the at least one first memory page. Since the at least one first memory page is used to save the running data, the first identifier thereby indicates the running data.

In a possible implementation, the first basic execution unit and the second basic execution unit are two functions in the distributed workflow.
In a fourth aspect, the present application provides an apparatus for obtaining data, configured to perform the method in the first aspect or any possible implementation of the first aspect. Specifically, the apparatus includes units for performing the method in the first aspect or any possible implementation of the first aspect.

In a fifth aspect, the present application provides an apparatus for obtaining data, configured to perform the method in the second aspect or any possible implementation of the second aspect. Specifically, the apparatus includes units for performing the method in the second aspect or any possible implementation of the second aspect.

In a sixth aspect, the present application provides an apparatus for obtaining data, configured to perform the method in the third aspect or any possible implementation of the third aspect. Specifically, the apparatus includes units for performing the method in the third aspect or any possible implementation of the third aspect.

In a seventh aspect, the present application provides a first node including at least one processor and a memory, where the at least one processor is configured to couple with the memory and to read and execute instructions in the memory, so as to implement the method in the first aspect or any possible implementation of the first aspect.

In an eighth aspect, the present application provides a second node including at least one processor and a memory, where the at least one processor is configured to couple with the memory and to read and execute instructions in the memory, so as to implement the method in the second aspect or any possible implementation of the second aspect.

In a ninth aspect, the present application provides a scheduling node including at least one processor and a memory, where the at least one processor is configured to couple with the memory and to read and execute instructions in the memory, so as to implement the method in the third aspect or any possible implementation of the third aspect.

In a tenth aspect, the present application provides a computer program product, the computer program product including a computer program stored in a computer-readable storage medium, the computer program being loaded by a processor to implement the method of the first aspect, the second aspect, the third aspect, any possible implementation of the first aspect, any possible implementation of the second aspect, or any possible implementation of the third aspect.

In an eleventh aspect, the present application provides a computer-readable storage medium for storing a computer program, the computer program being loaded by a processor to perform the method of the first aspect, the second aspect, the third aspect, any possible implementation of the first aspect, any possible implementation of the second aspect, or any possible implementation of the third aspect.

In a twelfth aspect, the present application provides a chip including a memory and a processor, the memory being used to store computer instructions and the processor being used to call and run the computer instructions from the memory, so as to perform the method of the first aspect, the second aspect, the third aspect, any possible implementation of the first aspect, any possible implementation of the second aspect, or any possible implementation of the third aspect.

In a thirteenth aspect, the present application provides a system for obtaining data, the system including the apparatus of the fourth aspect and the apparatus of the fifth aspect, or the system including the first node of the seventh aspect and the second node of the eighth aspect.

In a possible implementation, the system further includes the apparatus of the sixth aspect or the scheduling node of the ninth aspect.
Brief Description of Drawings

FIG. 1 is a schematic diagram of a network architecture provided by an embodiment of the present application;

FIG. 2 is a schematic structural diagram of a computing node provided by an embodiment of the present application;

FIG. 3 is a flowchart of a method for obtaining data provided by an embodiment of the present application;

FIG. 4 is a schematic diagram of a directed acyclic graph provided by an embodiment of the present application;

FIG. 5 is a flowchart of saving data to a first target area provided by an embodiment of the present application;

FIG. 6 is a flowchart of obtaining a storage address provided by an embodiment of the present application;

FIG. 7 is a flowchart of reading running data saved in the memory of a second node provided by an embodiment of the present application;

FIG. 8 is a graph of the relationship between data volume and transmission latency provided by an embodiment of the present application;

FIG. 9 is a graph of the relationship between the number of distributed workflows and throughput provided by an embodiment of the present application;

FIG. 10 is a graph of the relationship between throughput and latency provided by an embodiment of the present application;

FIG. 11 is a graph of the relationship between the number of distributed workflows and latency provided by an embodiment of the present application;

FIG. 12 is a schematic structural diagram of an apparatus for obtaining data provided by an embodiment of the present application;

FIG. 13 is a schematic structural diagram of another apparatus for obtaining data provided by an embodiment of the present application;

FIG. 14 is a schematic structural diagram of another apparatus for obtaining data provided by an embodiment of the present application;

FIG. 15 is a schematic structural diagram of a device provided by an embodiment of the present application;

FIG. 16 is a schematic structural diagram of a system for obtaining data provided by an embodiment of the present application.
Detailed Description

Embodiments of the present application are described in further detail below with reference to the accompanying drawings.

A distributed workflow includes multiple basic execution units. A user provides the basic execution units of the distributed workflow to a scheduling node without deploying the computing resources required to execute each basic execution unit. After receiving the distributed workflow, the scheduling node invokes the basic execution units of the distributed workflow; each time a basic execution unit is invoked, at least one computing node is automatically started to execute it. Since computing nodes are started on demand, the utilization of computing resources can be improved.

For example, in the field of serverless computing, the distributed workflow may be an application whose basic execution units are functions; the user provides the application to the scheduling node, and when the scheduling node invokes a function of the application, it starts a computing node to execute that function. As another example, the application includes methods, and the basic execution units of the application are the methods of the application; when the scheduling node invokes a method of the application, it starts a computing node to execute that method. Two instances of basic execution units are listed above; other instances are not enumerated here.

Optionally, the computing node may be a computing device or a virtual instance running on a computing device. The virtual instance may be a virtual machine, a container, or the like.

Dependency relationships exist among the multiple basic execution units of the distributed workflow, and the scheduling node invokes the basic execution units of the distributed workflow based on those dependency relationships.

For two basic execution units of the distributed workflow that have a dependency relationship, for ease of description, the two basic execution units are called a first basic execution unit and a second basic execution unit, respectively. The dependency relationship between the first basic execution unit and the second basic execution unit means that the first basic execution unit, when executed, needs the running data produced by the second basic execution unit.

Optionally, the running data is the execution result of the second basic execution unit, and/or part or all of the intermediate results produced by the second basic execution unit during execution.

The first basic execution unit may be called a successor dependent unit of the second basic execution unit, or the second basic execution unit may be called the predecessor dependent unit of the first basic execution unit.

A predecessor dependent unit is a unit that produces the running data needed when the first basic execution unit is executed. Executing the first basic execution unit requires the running data produced when the second basic execution unit was executed, that is, the first basic execution unit is executed based on that running data, so the second basic execution unit is called the predecessor dependent unit of the first basic execution unit.

For ease of description, the computing node executing the first basic execution unit is called a first node, and the computing node executing the second basic execution unit is called a second node. The scheduling node first invokes the second basic execution unit and starts the second node to execute it. During the second node's execution of the second basic execution unit, when the second basic execution unit produces target running data, the target running data being the running data needed to execute the first basic execution unit, the scheduling node invokes the first basic execution unit and starts the first node. The first node obtains the target running data and executes the first basic execution unit based on it.

For example, assume the second basic execution unit of the distributed workflow is: def A():return "hello", and the first basic execution unit is: def B(input):print(input)#hello. The second basic execution unit and the first basic execution unit have a dependency relationship.

"hello" is the target running data produced when the second basic execution unit is executed, and the target running data "hello" is the running data needed to execute the first basic execution unit. Therefore the second basic execution unit may be called the predecessor dependent unit of the first basic execution unit, or the first basic execution unit may be called the successor dependent unit of the second basic execution unit.

After receiving the distributed workflow, the scheduling node first invokes the second basic execution unit, starts the second node, and uses the second node to execute the second basic execution unit. After the second node finishes executing the second basic execution unit, the second node contains the target running data "hello" produced by the second basic execution unit. The scheduling node invokes the successor dependent unit of the second basic execution unit, that is, invokes the first basic execution unit, and starts the first node. The first node obtains the target running data "hello" and executes the first basic execution unit based on the target running data "hello".

When the second node starts, it allocates at least one memory page for the second basic execution unit from the memory of the second node. During the second node's execution of the second basic execution unit, the second basic execution unit accesses the at least one memory page, that is, it writes data to and/or reads data from the at least one memory page; the data saved in the at least one memory page includes the running data of the second basic execution unit. When the second basic execution unit produces the target running data, the target running data is saved in the at least one memory page.

To enable the first node to obtain the running data of the second basic execution unit, the second node can share the storage address of the target running data in the memory of the second node, and the first node, based on that storage address, directly reads part or all of the target running data saved in the memory of the second node.

In some embodiments, the storage address includes the address of the at least one memory page, and the first node can directly read the running data saved in part or all of the at least one memory page based on the addresses of those memory pages. In this way, the first node reads part or all of the target running data from the memory of the second node.

In some embodiments, the first node may use remote direct memory access (RDMA) technology to directly read the running data saved in the part or all of the memory pages on the second node.

Since the first node directly reads part or all of the target running data saved in the memory of the second node, the substantial memory-copy overhead incurred by the second node saving the target running data to a file system is eliminated, as is the long time overhead incurred by the first node reading the target running data from the file system.

Next, the process by which the first node obtains the target running data is described in detail through any of the following embodiments.
Referring to FIG. 1, an embodiment of the present application provides a network architecture 100, which includes a scheduling node 101 and computing nodes 102. The number of computing nodes 102 in the network architecture 100 may be multiple, and each computing node 102 can communicate with the scheduling node 101.

The scheduling node 101 is used to obtain a distributed workflow including multiple basic execution units. Based on the dependency relationships among the basic execution units of the distributed workflow, the scheduling node invokes a basic execution unit of the distributed workflow; for ease of description, the invoked basic execution unit is called the first basic execution unit. The scheduling node schedules at least one computing node 102 and sends the first basic execution unit to the at least one computing node 102.

The first basic execution unit may be the first basic execution unit of the distributed workflow, having no predecessor dependent unit but possibly having a successor dependent unit. Alternatively, the first basic execution unit is neither the first nor the last basic execution unit of the distributed workflow, and may have a predecessor dependent unit and may also have a successor dependent unit. Alternatively, the first basic execution unit is the last basic execution unit of the distributed workflow, possibly having a predecessor dependent unit but no successor dependent unit.

In some embodiments, if the first basic execution unit has no predecessor dependent unit, the scheduling node 101 sends the first basic execution unit to the at least one computing node 102. If the first basic execution unit has a predecessor dependent unit and the predecessor dependent unit has already produced first target running data (for ease of description, the predecessor dependent unit is called the second basic execution unit), the scheduling node 101 sends the first basic execution unit, a first identifier and the node identifier of the second node to the at least one computing node 102. The first target running data is the running data needed to execute the first basic execution unit; it is running data produced by the second basic execution unit and is saved in the memory of the second node; the second node is the computing node 102 running the predecessor dependent unit; and the first identifier is used to indicate the first target running data saved in the memory of the second node.

For any computing node 102 of the at least one computing node 102, for ease of description, that computing node 102 is called the first node. The first node is used to receive the first basic execution unit and execute the first basic execution unit.

In some embodiments, if the first basic execution unit has no predecessor dependent unit, the first node receives the first basic execution unit and executes it. If the first basic execution unit has a predecessor dependent unit and the predecessor dependent unit has already produced first target running data, the first node receives the first basic execution unit, the first identifier and the node identifier of the second node, reads part or all of the first target running data saved in the memory of the second node based on the first identifier and the node identifier of the second node, and executes the first basic execution unit based on the read part or all of the first target running data.

If the first basic execution unit has a successor dependent unit, the first node is further used to send a second identifier to the scheduling node 101 when the first basic execution unit produces second target running data. The second target running data is the running data needed to execute the successor dependent unit and is saved in the memory of the first node; the second identifier is used to indicate the second target running data saved in the memory of the first node.

The scheduling node 101 is further used to receive the second identifier sent by the first node; if the distributed workflow still has basic execution units that have not been executed, it invokes an unexecuted basic execution unit. For ease of description, the invoked basic execution unit is called the third basic execution unit. If the third basic execution unit has a predecessor dependent unit and that predecessor dependent unit is the first basic execution unit, the scheduling node schedules at least one computing node 102 and sends the third basic execution unit, the second identifier and the node identifier of the first node to the at least one computing node 102. If there is no unexecuted basic execution unit in the distributed workflow, the operation ends.

For any computing node 102 of the at least one computing node 102, that computing node 102 performs the same operations as the first node above, which are not described in detail here.
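The scheduling flow above can be sketched roughly as follows; the message field names (`unit`, `first_id`, `node_id`) are illustrative assumptions, not terms defined by the present application:

```python
# Scheduling-node dispatch: send the invoked basic execution unit to each
# scheduled computing node; if the unit has a predecessor dependent unit whose
# running data has already been produced, also attach the first identifier and
# the node identifier of the node holding that running data.
def dispatch(unit, predecessor_info, nodes):
    messages = []
    for node in nodes:
        msg = {"unit": unit}
        if predecessor_info is not None:
            msg["first_id"], msg["node_id"] = predecessor_info
        messages.append((node, msg))
    return messages

# Func0 has no predecessor dependent unit; Func1 depends on running data
# held on "node-2" under identifier "id-42" (both names hypothetical).
msgs0 = dispatch("Func0", None, ["node-2"])
msgs1 = dispatch("Func1", ("id-42", "node-2"), ["node-1"])
```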
Referring to FIG. 2, any computing node 102 in the network architecture 100 includes a central processing unit (CPU) 1021, a page table 1022 and a memory 1023. The CPU 1021 includes CPU registers, the page table 1022 includes multiple page table entries, the memory 1023 includes multiple memory pages, the multiple page table entries correspond one-to-one to the multiple memory pages, and each page table entry includes the address of its corresponding memory page.

For any page table entry, the page table entry includes the virtual address and physical address of its corresponding memory page. For example, referring to FIG. 2, the page table 1022 includes page table entry 1, page table entry 2, page table entry 3, and so on, and the memory 1023 includes memory page 1, memory page 2, memory page 3, and so on. Page table entry 1 corresponds to memory page 1 and includes the virtual address and physical address of memory page 1. Page table entry 2 corresponds to memory page 2 and includes the virtual address and physical address of memory page 2. Page table entry 3 corresponds to memory page 3 and includes the virtual address and physical address of memory page 3.

Upon receiving a basic execution unit sent by the scheduling node 101, the computing node 102 allocates at least one CPU register to the basic execution unit from the CPU registers of the CPU 1021, and allocates at least one page table entry to the basic execution unit from the page table 1022.

The at least one page table entry corresponds to at least one memory page in the memory 1023; that is, allocating at least one page table entry to the basic execution unit can be regarded as allocating the at least one memory page to the basic execution unit.

The CPU 1021 of the computing node 102 is used to execute the basic execution unit. In the course of executing the basic execution unit, the basic execution unit obtains the virtual address of a memory page it needs to read or write (the at least one memory page includes this memory page), obtains from the page table 1022 the page table entry including the virtual address, and, based on the physical address of the memory page included in that page table entry, reads the data saved in the memory page and/or writes data to the memory page. The data saved in the at least one memory page is the running data of the basic execution unit during execution.

In the course of executing the basic execution unit, the at least one CPU register is used to save the running state of the CPU 1021 of the computing node 102. The running state of the CPU 1021 includes the state of the at least one CPU register; when the running state of the CPU 1021 changes, the basic execution unit modifies the state of the at least one CPU register.
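The per-node page table described above (entries holding the virtual and physical address of their memory page) can be modeled as a minimal sketch; the page size and all addresses are illustrative constants, not values from the application:

```python
from dataclasses import dataclass

PAGE_SIZE = 4096

@dataclass
class PageTableEntry:
    seq: int        # position of the entry in the page table, numbered from 0
    virt_addr: int  # virtual address of the corresponding memory page
    phys_addr: int  # physical address of the corresponding memory page

# A node's page table: one entry per memory page, in order.
page_table = [
    PageTableEntry(i, 0x1000_0000 + i * PAGE_SIZE, 0x8000_0000 + i * PAGE_SIZE)
    for i in range(3)
]

def translate(table, virt_addr):
    """Find the entry whose virtual page contains virt_addr and translate it
    to the corresponding physical address."""
    for entry in table:
        if entry.virt_addr <= virt_addr < entry.virt_addr + PAGE_SIZE:
            return entry.phys_addr + (virt_addr - entry.virt_addr)
    raise KeyError(hex(virt_addr))
```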
In some embodiments, the multiple computing nodes 102 in the network architecture 100 may include computing devices and/or virtual instances, the virtual instances including containers and/or virtual machines.
Referring to FIG. 3, an embodiment of the present application provides a method 300 for obtaining data. The method 300 is applied to the network architecture 100 shown in FIG. 1 and includes the following steps 301 to 310.

Step 301: The scheduling node receives a distributed workflow and invokes the first basic execution unit of the distributed workflow.

For example, assume the distributed workflow is an application including functions Func0, Func1, Func2, Func3, Func4, and so on. Func0, Func1, Func2, Func3 and Func4 are all basic execution units of the application. The scheduling node receives the application, invokes the application's first basic execution unit Func0, and schedules at least one computing node for the first basic execution unit Func0.

Referring to FIG. 4, the scheduling node may also convert the distributed workflow into a directed acyclic graph (DAG). The DAG is a tree representation of the distributed workflow, and each node of the DAG is a basic execution unit of the distributed workflow. For two basic execution units with a dependency relationship, one of them is the predecessor dependent unit of the other; in the DAG the two basic execution units are connected by an edge, and the former is the parent node of the latter. The root node of the DAG is the first basic execution unit of the distributed workflow.

For example, referring to FIG. 4, the distributed workflow is an application including basic execution units Func0, Func1, Func2, Func3, Func4, and so on. The distributed workflow is converted into the DAG shown in FIG. 4. Func0 is the first basic execution unit of the distributed workflow and is the root node of the DAG. Func0 is the predecessor dependent unit of Func1 and Func2 and is the parent node of Func1 and Func2. Func1 is the predecessor dependent unit of Func3 and is the parent node of Func3. Func2 is the predecessor dependent unit of Func4 and is the parent node of Func4.
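Building such a DAG for the Func0..Func4 example can be sketched as follows; plain dictionaries stand in for whatever internal representation the scheduling node actually uses:

```python
# Predecessor-dependency edges (parent -> child) of the example workflow.
edges = [("Func0", "Func1"), ("Func0", "Func2"),
         ("Func1", "Func3"), ("Func2", "Func4")]

children = {}  # parent unit -> list of successor dependent units
parents = {}   # child unit -> list of predecessor dependent units
for parent, child in edges:
    children.setdefault(parent, []).append(child)
    parents.setdefault(child, []).append(parent)

units = {u for edge in edges for u in edge}
# The root of the DAG is the unit with no predecessor dependent unit.
roots = [u for u in units if u not in parents]
```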
Step 302: The scheduling node sends the first basic execution unit to the second node.

The second node is a computing node scheduled by the scheduling node. Optionally, the scheduling node may schedule at least one computing node, in which case the scheduling node sends the first basic execution unit to each scheduled computing node.

After sending the first basic execution unit to the second node, the scheduling node waits for the second node to obtain first target running data in the course of executing the first basic execution unit. The first target running data is running data produced when the first basic execution unit is executed, and is data needed to execute the successor dependent unit of the first basic execution unit.

If the scheduling node schedules multiple computing nodes, each computing node among them other than the second node performs, like the second node, the operations of steps 303 to 305 below.

Step 303: The second node receives the first basic execution unit and executes it.

In step 303, the second node works in user mode; in user mode it receives the first basic execution unit and executes the first basic execution unit.

The CPU of the second node includes one or more CPU registers. The second node also includes a page table, which includes one or more page table entries.

In step 303, after receiving the first basic execution unit, the second node allocates at least one first CPU register to the first basic execution unit from the CPU registers of the second node's CPU, and allocates at least one page table entry to the first basic execution unit from the second node's page table. The at least one page table entry corresponds one-to-one to at least one memory page in the memory of the second node, and each page table entry includes the address of its corresponding memory page. The initial state of the at least one first CPU register is initialized, so that the second node can execute the first basic execution unit using the at least one first CPU register and the at least one page table entry.

For any page table entry, the page table entry includes the virtual address and physical address of its corresponding memory page; that is, the address of the memory page includes the virtual address and physical address of the memory page.

The at least one page table entry is for use by the first basic execution unit, and the at least one memory page corresponding to the at least one page table entry is for use by the first basic execution unit.

The page table of the second node includes multiple page table entries, and each page table entry in the page table has a sequence number. Optionally, the sequence number of each page table entry may be obtained by numbering the entries from 0 in order: the first page table entry in the page table has sequence number 0, the second has sequence number 1, the third has sequence number 2, and so on.

The CPU running state of the second node includes the state of the at least one first CPU register. Initializing the initial state of the at least one first CPU register can be regarded as initializing the initial CPU running state of the second node; that is, the initial CPU running state of the second node includes the initial state of the at least one first CPU register.

While the second node executes the first basic execution unit, if the CPU running state of the second node changes, the first basic execution unit may modify the state of some or all of the at least one first CPU register.

While the second node executes the first basic execution unit, the first basic execution unit reads and writes, based on the at least one page table entry, the at least one memory page corresponding to the at least one page table entry; some or all of the at least one memory page is used to save the running data produced by the first basic execution unit. In implementation,

when the first basic execution unit needs to read or write a memory page (a memory page corresponding to one of the at least one page table entry), the first basic execution unit obtains the virtual address of the memory page and obtains, from the second node's page table, the page table entry including that virtual address; the obtained page table entry is the one corresponding to the memory page. Based on the physical address of the memory page included in the obtained page table entry, it reads the data saved in the memory page and/or writes data to the memory page. The running data of the first basic execution unit includes the read data and/or the written data.
Step 304: When the first basic execution unit produces first target running data, the second node obtains a first identifier, the first identifier being used to indicate the first target running data; the first target running data is the running data needed to execute the successor dependent unit of the first basic execution unit.

The first target running data may be the running result produced by the first basic execution unit after the second node finishes executing the first basic execution unit; after finishing execution, the second node can determine that the first basic execution unit has produced the first target running data, and obtains the first identifier indicating the first target running data. Alternatively, the first target running data may be intermediate data produced by the first basic execution unit while the second node is executing it; after finishing executing the first basic execution unit, the second node can determine that the first basic execution unit has produced the first target running data, and obtains the first identifier indicating the first target running data.

In some embodiments, the above at least one memory page includes at least one first memory page, and the at least one first memory page saves the first target running data. The above at least one page table entry includes at least one first page table entry, and the at least one first page table entry corresponds one-to-one to the at least one first memory page.

The storage address of the first target running data in the memory of the second node includes the address of the at least one first memory page.

In some embodiments, the above at least one memory page further includes a third memory page, the third memory page corresponds to first data, the first data is data stored in the disk of the second node, and the first data is data that the first basic execution unit needs to write to the third memory page but has not yet written to the third memory page. The above at least one page table entry further includes a third page table entry corresponding to the third memory page.

The second node includes a mapping relationship between the address of the third memory page in the third page table entry and the storage location of the first data. The storage location of the first data is the location of the first data in the disk of the second node; the first data is data that the first basic execution unit needs to read from the disk of the second node and write to the third memory page, but the first basic execution unit has not yet read the first data from the disk of the second node and written it to the third memory page.

The second node records the mapping relationship between the address of the third memory page (the virtual address and/or physical address of the third memory page) and the storage location of the first data.

In some embodiments, the first data is data saved in a first file in the disk of the second node, and the storage location of the first data may include the file identifier of the first file and the offset and size of the first data within the first file.

While the second node executes the first basic execution unit, the first basic execution unit may also open at least one file and execute based on the data in the at least one file. The at least one file may be a file saved in the disk of the second node. The second node records the file descriptor of the at least one file opened by the first basic execution unit. For any file, the file descriptor of the file is used to identify the file.

Referring to FIG. 5, in step 304, when the first basic execution unit produces the first target running data, the second node switches from user mode to kernel mode; once running in kernel mode, it allocates a contiguous first target area in the memory of the second node, obtains the at least one first page table entry, and saves the at least one first page table entry into the first target area. The second node generates a first identifier corresponding to the first target area.

In some embodiments, the first identifier may include the address and size of the first target area, or the first identifier is an identity (ID) generated by the second node to identify the first target area.

Saving the at least one first page table entry into the first target area can be regarded as saving the address of the at least one first memory page corresponding to the at least one first page table entry into the first target area.

Since the at least one first page table entry corresponds one-to-one to the at least one first memory page, the at least one first memory page is used to save the first target running data, and the first identifier corresponds to the first target area, the first identifier thereby indicates the first target running data.

Referring to FIG. 5, in the case where the first identifier is an ID, the second node saves a mapping relationship among the first identifier, the address of the first target area and the size of the first target area, so that the first identifier corresponds to the first target area.

In some embodiments, the second node may also generate first verification information corresponding to the first identifier; that is, the first identifier corresponds to the first target area and the first verification information. Optionally, the second node saves a mapping relationship among the first identifier, the first verification information, the address of the first target area and the size of the first target area, so that the first identifier corresponds to the first target area and the first verification information.

In some embodiments, the second node uses a specified data structure to save the mapping relationship among the first identifier, the address of the first target area and the size of the first target area, or uses a specified data structure to save the mapping relationship among the first identifier, the first verification information, the address of the first target area and the size of the first target area. Optionally, the specified data structure includes a hashmap or the like.
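A minimal sketch of this hashmap-style structure, assuming the ID and the first verification information are random hex tokens (the token format and function names are assumptions for illustration):

```python
import secrets

# Second node's registry: first identifier ->
# (first verification information, target-area address, target-area size).
regions = {}

def save_target_area(region_addr, region_size):
    first_id = secrets.token_hex(8)  # ID generated to identify the first target area
    check = secrets.token_hex(8)     # first verification information for the ID
    regions[first_id] = (check, region_addr, region_size)
    return first_id, check

# Record a hypothetical target area of 4096 bytes at an illustrative address.
first_id, check = save_target_area(0x7f00_0000, 4096)
```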
Referring to FIG. 5, in step 304, the second node may also obtain the mapping relationship between the address of the third memory page and the storage location of the first data, obtain the third page table entry corresponding to the third memory page, and save the mapping relationship between the address of the third memory page and the storage location of the first data, as well as the third page table entry, into the first target area.

Saving the third page table entry into the first target area can be regarded as saving the address of the third memory page into the first target area.

Referring to FIG. 5, in step 304, the second node may also obtain the CPU running state of the second node and save the CPU running state of the second node into the first target area. In implementation,

the second node obtains the state of the at least one first CPU register used by the first basic execution unit; the CPU running state of the second node includes the state of the at least one first CPU register, and the state of the at least one first CPU register is saved into the first target area.

Referring to FIG. 5, in step 304, the second node may also obtain the file descriptor of at least one file opened by the first basic execution unit during execution, and save the file descriptor of the at least one file into the first target area.
Step 305: The second node sends first information to the scheduling node, the first information including the first identifier.

When the first identifier also corresponds to the first verification information, the first information may also include the first verification information.

In some embodiments, the second node also allocates a dynamically connected transport (DCT) object, which is used to establish connections with nodes other than the second node.

Step 306: The scheduling node receives the first information and invokes the i-th basic execution unit, the first basic execution unit being the predecessor dependent unit of the i-th basic execution unit, i = 2, 3, 4, and so on.

The first basic execution unit and the i-th basic execution unit are two basic execution units of the distributed workflow.

In step 306, the scheduling node receives the first information and invokes, from the distributed workflow, a successor dependent unit of the first basic execution unit, the successor dependent unit being the i-th basic execution unit following the first basic execution unit. In implementation,

after receiving the first information, the scheduling node determines whether the distributed workflow has basic execution units that have not been executed; if the distributed workflow has no unexecuted basic execution unit, the operation ends. If the distributed workflow has unexecuted basic execution units, the scheduling node obtains the child nodes of the first basic execution unit from the DAG corresponding to the distributed workflow. If an obtained child node is an unexecuted basic execution unit, the obtained child node is taken as a successor dependent unit of the first basic execution unit, that is, the i-th basic execution unit is obtained.

For example, referring to the DAG of the distributed workflow shown in FIG. 4, the first basic execution unit of the distributed workflow is Func0. The scheduling node obtains from the DAG the two child nodes of the first basic execution unit Func0, which are the second basic execution unit Func1 and the third basic execution unit Func2 of the distributed workflow. The second basic execution unit Func1 and the third basic execution unit Func2 are both successor dependent units of the first basic execution unit Func0. The i-th basic execution unit may be the second basic execution unit Func1, or may be the third basic execution unit Func2.

Step 307: The scheduling node sends the node identifier of the second node, the first information and the i-th basic execution unit to the first node.

The first node is a computing node scheduled by the scheduling node. Optionally, the scheduling node may schedule at least one computing node, in which case the scheduling node sends the i-th basic execution unit to each scheduled computing node.

After sending the node identifier of the second node, the first information and the i-th basic execution unit to the first node, the scheduling node waits for the first node to obtain second target running data in the course of executing the i-th basic execution unit. The second target running data is running data produced when the i-th basic execution unit is executed, and is data needed to execute the successor dependent unit of the i-th basic execution unit.

If the scheduling node schedules multiple computing nodes, each computing node among them other than the first node performs, like the first node, the operations of steps 308 to 310 below.
Step 308: The first node receives the node identifier of the second node, the first information and the i-th basic execution unit, and obtains the storage address of the first target running data based on the first identifier and the node identifier.

The first target running data is saved in the memory of the second node, and the storage address of the first target running data is the address of the first target running data in the memory of the second node.

In some embodiments, the memory of the second node includes at least one first memory page, and the at least one first memory page saves the first target running data. The address of the first target running data includes the address of the at least one first memory page.

Referring to FIG. 6, in step 308, the first node obtains the address of the at least one first memory page through the following flow.

3081: The first node sends a first acquisition request to the second node based on the node identifier of the second node, the first acquisition request including the first identifier.

In the case where the first information also includes the first verification information, the first acquisition request also includes the first verification information.

In some embodiments, the first acquisition request is a first remote procedure call (RPC) request; that is, the first node sends a first RPC request to the second node based on the node identifier of the second node, the first RPC request including the first identifier, or including the first identifier and the first verification information.

In 3081, after receiving the node identifier of the second node, the first information and the i-th basic execution unit, the first node first switches from user mode to kernel mode. After the first node enters kernel mode, the first node sends the first acquisition request to the second node based on the node identifier of the second node.

3082: The second node receives the first acquisition request and sends a first acquisition response to the first node, the first acquisition response including the address and size of the first target area corresponding to the first identifier.

In some embodiments, the first acquisition request includes the first identifier, and the second node includes the mapping relationship among the first identifier, the address of the first target area and the size of the first target area. The second node receives the first acquisition request, obtains, based on the first identifier included in the first acquisition request, the address and size of the first target area corresponding to the first identifier from the mapping relationship, and sends a first acquisition response to the first node, the first acquisition response including the address and size of the first target area corresponding to the first identifier.

In some embodiments, the first acquisition request includes the first identifier and the first verification information, and the second node includes the mapping relationship among the first identifier, the first verification information, the address of the first target area and the size of the first target area. The second node receives the first acquisition request and obtains, based on the first identifier included in the first acquisition request, the first verification information and the address and size of the first target area corresponding to the first identifier from the mapping relationship. If the first verification information included in the first acquisition request is the same as the obtained first verification information, the second node sends a first acquisition response to the first node, the first acquisition response including the address and size of the first target area corresponding to the first identifier.

In some embodiments, the first acquisition response is a first RPC response; that is, the second node sends a first RPC response to the first node, the first RPC response including the address and size of the first target area corresponding to the first identifier.
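The 3081-3082 exchange, including the verification check above, can be sketched with a local stand-in for the RPC transport; the field names and table contents are illustrative assumptions:

```python
# First identifier -> (first verification information, target-area address, size).
region_table = {"id-42": ("chk-7", 0x5000, 512)}

def handle_acquisition_request(request):
    """Second node's handling of a first acquisition request."""
    entry = region_table.get(request["id"])
    if entry is None or entry[0] != request.get("check"):
        return {"ok": False}  # unknown ID or verification information mismatch
    check, addr, size = entry
    # First acquisition response: address and size of the first target area.
    return {"ok": True, "region_addr": addr, "region_size": size}

response = handle_acquisition_request({"id": "id-42", "check": "chk-7"})
```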
3083: The first node receives the first acquisition response and reads, based on the address and size of the first target area included in the first acquisition response, the content saved in the first target area.

The content includes the at least one first page table entry; reading the content means reading, from the first target area, the address of the at least one first memory page corresponding to the at least one first page table entry.

The above 3081-3082 are optional operations: when the first identifier includes the address and size of the first target area, the first node does not perform the operations of 3081-3082 and directly reads the content saved in the first target area based on the address and size of the first target area included in the first identifier. When the first identifier is an ID, the first node performs the operations of 3081-3082.

In some embodiments, the first node receives the first acquisition response, establishes an RDMA connection with the second node, and reads, based on the address and size of the first target area, the content saved in the first target area of the second node through the RDMA connection.

In some embodiments, the second node includes a DCT object, through which the second node establishes the RDMA connection with the first node.

The read content may also include one or more of the following: the CPU running state of the second node, the third page table entry, the mapping relationship between the address of the third memory page and the storage location of the first data, or the file descriptor of at least one file opened by the first basic execution unit.

In some embodiments, the CPU running state of the second node includes the state of the at least one first CPU register of the second node.
Referring to FIG. 6, after reading the content, the first node also needs to allocate, from the memory of the first node, at least one second memory page for the i-th basic execution unit, the at least one second memory page corresponding one-to-one to the at least one first memory page. Optionally, in implementation, this can be achieved through the following operation 3084.

3084: The first node allocates at least one second page table entry to the i-th basic execution unit, the at least one second page table entry corresponding one-to-one to the at least one first page table entry, and the at least one second memory page including the memory page corresponding to each second page table entry.

The first node includes a page table, which includes multiple page table entries, each with a sequence number. Optionally, the sequence number of each page table entry may be obtained by numbering the entries from 0 in order: the first page table entry in the page table has sequence number 0, the second has sequence number 1, the third has sequence number 2, and so on.

The memory of the first node includes multiple memory pages, and the multiple page table entries of the page table correspond one-to-one to the multiple memory pages of the first node's memory.

In 3084, with reference to the sequence numbers of the at least one first page table entry, the first node allocates at least one second page table entry to the i-th basic execution unit in the first node's page table, the at least one second page table entry corresponding one-to-one to the at least one first page table entry. Optionally, for any second page table entry and the first page table entry corresponding to it, the correspondence may mean that the sequence number of the second page table entry is the same as the sequence number of the first page table entry.

The at least one second page table entry corresponds one-to-one to at least one second memory page in the memory of the first node; allocating the at least one second page table entry to the i-th basic execution unit can be regarded as allocating the at least one second memory page to the i-th basic execution unit.

Since the at least one first page table entry corresponds one-to-one to the at least one second page table entry, the at least one first memory page corresponds one-to-one to the at least one second memory page.
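Allocating second page table entries at the same sequence numbers as the first page table entries, together with the local and remote marks this step goes on to describe, can be sketched as follows (the mark values 0 and 1 follow the document; the table size is illustrative):

```python
# Sequence numbers of the first page-table entries read from the target area
# (illustrative values).
first_entry_seqs = [3, 4, 7]

# The first node's page table, modeled as seq -> entry; None means unused.
local_page_table = {seq: None for seq in range(16)}

MARK_NOT_ACCESSED = 0  # first mark: unit has not yet accessed the page
MARK_REMOTE = 1        # third mark: the page's data still resides on the second node

# Allocate a second entry at each corresponding sequence number, so each
# second memory page corresponds one-to-one to a first memory page.
for seq in first_entry_seqs:
    local_page_table[seq] = {"local_mark": MARK_NOT_ACCESSED,
                             "remote_mark": MARK_REMOTE}
```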
第一节点还可能将该至少一个第二页表项中的每个第二页表项的本地标记设置为第一标 记。对于任一个第二页表项的第一标记,该第一标记用于指示第i个基本执行单元还未访问该第二页表项对应的第二内存页。
页表项的本地标记用于指示第i个基本执行单元是否已访问该页表项对应的内存页。该页表项的本地标记可能是第一标记或者可能是第二标记,第二标记用于指示第i个基本执行单元已访问该页表项对应的内存页。
在读取的该内容还包括第三页表项,第一节点参考第三页表项的序号,在第一节点的页表中为第i个基本执行单元分配第四页表项,第三页表项与第四页表项相对应。可选地,第三页表项与第四页表项相对应可能是指:第三页表项的序号和第四页表项的序号可能相同。
第四页表项与第一节点的内存中的第四内存页对应,将第四页表项分配给第i个基本执行单元,可认为是:将第四内存页分配给第i个基本执行单元。
由于第三页表项与第四页表项相对应,使得第三内存页与第四内存页也相对应。
第一节点还可能将第四页表项的本地标记设置为第一标记。该第一标记用于指示第i个基本执行单元还未访问第四页表项对应的第四内存页。
第一标记为0,第二标记为1;或者,第一标记为1,第二标记为0。当然第一标记的取值还可能是其他取值,第二标记的取值还可能是其他取值,在此不再一一列举说明。
第一节点还可能将该至少一个第二页表项中的每个第二页表项的远端标记设置为第三标记。对于任一个第二页表项的第三标记,该第三标记用于指示该第二页表项对应的第二内存页的数据保存在第二节点中。和/或,第一节点还可能将第四页表项的远端标记设置为第三标记,第四页表项的第三标记用于指示第四页表项对应的第四内存页的数据保存在第二节点中。
第三标记为0或1等值。第三标记的取值还可能是其他取值,在此不再一一列举说明。
其中,上述至少一个第二内存页和/或第四内存页是第一节点分配给第i个基本执行单元所使用的内存页。也就是说,第i个基本执行单元使用的内存页包括该至少一个第二内存页,可能还包括第四内存页。第i个基本执行单元使用的页表项包括该至少一个第二页表项,可能还包括第四页表项。
在读取的该内容还包括第二节点的CPU运行状态时，第一节点还可能采用如下3085的操作在第一节点上恢复出第二节点的CPU运行状态。
3085:第一节点将第一节点的CPU运行状态设置为第二节点的CPU运行状态。
在实现时，第二节点的CPU运行状态包括至少一个第一CPU寄存器的状态。第一节点为第i个基本执行单元分配至少一个第二CPU寄存器，第一节点的CPU包括该至少一个第二CPU寄存器，该至少一个第二CPU寄存器是第一节点中的寄存器，至少一个第一CPU寄存器与至少一个第二CPU寄存器一一对应。第一节点将每个第二CPU寄存器的状态分别设置为该第二CPU寄存器对应的第一CPU寄存器的状态，第一节点的CPU运行状态包括该至少一个第二CPU寄存器的状态。
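上述寄存器状态的逐个恢复过程可用如下Python代码示意。寄存器状态以字典模拟，寄存器名称（rip、rsp等）仅为示意性假设。

```python
# 示意:第一节点按一一对应关系,用第一CPU寄存器的状态设置第二CPU寄存器
def restore_cpu_state(first_regs):
    # first_regs:从第一目标区域读出的至少一个第一CPU寄存器的状态
    # 返回值模拟第一节点分配并设置后的至少一个第二CPU寄存器的状态
    second_regs = {}
    for name, value in first_regs.items():
        second_regs[name] = value   # 逐个寄存器恢复,省去冷启动初始化
    return second_regs

saved = {"rip": 0x401000, "rsp": 0x7FFE0000, "rax": 42}
restored = restore_cpu_state(saved)
```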
在读取的该内容还包括第三内存页的地址与第一数据的存储位置之间的映射关系时,第一节点保存第三内存页的地址与第一数据的存储位置之间的映射关系。
在读取的该内容还包括第一个基本执行单元在执行时打开的至少一个文件的文件描述符时,第一节点保存该至少一个文件的文件描述符。
其中，第i个基本执行单元使用的内存页与第一个基本执行单元使用的内存页一一对应，第一节点将第一节点的CPU运行状态设置为第二节点的CPU运行状态，这样第一节点不需要初始化第i个基本执行单元使用的内存页，也不需要初始化CPU运行状态，即不需要进行冷启动。第一节点接收第i个基本执行单元后，能够快速执行第i个基本执行单元，提高执行效率。
其中,上述3085是一个可选的操作,即也可以不执行3085的操作,这样第一节点为第i个基本执行单元分配至少一个第二CPU寄存器,并初始化该至少一个第二CPU寄存器的初始状态。
其中,3085的操作和3084的操作之间的执行顺序不分先后,可以先执行3084再执行3085,或者,可以先执行3085再执行3084,或者,可以同时执行3084和3085。
在执行完3084和3085的操作后,第一节点由内核态恢复为用户态,在第一节点运行在用户态,第一节点执行第i个基本执行单元。
步骤309:第一节点基于第一目标运行数据的存储地址,读取第二节点的内存保存的部分或全部第一目标运行数据。
在步骤309中,第一节点执行第i个基本执行单元。在第一节点执行第i个基本执行单元的过程中,第i个基本执行单元可能需要访问第i个基本执行单元所使用的内存页。
访问内存页包括写内存页和/或读内存页。
在第i个基本执行单元需要向某个内存页写入数据时,第i个基本执行单元获取该内存页的虚拟地址,从第二节点的页表中获取包括该虚拟地址的页表项,基于该页表项包括的该内存页的物理地址,向该内存页写入数据。
该内存页可能是某个第二内存页或第四内存页,该页表项可能是该第二内存页对应的第二页表项或第四内存页对应的第四页表项。可选地,如果该页表项的本地标记为第一标记,表示第i个基本执行单元首次访问该页表项对应的该内存页,在向该内存页写入数据后,将该页表项的本地标记设置为第二标记。
在第i个基本执行单元读取某个内存页时，如果该内存页是第i个基本执行单元首次读取的第二内存页，且在读取之前也未向该第二内存页写入数据，则在第i个基本执行单元首次读取该第二内存页时，第一节点获取该第二内存页对应的第一内存页的地址，第一目标运行数据的存储地址包括获取的该第一内存页的地址。第一节点基于该第一内存页的地址，直接读取第二节点的内存中的该第一内存页保存的第一目标运行数据。
如果该内存页是第i个基本执行单元首次读取的第四内存页，且在读取之前也未向第四内存页写入数据，则在第i个基本执行单元首次读取第四内存页时，第一节点获取第四内存页对应的第三内存页的地址，基于第三内存页的地址，从第三内存页的地址与第一数据的存储位置之间的映射关系中获取第一数据的存储位置，再基于第一数据的存储位置，获取第二节点的磁盘中保存的第一数据。
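上述两种首次读取路径（第二内存页走远端内存、第四内存页走磁盘文件）可用如下Python代码示意。其中`fetch_on_first_read`、`peer_page_addr`等名称与具体地址、数据均为示意性假设。

```python
# 示意:首次读取时按内存页类型取数
def fetch_on_first_read(pte, read_remote_memory, read_remote_file, disk_map):
    if pte["kind"] == "second":
        # 第二内存页:直接读取第二节点内存中对应第一内存页保存的数据
        return read_remote_memory(pte["peer_page_addr"])
    # 第四内存页:先由第三内存页地址查映射关系得到第一数据的存储位置,再读磁盘
    file_id, offset, size = disk_map[pte["peer_page_addr"]]
    return read_remote_file(file_id, offset, size)

mem = {0x1000: b"run-data"}                     # 模拟第二节点内存
files = {("f1", 16, 4): b"data"}                # 模拟第二节点磁盘文件内容
out1 = fetch_on_first_read({"kind": "second", "peer_page_addr": 0x1000},
                           mem.get, lambda f, o, s: files[(f, o, s)], {})
out2 = fetch_on_first_read({"kind": "fourth", "peer_page_addr": 0x3000},
                           mem.get, lambda f, o, s: files[(f, o, s)],
                           {0x3000: ("f1", 16, 4)})
```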
在步骤309中,参见图7,第一节点可以通过如下3091-3097的操作,获取第一内存页保存的第一目标运行数据,或者,获取第一数据。
3091:第一节点获取第i个基本执行单元待读取的内存页的虚拟地址,待读取内存页可能是第二内存页或第四内存页,待读取内存页为第i个基本执行单元需要读取的内存页。
3092:第一节点基于待读取内存页的虚拟地址获取待读取内存页对应的待读取页表项,如果待读取页表项的本地标记为第一标记,执行操作3093。
如果待读取页表项的本地标记为第一标记,表明待读取内存页是第i个基本执行单元首次读取的内存页,且在读取待读取内存页之前,第i个基本执行单元也未向待读取内存页写入数据,也就是说,第i个基本执行单元首次访问待读取内存页。如果待读取页表项的本地标记为第二标记,第一节点直接读取待读取内存页中的数据。
在3092中,第一节点从第一节点的页表中找出包括待读取内存页的虚拟地址的页表项,该页表项为待读取页表项。如果待读取页表项的本地标记为第一标记,第一节点由用户态进入内核态,在第一节点运行在内核态时,通过如下3093-3097的操作读取数据。
在一些实施例中,第一节点通过缺页错误处理函数进入内核态。
3093:第一节点基于待读取页表项,确定待读取内存页是第二内存页还是第四内存页,如果待读取内存页为第二内存页,执行3094,如果待读取内存页为第四内存页,执行3095。
在执行3093的操作前,第一节点还可能确定待读取页表项的远端标记是否为第三标记,如果待读取页表项的远端标记是第三标记,表明待读取页表项对应的待读取内存页的数据保存在第二节点中,然后执行3093的操作。如果待读取页表项的远端标记不是第三标记,第一节点使用缺页处理函数获取待读取内存页保存的第一目标运行数据。
在3093中,第一节点获取与待读取页表项相对应的另一个页表项,该另一个页表项是第二节点的页表中的页表项。如果该另一个页表项是第一页表项,确定待读取内存页为第二内存页,如果该另一个页表项是第三页表项,确定待读取内存页为第四内存页。
3094:第一节点基于该第二内存页对应的第一内存页的地址,读取第二节点的内存中的该第一内存页中保存的第一目标运行数据,执行3097。
该另一个页表项包括该第二内存页对应的第一内存页的地址(物理地址),通过与第二节点之间的RDMA连接,读取第二节点的内存中的该第一内存页中保存的第一目标运行数据。
由于在本申请实施例中，第一节点直接读取第二节点的内存中的该第一内存页保存的第一目标运行数据，相比传统方法从文件系统中读取第一目标运行数据，大幅减小了读取数据所需要的传输时延。例如，参见图8，读取的数据量越大，传统方法从文件系统读取数据所需要的传输时延，与本申请实施例直接从第二节点的内存中读取数据所需要的传输时延之间的差值就越大，所以第一节点直接读取第二节点的内存中的第一目标运行数据，能够减小时间开销。
3095:第一节点基于第四内存页对应的第三内存页的地址,以及第三内存页的地址与第一数据的存储位置之间的映射关系,获取第一数据的存储位置。
第一数据的存储位置包括第一数据属于的第一文件的文件标识、第一数据在第一文件中的偏移和大小。
3096:第一节点基于第一数据的存储位置和第二节点的节点标识,获取第一数据。
第一节点基于第二节点的节点标识,向第二节点发送第二获取请求,第二获取请求包括第一文件的文件标识、第一数据在第一文件中的偏移和大小。第二节点接收第二获取请求,基于第一文件的文件标识,获取第二节点的磁盘中保存的第一文件。基于第一数据在第一文件中的偏移和大小,从第一文件中获取第一数据。第二节点向第一节点发送第二获取响应,第二获取响应包括第一数据。第一节点接收第二获取响应,读取第二获取响应中的第一数据。
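第二节点侧"按文件标识打开第一文件，再按偏移和大小取出第一数据"的处理可用如下Python代码示意。代码以内存文件对象模拟磁盘文件，`handle_second_get_request`等名称及数据内容均为示意性假设。

```python
import io

# 示意:第二节点处理第二获取请求,按偏移和大小从第一文件中取出第一数据
def handle_second_get_request(open_file, file_id, offset, size):
    # open_file:按文件标识返回文件对象的函数,此处为示意
    f = open_file(file_id)
    f.seek(offset)           # 定位到第一数据在第一文件中的偏移
    return f.read(size)      # 按大小读出第一数据,作为第二获取响应返回

disk = {"file-1": io.BytesIO(b"xxxxHELLOxxxx")}   # 模拟第二节点的磁盘
first_data = handle_second_get_request(disk.get, "file-1", offset=4, size=5)
```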
在一些实施例中，第二获取请求为第二RPC请求，即第一节点基于第二节点的节点标识，向第二节点发送第二RPC请求，第二RPC请求包括第一文件的文件标识、第一数据在第一文件中的偏移和大小。第二获取响应为第二RPC响应，即第二节点向第一节点发送第二RPC响应，第二RPC响应包括第一数据。
3097:第一节点将待读取页表项的本地标记设置为第二标记,第二标记用于指示第i个基本执行单元已访问待读取页表项对应的待读取内存页。
第一节点还从内核态切换为用户态,在第一节点运行在用户态下,执行如下步骤310。
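上述3092-3097的读取流程（按本地标记判断是否首次访问，首次访问时从第二节点取数并更新标记）可用如下Python代码示意。标记取值与前文约定一致（0表示未访问、1表示已访问），`fetch_remote`等名称为示意性假设，内核态/用户态切换在此省略。

```python
# 示意:3092-3097的读取流程
def read_page(pte, read_local, fetch_remote):
    # 本地标记为第二标记:已访问过,直接读取本地内存页中的数据
    if pte["local_mark"] == 1:
        return read_local(pte)
    # 本地标记为第一标记:首次访问,从第二节点取数(对应3093-3096)
    data = fetch_remote(pte)
    pte["local_mark"] = 1        # 对应3097:取数后置为第二标记
    return data

calls = []
pte = {"local_mark": 0}
first = read_page(pte, lambda p: b"local",
                  lambda p: calls.append("remote") or b"remote-data")
again = read_page(pte, lambda p: b"local", lambda p: b"remote-data")
```

第二次读取同一内存页时不再触发远端取数，体现了"按需读取、只取一次"的效果。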
步骤310:第一节点基于该部分或全部的第一目标运行数据,执行第i个基本执行单元。
在上述待读取内存页是第二内存页时,第一节点读取到该第二内存页对应的第一内存页保存的第一目标运行数据,基于第一目标运行数据执行第i个基本执行单元。在上述待读取内存页是第四内存页时,第一节点获取到第一数据,基于第一数据执行第i个基本执行单元。
在执行第i个基本执行单元时还可能需要上述至少一个文件的文件描述符,即第一节点还可能基于上述至少一个文件的文件描述符,执行第i个基本执行单元。
在步骤310中,第一节点执行第i个基本执行单元后,第i个基本执行单元可能产生第二目标运行数据。第二目标运行数据是第i个基本执行单元在执行时产生的运行数据,第二目标运行数据是执行第i个基本执行单元的后驱依赖单元所需要的数据。
在第i个基本执行单元产生第二目标运行数据后,第一节点向调度节点发送第二信息,第二信息包括第二标识,或者,第二信息包括第二标识和第二校验信息,第二标识用于指示第二目标运行数据。
在第i个基本执行单元产生第二目标运行数据后,第i个基本执行单元使用的内存页包括至少一个第五内存页,该至少一个第五内存页用于保存第二目标运行数据。该至少一个第五内存页可能包括上述第二内存页和/或第四内存页。第i个基本执行单元使用的页表项包括至少一个第五页表项,该至少一个第五页表项与该至少一个第五内存页一一对应。
第i个基本执行单元使用的内存页还可能包括第六内存页,第六内存页与第二数据相对应,第二数据是存储在第一节点的磁盘中的数据,第二数据是第i个基本执行单元需要写入第六内存页但还未写入第六内存页的数据。第六内存页可能是上述某个第二内存页或第四内存页。第i个基本执行单元使用的页表项还可能包括第六页表项,第六页表项与第六内存页对应。
第一节点包括第六页表项对应的第六内存页的地址与第二数据的存储位置之间的映射关系。第二数据的存储位置是第二数据在第一节点的磁盘中的位置，第二数据是第i个基本执行单元需要从第一节点的磁盘中读取并待写入到第六内存页的数据，但第i个基本执行单元还没有将第二数据从第一节点的磁盘中读取出来并写入到第六内存页中。
第一节点会记录第六内存页的地址(第六内存页的虚拟地址和/或物理地址)与第二数据的存储位置之间的映射关系。
在第i个基本执行单元产生第二目标运行数据后,第一节点进入内核态,在运行在内核态时,在第一节点的内存中分配一个连续的第二目标区域,获取该至少一个第五页表项,将该至少一个第五页表项保存到第二目标区域中。第一节点生成与第二目标区域相对应的第二标识。
在一些实施例中,第一节点保存第二标识、第二目标区域的地址和第二目标区域的大小之间的映射关系,以实现第二标识与第二目标区域相对应。
在一些实施例中，第一节点还可能生成与第二标识相对应的第二校验信息，即第二标识与第二目标区域和第二校验信息相对应。可选地，第一节点保存第二标识、第二校验信息、第二目标区域的地址和第二目标区域的大小之间的映射关系，以实现第二标识与第二目标区域和第二校验信息相对应。
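第一节点生成第二标识与第二校验信息并保存映射关系的过程可用如下Python代码示意。此处用自增ID示意第二标识、用随机数示意第二校验信息，均为一种可能的实现假设。

```python
import secrets

# 示意:生成第二标识与第二校验信息,并保存与第二目标区域的映射关系
saved_regions = {}
next_id = [0]

def publish_region(region_addr, region_size):
    ident = next_id[0]                 # 第二标识:此处用自增ID示意
    next_id[0] += 1
    token = secrets.randbits(32)       # 第二校验信息:此处用随机数示意
    saved_regions[ident] = (token, region_addr, region_size)
    return ident, token

ident, token = publish_region(0x8000, 4096)
```

后继节点只有同时持有第二标识和第二校验信息，才能从第一节点换取第二目标区域的地址和大小。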
步骤310中,第一节点还可能获取第六内存页的地址与第二数据的存储位置之间的映射关系,以及获取与第六内存页相对应的第六页表项,将第六内存页的地址与第二数据的存储位置之间的映射关系,以及第六页表项保存到第二目标区域中。
在步骤310中，第一节点还可能获取第一节点的CPU运行状态，将第一节点的CPU运行状态保存到第二目标区域中。
在实现时，第一节点获取第i个基本执行单元使用的至少一个第二CPU寄存器的状态，第一节点的CPU运行状态包括该至少一个第二CPU寄存器的状态，并将该至少一个第二CPU寄存器的状态保存到第二目标区域中。
在步骤310中,第一节点还可能获取第i个基本执行单元在执行过程中打开的至少一个文件的文件描述符,将该至少一个文件的文件描述符保存到第二目标区域中。
调度节点接收第二信息,调用第j个基本执行单元,第i个基本执行单元是第j个基本执行单元的前驱依赖单元,j是大于i的整数。第i个基本执行单元和第j个基本执行单元是分布式工作流中的两个基本执行单元。调度节点向第三节点发送第一节点的节点标识、第二信息和第j个基本执行单元,第三节点是调度节点调度的一个计算节点。
第三节点接收第一节点的节点标识、第二信息和第j个基本执行单元，同第一节点一样，按上述步骤308-310的操作来执行第j个基本执行单元。
在一些实施例中，向调度节点输入多个分布式工作流，调度节点对该多个分布式工作流中的基本执行单元进行调用，并调度计算节点来处理调用的基本执行单元。由于在本申请实施例中，计算节点获取运行数据时节省了时间开销并避免了内存拷贝，本申请实施例处理的分布式工作流越多，产生的吞吐量越大，且大于传统方法。例如，参见图9，横坐标代表处理的分布式工作流的数量，纵坐标代表吞吐量，从图9可以得出：处理相同数量的分布式工作流，本申请实施例产生的吞吐量大于传统方法产生的吞吐量。另外，参见图10，横坐标代表吞吐量，该吞吐量是每秒能处理的分布式工作流的个数，纵坐标代表时延，图10展示了在不同并发配置下吞吐量和时延的关系。以及，参见图11，横坐标代表处理的分布式工作流数量，纵坐标代表处理分布式工作流的时延，该时延采用对数方式表示。例如，本申请实施例在处理10000个分布式工作流时，对应的对数方式表示的时延为2，表示处理10000个分布式工作流所需要的时延为10²ms。
在本申请实施例中，第二节点在得到第一个基本执行单元产生的第一目标运行数据后，向调度节点发送第一信息，第一信息包括第一标识。第一标识与第二节点的第一目标区域相对应，第一目标区域保存有至少一个第一页表项，该至少一个第一页表项对应的至少一个第一内存页用于保存第一目标运行数据。调度节点接收第一信息，调用第i个基本执行单元，第i个基本执行单元的前驱依赖单元为第一个基本执行单元，向第一节点发送第一信息和第i个基本执行单元。第一节点基于第一标识获取第一目标区域中的至少一个第一页表项，为第i个基本执行单元分配至少一个第二页表项，该至少一个第二页表项与该至少一个第一页表项一一对应。第一节点的内存中包括至少一个第二内存页，该至少一个第二内存页与该至少一个第二页表项一一对应。在第i个基本执行单元首次读取某个第二内存页时，第一节点获取该第二内存页对应的第二页表项，获取该第二页表项对应的第一页表项，基于该第一页表项包括的第一内存页的地址，读取第二节点的该第一内存页保存的第一目标运行数据，基于第一目标运行数据执行第i个基本执行单元。由于第一节点直接读取第二节点的第一内存页中保存的数据，从而节省了内存拷贝开销以及时间开销。由于在读取某个第二内存页时，才读取该第二内存页对应的第一内存页保存的数据，实现按需读取，减小网络资源的开销。
参见图12,本申请实施例提供了一种获取数据的装置1200,所述装置1200部署在图1所示的网络架构100中的计算节点上,或者,部署在所述方法300的第一节点上。所述装置1200包括:
接收单元1201,用于接收第一标识和第二节点的节点标识,所述装置1200用于执行第一基本执行单元,第二节点用于执行第二基本执行单元,第二基本执行单元是第一基本执行单元的前驱依赖单元,第二节点的内存用于保存第二基本执行单元的运行数据,第一标识用于指示该运行数据;
处理单元1202,用于基于第一标识和该节点标识获取该运行数据的存储地址;
处理单元1202,还用于基于该存储地址,读取第二节点的内存保存的运行数据。
可选地,接收单元1201接收第一标识和第二节点的节点标识的实现过程,参见图3所示的方法300的步骤308的相关内容,在此不再详细说明。
可选地,处理单元1202获取该运行数据的存储地址的实现过程,参见图3所示的方法300的步骤308的相关内容,在此不再详细说明。
可选地,处理单元1202读取该运行数据的实现过程,参见图3所示的方法300的步骤309的相关内容,在此不再详细说明。
可选地,第二节点的内存包括至少一个第一内存页,至少一个第一内存页用于保存该运行数据,第一标识用于指示至少一个第一内存页。
可选地,处理单元1202,用于:
基于第一标识和该节点标识,获取至少一个第一内存页的地址;
为第一基本执行单元分配至少一个第二内存页,所述装置1200包括至少一个第二内存页,至少一个第二内存页与至少一个第一内存页一一对应;
获取存储地址,该存储地址为第一基本执行单元首次读取的第二内存页对应的第一内存页的地址。
可选地,处理单元1202获取至少一个第一内存页的地址的实现过程,参见图3所示的方法300的3081-3083的相关内容,在此不再详细说明。
可选地，处理单元1202为第一基本执行单元分配至少一个第二内存页的实现过程，参见图3所示的方法300的3084的相关内容，在此不再详细说明。
可选地,处理单元1202,用于基于获取的第一内存页的地址,读取获取的第一内存页保存的运行数据。
可选地,处理单元1202读取运行数据的实现过程,参见图3所示的方法300的3094的相关内容,在此不再详细说明。
可选地,第二节点的内存包括目标区域,目标区域与第一标识相对应,目标区域保存有至少一个第一内存页的地址;
所述装置1200还包括发送单元1203,
发送单元1203,用于基于该节点标识向第二节点发送获取请求,该获取请求包括第一标识;
接收单元1201,还用于接收第二节点发送的获取响应,该获取响应包括第一标识对应的目标区域的地址和大小;
处理单元1202,用于基于目标区域的地址和大小,读取目标区域保存的至少一个第一内存页的地址。
可选地,发送单元1203发送获取请求的实现过程,参见图3所示的方法300的3081的相关内容,在此不再详细说明。
可选地,接收单元1201接收第二节点发送的获取响应的实现过程,参见图3所示的方法300的3082的相关内容,在此不再详细说明。
可选地,处理单元1202读取目标区域保存的至少一个第一内存页的地址的实现过程,参见图3所示的方法300的3083的相关内容,在此不再详细说明。
可选地,第二节点包括至少一个第一页表项,至少一个第一页表项与至少一个第一内存页对应,所述装置1200包括至少一个第二页表项,
处理单元1202,用于为第一基本执行单元分配至少一个第二页表项,至少一个第二页表项与至少一个第一页表项对应,至少一个第二内存页包括每个第二页表项对应的内存页。
可选地,目标区域保存有第二节点的中央处理器CPU运行状态,处理单元1202,还用于:
读取目标区域保存的第二节点的CPU运行状态;
将所述装置1200的CPU运行状态设置为第二节点的CPU运行状态。
可选地,处理单元1202读取第二节点的CPU运行状态,以及设置所述装置1200的CPU运行状态的实现过程,参见图3所示的方法300的3085的相关内容,在此不再详细说明。
可选地，第二节点的CPU运行状态包括至少一个第一CPU寄存器的状态，至少一个第一CPU寄存器是第二节点的CPU中的第二基本执行单元使用的寄存器，所述装置1200的CPU运行状态包括至少一个第二CPU寄存器的状态，至少一个第一CPU寄存器与至少一个第二CPU寄存器一一对应，至少一个第二CPU寄存器是所述装置1200的CPU中的第一基本执行单元使用的寄存器，
处理单元1202,用于:
读取目标区域保存的至少一个第一CPU寄存器的状态;
将每个第二CPU寄存器的状态分别设置为每个第二CPU寄存器对应的第一CPU寄存器的状态。
可选地,处理单元1202读取目标区域保存的至少一个第一CPU寄存器的状态的实现过程,参见图3所示的方法300的3083的相关内容,在此不再详细说明。
可选地,处理单元1202分配至少一个第二CPU寄存器以及设置每个第二CPU寄存器的状态的实现过程,参见图3所示的方法300的3085的相关内容,在此不再详细说明。
可选地，目标区域还保存有第三内存页的地址与第一数据的存储位置之间的映射关系，第二节点的内存还包括第三内存页，第一数据存储在第二节点的磁盘中，第一数据是第二基本执行单元需要写入第三内存页但还未写入到所述第三内存页的数据，所述装置1200包括与第三内存页相对应的第四内存页，处理单元1202，还用于：
读取目标区域保存的该映射关系;
在第一基本执行单元首次读取第四内存页时,基于第四内存页对应的第三内存页的地址和该映射关系,获取第一数据的存储位置;
基于该存储位置和该节点标识,获取第一数据,第一数据是所述装置1200执行第一基本执行单元使用的数据。
可选地,处理单元1202读取目标区域保存的该映射关系的实现过程,参见图3所示的方法300的3083的相关内容,在此不再详细说明。
可选地,处理单元1202获取第一数据的存储位置的实现过程,参见图3所示的方法300的3095的相关内容,在此不再详细说明。
可选地,处理单元1202获取第一数据的实现过程,参见图3所示的方法300的3096的相关内容,在此不再详细说明。
可选地,目标区域还保存有第二基本执行单元在执行时打开的至少一个文件的文件描述符,第二节点包括至少一个文件,
处理单元1202,还用于读取目标区域保存的至少一个文件的文件描述符,至少一个文件的文件描述符是所述装置1200执行第一基本执行单元使用的数据。
可选地,处理单元1202读取目标区域保存的至少一个文件的文件描述符的实现过程,参见图3所示的方法300的3083的相关内容,在此不再详细说明。
可选地,第一基本执行单元和第二基本执行单元是分布式工作流中的两个函数。
在本申请实施例中,由于接收单元接收第一标识和第二节点的节点标识以及第一标识指示第二节点的内存保存的运行数据,因此处理单元基于第一标识和第二节点的节点标识,能够获取到该运行数据的存储地址,基于该存储地址直接读取第二节点的内存保存的运行数据。由于处理单元基于该存储地址直接读取第二节点的内存保存的运行数据,从而不用将第二节点的内存保存的运行数据拷贝到文件系统,省去了内存拷贝的开销。处理单元直接读取第二节点的内存保存的运行数据的速率较高,节省了时间开销。
参见图13,本申请实施例提供了一种获取数据的装置1300,所述装置1300部署在图1所示的网络架构100中的计算节点上,或者,部署在所述方法300的第二节点上。所述装置1300包括:
处理单元1301，用于获取第一标识，所述装置1300用于执行第二基本执行单元，第一标识用于指示所述装置1300的内存中保存的第二基本执行单元的运行数据；
发送单元1302,用于发送第一标识,第一标识用于触发第一节点获取运行数据的存储地址,以及基于存储地址读取所述装置1300的内存保存的运行数据,第一节点用于执行第一基本执行单元,第二基本执行单元是第一基本执行单元的前驱依赖单元。
可选地,处理单元1301获取第一标识的实现过程,参见图3所示的方法300的304的相关内容,在此不再详细说明。
可选地,发送单元1302发送第一标识的实现过程,参见图3所示的方法300的305的相关内容,在此不再详细说明。
可选地，所述装置1300的内存包括至少一个第一内存页，至少一个第一内存页用于保存运行数据，第一标识用于指示至少一个第一内存页。
可选地,所述装置1300的内存包括目标区域,目标区域与第一标识相对应,所述装置1300还包括接收单元1303,
处理单元1301,用于向目标区域保存至少一个第一内存页的地址;
接收单元1303,用于接收第一节点发送的获取请求,该获取请求包括第一标识;
发送单元1302,还用于向第一节点发送获取响应,该获取响应包括第一标识对应的目标区域的地址和大小,获取响应用于触发第一节点基于目标区域的地址和大小,读取目标区域保存的至少一个第一内存页的地址。
可选地,处理单元1301向目标区域保存至少一个第一内存页的地址的实现过程,参见图3所示的方法300的304的相关内容,在此不再详细说明。
可选地,接收单元1303接收获取请求的实现过程,参见图3所示的方法300的3082的相关内容,在此不再详细说明。
可选地,发送单元1302发送获取响应的实现过程,参见图3所示的方法300的3082的相关内容,在此不再详细说明。
可选地,处理单元1301,还用于向该目标区域保存所述装置1300的CPU运行状态,以使第一节点读取该目标区域保存的CPU运行状态。
可选地,处理单元1301向目标区域保存CPU运行状态的实现过程,参见图3所示的方法300的304的相关内容,在此不再详细说明。
可选地，所述装置1300的CPU包括第二基本执行单元使用的至少一个第一中央处理器CPU寄存器，该CPU运行状态包括至少一个第一CPU寄存器的状态。
可选地,处理单元1301,还用于:
向该目标区域保存第三内存页的地址与第一数据的存储位置之间的映射关系,所述装置1300的内存还包括第三内存页,第一数据存储在所述装置1300的磁盘中,第一数据是第二基本执行单元需要写入第三内存页但还未写入到第三内存页的数据,以使第一节点读取该目标区域保存的该映射关系。
可选地,处理单元1301向该目标区域保存该映射关系的实现过程,参见图3所示的方法300的304的相关内容,在此不再详细说明。
可选地,处理单元1301,还用于:
向该目标区域保存第二基本执行单元在执行时打开的至少一个文件的文件描述符,所述装置1300包括该至少一个文件,以使第一节点读取该目标区域保存的该至少一个文件的文件描述符。
可选地,处理单元1301向该目标区域保存该至少一个文件的文件描述符的实现过程,参见图3所示的方法300的304的相关内容,在此不再详细说明。
可选地,第一基本执行单元和第二基本执行单元是分布式工作流中的两个函数。
在本申请实施例中，由于处理单元获取的第一标识指示第二节点的内存保存的运行数据，发送单元发送第一标识，这样第一节点基于第一标识能够获取到该运行数据的存储地址，基于该存储地址直接读取第二节点的内存保存的运行数据。由于第一节点基于该存储地址直接读取第二节点的内存保存的运行数据，从而不用将所述装置的内存保存的运行数据拷贝到文件系统，省去了内存拷贝的开销。第一节点直接读取所述装置的内存保存的运行数据的速率较高，节省了时间开销。
参见图14,本申请实施例提供了一种获取数据的装置1400,所述装置1400部署在图1所示的网络架构100中的调度节点上,或者,部署在所述方法300的调度节点上。所述装置1400包括:
接收单元1401，用于接收第二节点发送的第一标识，第二节点用于执行第二基本执行单元，第二节点的内存用于保存第二基本执行单元的运行数据，第一标识用于指示该运行数据；
发送单元1402,用于向第一节点发送第一标识和第二节点的节点标识,第一节点用于执行第一基本执行单元,第二基本执行单元是第一基本执行单元的前驱依赖单元,第一标识和该节点标识用于触发第一节点获取该运行数据的存储地址,以及基于该存储地址读取第二节点的内存保存的运行数据。
可选地,接收单元1401接收第一标识的实现过程,参见图3所示的方法300的306的相关内容,在此不再详细说明。
可选地,发送单元1402发送第一标识和第二节点的节点标识的实现过程,参见图3所示的方法300的307的相关内容,在此不再详细说明。
可选地,第二节点的内存包括至少一个第一内存页,至少一个第一内存页用于保存该运行数据,第一标识用于指示至少一个第一内存页。
可选地，第一基本执行单元和第二基本执行单元是分布式工作流中的两个函数。
在本申请实施例中,由于第一标识指示第二节点的内存保存的运行数据,发送单元向第一节点发送第一标识和第二节点的节点标识,因此第一节点基于第一标识和第二节点的节点标识,能够获取到该运行数据的存储地址,基于该存储地址直接读取第二节点的内存保存的运行数据。由于第一节点基于该存储地址直接读取第二节点的内存保存的运行数据,从而第二节点不用将第二节点的内存保存的运行数据拷贝到文件系统,省去了内存拷贝的开销。第一节点直接读取第二节点的内存保存的运行数据的速率较高,节省了时间开销。
参见图15,本申请实施例提供了一种设备1500示意图。该设备1500可以是上述任意实施例中的第一节点、第二节点或调度节点。例如该设备1500可以是上述图1所示网络架构100中的调度节点或计算节点,或者,是上述图3所示方法300中的第一节点、第二节点或调度节点。该设备1500包括至少一个处理器1501,内部连接1502,存储器1503以及至少一个收发器1504。
该设备1500是一种硬件结构的装置。
在一些实施例中，该设备1500可以用于实现图12所示的装置1200中的功能模块。例如，本领域技术人员可以想到图12所示的装置1200中的处理单元1202可以通过该至少一个处理器1501调用存储器1503中的代码来实现。图12所示的装置1200中的接收单元1201和发送单元1203可以通过该至少一个收发器1504来实现。所述设备1500还可以用于实现上述任一实施例中第一节点的功能。
在一些实施例中，该设备1500可以用于实现图13所示的装置1300中的功能模块。例如，本领域技术人员可以想到图13所示的装置1300中的处理单元1301可以通过该至少一个处理器1501调用存储器1503中的代码来实现。图13所示的装置1300中的发送单元1302和接收单元1303可以通过该至少一个收发器1504来实现。所述设备1500还可以用于实现上述任一实施例中第二节点的功能。
在一些实施例中，该设备1500可以用于实现图14所示的装置1400中的功能模块。例如，本领域技术人员可以想到图14所示的装置1400中的接收单元1401和发送单元1402可以通过该至少一个收发器1504来实现。所述设备1500还可以用于实现上述任一实施例中调度节点的功能。
可选的,上述处理器1501可以是一个通用中央处理器(central processing unit,CPU),网络处理器(network processor,NP),微处理器,特定应用集成电路(application-specific integrated circuit,ASIC),或一个或多个用于控制本申请方案程序执行的集成电路。
上述内部连接1502可包括一通路,在上述组件之间传送信息。可选的,内部连接1502为单板或总线等。
上述收发器1504,用于与其他设备或通信网络通信。
上述存储器1503可以是只读存储器(read-only memory,ROM)或可存储静态信息和指令的其他类型的静态存储设备,随机存取存储器(random access memory,RAM)或者可存储信息和指令的其他类型的动态存储设备,也可以是电可擦可编程只读存储器(electrically erasable programmable read-only memory,EEPROM)、只读光盘(compact disc read-only memory,CD-ROM)或其他光盘存储、光碟存储(包括压缩光碟、激光碟、光碟、数字通用光碟、蓝光光碟等)、磁盘存储介质或者其他磁存储设备、或者能够用于携带或存储具有指令或数据结构形式的期望的程序代码并能够由计算机存取的任何其他介质,但不限于此。存储器可以是独立存在,通过总线与处理器相连接。存储器也可以和处理器集成在一起。
其中,存储器1503用于存储执行本申请方案的应用程序代码,并由处理器1501来控制执行。处理器1501用于执行存储器1503中存储的应用程序代码,以及配合至少一个收发器1504,从而使得该设备1500实现本专利方法中的功能。
在具体实现中,作为一种实施例,处理器1501可以包括一个或多个CPU,例如图15中的CPU0和CPU1。
在具体实现中,作为一种实施例,该设备1500可以包括多个处理器,例如图15中的处理器1501和处理器1507。这些处理器中的每一个可以是一个单核(single-CPU)处理器,也可以是一个多核(multi-CPU)处理器。这里的处理器可以指一个或多个设备、电路、和/或用于处理数据(例如计算机程序指令)的处理核。
参见图16，本申请实施例提供了一种获取数据的系统1600，所述系统1600包括如图12所示的装置1200、如图13所示的装置1300和如图14所示的装置1400。如图12所示的装置1200可以为第一节点1601，如图13所示的装置1300可以为第二节点1602，如图14所示的装置1400可以为调度节点1603。
本领域普通技术人员可以理解实现上述实施例的全部或部分步骤可以通过硬件来完成,也可以通过程序来指令相关的硬件完成,所述的程序可以存储于一种计算机可读存储介质中,上述提到的存储介质可以是只读存储器,磁盘或光盘等。
以上所述仅为本申请的可选实施例,并不用以限制本申请,凡在本申请的原则之内,所作的任何修改、等同替换、改进等,均应包含在本申请的保护范围之内。

Claims (29)

  1. 一种获取数据的方法,其特征在于,所述方法包括:
    第一节点接收第一标识和第二节点的节点标识,所述第一节点用于执行第一基本执行单元,所述第二节点用于执行第二基本执行单元,所述第二基本执行单元是所述第一基本执行单元的前驱依赖单元,所述第二节点的内存用于保存所述第二基本执行单元的运行数据,所述第一标识用于指示所述运行数据;
    所述第一节点基于所述第一标识和所述节点标识获取所述运行数据的存储地址;
    所述第一节点基于所述存储地址,读取所述第二节点的内存保存的运行数据。
  2. 如权利要求1所述的方法,其特征在于,所述第二节点的内存包括至少一个第一内存页,所述至少一个第一内存页用于保存所述运行数据,所述第一标识用于指示所述至少一个第一内存页。
  3. 如权利要求2所述的方法,其特征在于,所述第一节点基于所述第一标识和所述节点标识获取所述运行数据的存储地址,包括:
    所述第一节点基于所述第一标识和所述节点标识,获取所述至少一个第一内存页的地址;
    所述第一节点为所述第一基本执行单元分配至少一个第二内存页,所述第一节点包括所述至少一个第二内存页,所述至少一个第二内存页与所述至少一个第一内存页对应;
    所述第一节点获取所述存储地址,所述存储地址为所述第一基本执行单元首次读取的第二内存页对应的第一内存页的地址。
  4. 如权利要求3所述的方法,其特征在于,所述第二节点包括至少一个第一页表项,所述至少一个第一页表项与所述至少一个第一内存页对应,所述第一节点包括至少一个第二页表项,所述第一节点为所述第一基本执行单元分配至少一个第二内存页,包括:
    所述第一节点为所述第一基本执行单元分配所述至少一个第二页表项,所述至少一个第二页表项与所述至少一个第一页表项对应,所述至少一个第二内存页包括每个第二页表项对应的内存页。
  5. 如权利要求1-4任一项所述的方法,其特征在于,所述第二节点的内存包括目标区域,所述目标区域保存有所述第二节点的中央处理器CPU运行状态,所述方法还包括:
    所述第一节点读取所述目标区域保存的所述第二节点的CPU运行状态;
    所述第一节点将所述第一节点的CPU运行状态设置为所述第二节点的CPU运行状态。
  6. 如权利要求5所述的方法,其特征在于,所述目标区域还保存有第三内存页的地址与第一数据的存储位置之间的映射关系,所述第二节点的内存还包括所述第三内存页,所述第一数据存储在所述第二节点的磁盘中,所述第一数据是所述第二基本执行单元需要写入所述第三内存页但还未写入到所述第三内存页的数据,所述第一节点包括与所述第三内存页相对应的第四内存页,所述方法还包括:
    所述第一节点读取所述目标区域保存的所述映射关系;
    所述第一节点在所述第一基本执行单元首次读取所述第四内存页时,基于所述第四内存页对应的所述第三内存页的地址和所述映射关系,获取所述第一数据的存储位置;
    所述第一节点基于所述存储位置和所述节点标识,获取所述第一数据,所述第一数据是所述第一节点执行所述第一基本执行单元使用的数据。
  7. 如权利要求5或6所述的方法,其特征在于,所述目标区域还保存有所述第二基本执行单元在执行时打开的至少一个文件的文件描述符,所述第二节点包括所述至少一个文件,所述方法还包括:
    所述第一节点读取所述目标区域保存的所述至少一个文件的文件描述符,所述至少一个文件的文件描述符是所述第一节点执行所述第一基本执行单元使用的数据。
  8. 一种获取数据的方法,其特征在于,所述方法包括:
    第二节点获取第一标识，所述第二节点用于执行第二基本执行单元，所述第一标识用于指示所述第二节点的内存中保存的所述第二基本执行单元的运行数据；
    所述第二节点发送第一标识,所述第一标识用于触发第一节点获取所述运行数据的存储地址,以及基于所述存储地址读取所述第二节点的内存保存的运行数据,所述第一节点用于执行第一基本执行单元,所述第二基本执行单元是所述第一基本执行单元的前驱依赖单元。
  9. 如权利要求8所述的方法,其特征在于,所述第二节点的内存包括至少一个第一内存页,所述至少一个第一内存页用于保存所述运行数据,所述第一标识用于指示所述至少一个第一内存页。
  10. 如权利要求8或9所述的方法,其特征在于,所述第二节点的内存包括目标区域,所述方法还包括:
    所述第二节点向所述目标区域保存所述第二节点的中央处理器CPU运行状态,以使所述第一节点读取所述目标区域保存的所述CPU运行状态。
  11. 如权利要求10所述的方法,其特征在于,所述方法还包括:
    所述第二节点向所述目标区域保存第三内存页的地址与第一数据的存储位置之间的映射关系,所述第二节点的内存还包括所述第三内存页,所述第一数据存储在所述第二节点的磁盘中,所述第一数据是所述第二基本执行单元需要写入所述第三内存页但还未写入到所述第三内存页的数据,以使所述第一节点读取所述目标区域保存的所述映射关系。
  12. 如权利要求10或11所述的方法,其特征在于,所述方法还包括:
    所述第二节点向所述目标区域保存所述第二基本执行单元在执行时打开的至少一个文件的文件描述符,所述第二节点包括所述至少一个文件,以使所述第一节点读取所述目标区域保存的所述至少一个文件的文件描述符。
  13. 一种获取数据的装置,其特征在于,所述装置包括:
    接收单元,用于接收第一标识和第二节点的节点标识,所述装置用于执行第一基本执行单元,所述第二节点用于执行第二基本执行单元,所述第二基本执行单元是所述第一基本执行单元的前驱依赖单元,所述第二节点的内存用于保存所述第二基本执行单元的运行数据,所述第一标识用于指示所述运行数据;
    处理单元,用于基于所述第一标识和所述节点标识获取所述运行数据的存储地址;
    所述处理单元,还用于基于所述存储地址,读取所述第二节点的内存保存的运行数据。
  14. 如权利要求13所述的装置,其特征在于,所述第二节点的内存包括至少一个第一内存页,所述至少一个第一内存页用于保存所述运行数据,所述第一标识用于指示所述至少一个第一内存页。
  15. 如权利要求14所述的装置,其特征在于,所述处理单元,用于:
    基于所述第一标识和所述节点标识,获取所述至少一个第一内存页的地址;
    为所述第一基本执行单元分配至少一个第二内存页,所述装置包括所述至少一个第二内存页,所述至少一个第二内存页与所述至少一个第一内存页对应;
    获取所述存储地址,所述存储地址为所述第一基本执行单元首次读取的第二内存页对应的第一内存页的地址。
  16. 如权利要求15所述的装置,其特征在于,所述第二节点包括至少一个第一页表项,所述至少一个第一页表项与所述至少一个第一内存页对应,所述装置包括至少一个第二页表项,
    所述处理单元,用于为所述第一基本执行单元分配所述至少一个第二页表项,所述至少一个第二页表项与所述至少一个第一页表项对应,所述至少一个第二内存页包括每个第二页表项对应的内存页。
  17. 如权利要求13-16任一项所述的装置,其特征在于,所述第二节点的内存包括目标区域,所述目标区域保存有所述第二节点的中央处理器CPU运行状态,所述处理单元,还用于:
    读取所述目标区域保存的所述第二节点的CPU运行状态;
    将所述装置的CPU运行状态设置为所述第二节点的CPU运行状态。
  18. 如权利要求17所述的装置,其特征在于,所述目标区域还保存有第三内存页的地址与第一数据的存储位置之间的映射关系,所述第二节点的内存还包括所述第三内存页,所述第一数据存储在所述第二节点的磁盘中,所述第一数据是所述第二基本执行单元需要写入所述第三内存页但还未写入到所述第三内存页的数据,所述装置包括与所述第三内存页相对应的第四内存页,所述处理单元,还用于:
    读取所述目标区域保存的所述映射关系;
    在所述第一基本执行单元首次读取所述第四内存页时,基于所述第四内存页对应的所述第三内存页的地址和所述映射关系,获取所述第一数据的存储位置;
    基于所述存储位置和所述节点标识,获取所述第一数据,所述第一数据是所述装置执行所述第一基本执行单元使用的数据。
  19. 如权利要求17或18所述的装置,其特征在于,所述目标区域还保存有所述第二基本执行单元在执行时打开的至少一个文件的文件描述符,所述第二节点包括所述至少一个文件,所述处理单元,还用于读取所述目标区域保存的所述至少一个文件的文件描述符,所述至少一个文件的文件描述符是所述第一节点执行所述第一基本执行单元使用的数据。
  20. 一种获取数据的装置,其特征在于,所述装置包括:
    处理单元，用于获取第一标识，所述装置用于执行第二基本执行单元，所述第一标识用于指示所述装置的内存中保存的所述第二基本执行单元的运行数据；
    发送单元,用于发送第一标识,所述第一标识用于触发第一节点获取所述运行数据的存储地址,以及基于所述存储地址读取所述装置的内存保存的运行数据,所述第一节点用于执行第一基本执行单元,所述第二基本执行单元是所述第一基本执行单元的前驱依赖单元。
  21. 如权利要求20所述的装置,其特征在于,所述装置的内存包括至少一个第一内存页,所述至少一个第一内存页用于保存所述运行数据,所述第一标识用于指示所述至少一个第一内存页。
  22. 如权利要求20或21所述的装置,其特征在于,所述装置的内存包括目标区域,所述处理单元,还用于向所述目标区域保存所述装置的中央处理器CPU运行状态,以使所述第一节点读取所述目标区域保存的所述CPU运行状态。
  23. 如权利要求22所述的装置,其特征在于,所述处理单元,还用于:
    向所述目标区域保存第三内存页的地址与第一数据的存储位置之间的映射关系,所述装置的内存还包括所述第三内存页,所述第一数据存储在所述装置的磁盘中,所述第一数据是所述第二基本执行单元需要写入所述第三内存页但还未写入到所述第三内存页的数据,以使所述第一节点读取所述目标区域保存的所述映射关系。
  24. 如权利要求22或23所述的装置,其特征在于,所述处理单元,还用于:
    向所述目标区域保存所述第二基本执行单元在执行时打开的至少一个文件的文件描述符,所述装置包括所述至少一个文件,以使所述第一节点读取所述目标区域保存的所述至少一个文件的文件描述符。
  25. 一种第一节点,其特征在于,包括至少一个处理器,所述至少一个处理器用于与存储器耦合,读取并执行所述存储器中的指令,以实现如权利要求1-7任一项所述的方法。
  26. 一种第二节点,其特征在于,包括至少一个处理器,所述至少一个处理器用于与存储器耦合,读取并执行所述存储器中的指令,以实现如权利要求8-12任一项所述的方法。
  27. 一种获取数据的系统,其特征在于,所述系统包括如权利要求13-19任一项所述的装置和如权利要求20-24任一项所述的装置,或者,所述系统包括如权利要求25所述的第一节点和如权利要求26所述的第二节点。
  28. 一种计算机存储介质,其上存储有计算机程序,其特征在于,所述计算机程序被处理器执行时,实现如权利要求1-12任一项所述的方法。
  29. 一种计算机程序产品,其包括计算机程序,其特征在于,所述计算机程序被处理器执行时,实现如权利要求1-12任一项所述的方法。
PCT/CN2022/120986 2022-09-23 2022-09-23 获取数据的方法、装置、系统及存储介质 WO2024060228A1 (zh)


Publications (1)

Publication Number Publication Date
WO2024060228A1 true WO2024060228A1 (zh) 2024-03-28


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9569459B1 (en) * 2014-03-31 2017-02-14 Amazon Technologies, Inc. Conditional writes at distributed storage services
CN113672411A (zh) * 2021-08-25 2021-11-19 烽火通信科技股份有限公司 一种网络设备虚拟化驱动适配层的实现方法和装置
CN113760560A (zh) * 2020-06-05 2021-12-07 华为技术有限公司 一种进程间通信方法以及进程间通信装置
CN114064302A (zh) * 2020-07-30 2022-02-18 华为技术有限公司 一种进程间通信的方法及装置
CN114518969A (zh) * 2022-02-18 2022-05-20 杭州朗和科技有限公司 进程间通信方法、系统、存储介质和计算机设备



Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22959233

Country of ref document: EP

Kind code of ref document: A1