CN115237605B - Data transmission method between CPU and GPU and computer equipment - Google Patents

Data transmission method between CPU and GPU and computer equipment

Info

Publication number
CN115237605B
Authority
CN
China
Prior art keywords
data
data set
gpu
attribute
cpu
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211134216.8A
Other languages
Chinese (zh)
Other versions
CN115237605A (en)
Inventor
章毅
祝生乾
胡俊杰
余程嵘
段兆航
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan University
Original Assignee
Sichuan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan University
Priority to CN202211134216.8A
Publication of CN115237605A
Application granted
Publication of CN115237605B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F 9/5011 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • G06F 9/5016 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals the resource being the memory
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F 12/02 Addressing or allocation; Relocation
    • G06F 12/0223 User address space allocation, e.g. contiguous or non contiguous base addressing
    • G06F 12/023 Free address space management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F 12/02 Addressing or allocation; Relocation
    • G06F 12/06 Addressing a physical block of locations, e.g. base addressing, module addressing, memory dedication
    • G06F 12/0646 Configuration or reconfiguration
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F 12/02 Addressing or allocation; Relocation
    • G06F 12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F 12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F 12/0877 Cache access modes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/466 Transaction processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F 9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application relates to the technical field of data processing and discloses a data transmission method between a CPU and a GPU, together with a computer device. The method comprises the following steps: acquiring a first data set to be transmitted by the CPU, the first data set comprising a plurality of class data items sharing the same class name; performing attribute merging on the class data in the first data set to obtain a second data set; based on the order in which attribute values are arranged in the second data set, sequentially establishing an address mapping relation between the CPU memory addresses of the corresponding attribute values in the first data set and GPU memory addresses; and transmitting the first data set to the GPU for storage based on the address mapping relation. The method solves the problem that existing CPU-GPU transfers leave the data storage layout unchanged after transmission, resulting in low data-reading efficiency on the GPU.

Description

Data transmission method between CPU and GPU and computer equipment
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a method for transmitting data between a CPU and a GPU and a computer device.
Background
For a complex neural network, computation using a Central Processing Unit (CPU) alone is inefficient. Because neural networks are highly parallel, handling their parallel computing tasks with a Graphics Processing Unit (GPU), which is well suited to parallel computation, effectively improves computational efficiency. With the continuous development of artificial intelligence, the hardware demands on GPUs, which excel at large-scale parallel operations, keep growing; yet in a normal workflow the GPU still completes its computing tasks under the instruction control of the CPU, so data is frequently transmitted between the CPU and the GPU.
In addition, object-oriented programming is a mainstream program design method, offering high readability, easy extension, and convenient modeling. Programmers therefore often design GPU programs with object-oriented methods, and the data must be transmitted from the CPU to the GPU so that the GPU's parallel computing capability can be used for large-scale parallel computation. A typical GPU program proceeds as follows: first, memory space is allocated for the data on the GPU; then the mapping between the data's CPU memory addresses and the allocated GPU addresses is calculated and the data is copied from CPU memory to GPU memory; next, the thread bundles in each GPU compute unit submit memory access transactions to fetch the data from GPU memory and perform the computation; finally, the result is transferred from GPU memory back to CPU memory.
In existing CPU-GPU data transfers, the storage layout of the data is the same on both sides: the storage structure of the data in the GPU is identical to that in the CPU, and the data is transmitted and stored object by object as class data. However, because the CPU and the GPU access memory data in different ways, this transmission and storage scheme is unfavorable for GPU reads; it severely limits the effective GPU memory bandwidth and also wastes a large portion of the GPU cache.
Disclosure of Invention
In view of these technical problems, the present application provides a data transmission method between a CPU and a GPU, and a computer device, which solve the problem that existing CPU-GPU transfers leave the data storage layout unchanged after transmission, resulting in low data-reading efficiency on the GPU.
In order to solve the technical problems, the technical scheme adopted by the application is as follows:
a method for data transmission between a CPU and a GPU comprises the following steps:
acquiring a first data set required to be transmitted by a CPU, wherein the first data set comprises a plurality of class data with the same class name;
performing attribute merging on class data in the first data set to obtain a second data set;
based on the attribute value arrangement sequence in the second data set, sequentially establishing an address mapping relation between the memory address of the corresponding attribute value in the first data set in the CPU and the memory address of the GPU;
and transmitting the first data set to the GPU for storage based on the address mapping relation.
Further, performing attribute merging on the class data in the first data set to obtain the second data set includes:
acquiring an attribute list of the class data, wherein the attribute list comprises a plurality of attribute names of the class data;
sequentially extracting attribute names in the attribute list, and sequentially extracting attribute values of corresponding attribute names in the first data set based on the attribute names;
and sequencing the extracted attribute values to obtain a second data set.
Further, the GPU stores the attribute values of the first data set in its memory space in sequence, following the order of the attribute values in the second data set.
Further, before sequentially establishing an address mapping relationship between the memory address of the corresponding attribute value in the first data set in the CPU and the memory address of the GPU based on the attribute value arrangement order in the second data set, the method further includes:
calculating the size of a memory required to be occupied by the first data set;
based on the memory size, the GPU allocates memory space for storing the first set of data.
Further, the address mapping relation is stored, indexed by the class name of the class data in the first data set.
Further, after acquiring the first data set that the CPU needs to transmit, the method further includes:
acquiring a class name of class data in a first data set;
searching whether the class name has a corresponding stored address mapping relation;
and if the class name has a corresponding stored address mapping relation, transmitting the first data set to the GPU for storage based on the stored address mapping relation.
Further, if the class name has no corresponding stored address mapping relation, the method proceeds to the step of performing attribute merging on the class data in the first data set to obtain the second data set.
A computer device comprising a CPU, a GPU and an address management module, the address management module comprising:
the data reading unit is used for acquiring a first data set required to be transmitted by the CPU, and the first data set comprises a plurality of class data with the same class name;
the attribute merging unit is used for performing attribute merging on the class data in the first data set to obtain a second data set;
the address mapping unit is used for sequentially establishing the address mapping relation between the memory address of the corresponding attribute value in the first data set in the CPU and the memory address of the GPU based on the attribute value arrangement sequence in the second data set;
and the data transmission unit is used for transmitting the first data set to the GPU for storage based on the address mapping relation.
Compared with the prior art, the beneficial effects of this application are as follows:
By changing the structure of the data transmitted from the CPU to the GPU, the application synchronously changes the storage structure of object-oriented program data on the GPU, which reduces GPU memory access transactions, improves GPU memory access efficiency and memory bandwidth, reduces waste of the GPU's L2 cache, and raises L2 cache utilization.
In addition, the address mapping relation used for data transmission between the CPU and the GPU can be stored: after the CPU transmits data to the GPU for the first time, the CPU-to-GPU memory address mapping of that data is recorded, and when the same kind of data is encountered later, the step of recalculating the address mapping is skipped, saving CPU computing resources and speeding up data transmission.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. Wherein:
fig. 1 is a schematic flow chart of a data transmission method between a CPU and a GPU.
Fig. 2 is a schematic flow chart illustrating a process of performing attribute merging on class data in a first data set to obtain a second data set.
Fig. 3 is a schematic flowchart of allocating memory space by the GPU.
FIG. 4 is a flowchart illustrating a process of searching whether data has a corresponding stored address mapping relationship.
Fig. 5 is a diagram of the existing hardware architecture of the CPU and GPU.
Fig. 6 is a schematic diagram of a storage structure of data on a conventional GPU.
Fig. 7 is a schematic diagram illustrating reading of data on a conventional GPU.
Fig. 8 is a schematic diagram of a storage structure of data on a GPU according to the present application.
Fig. 9 is a schematic diagram illustrating reading of data on a GPU according to the present application.
Fig. 10 is a block diagram schematically illustrating the structure of the computer device.
Detailed Description
To make the objects, technical solutions, and advantages of the embodiments of the present disclosure clearer, the technical solutions of the embodiments are described below clearly and completely with reference to the drawings. It should be understood that the described embodiments are only some, not all, of the embodiments of the present disclosure. All other embodiments obtained by a person skilled in the art from the described embodiments without inventive effort fall within the scope of protection of the present disclosure.
It should be understood that "system", "device", "unit" and/or "module" as used in this specification is a method for distinguishing different components, elements, parts or assemblies at different levels. However, other words may be substituted by other expressions if they accomplish the same purpose.
As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural as well, unless the context clearly dictates otherwise. In general, the terms "comprises" and "comprising" merely indicate that the explicitly identified steps and elements are included; the steps and elements do not form an exclusive list, and a method or apparatus may also include other steps or elements.
Flow charts are used in this description to illustrate operations performed by a system according to embodiments of the present description. It should be understood that the operations are not necessarily performed in the exact order shown; the various steps may instead be processed in reverse order or simultaneously, and other operations may be added to or removed from these processes.
Referring to fig. 1, in some embodiments, a method for data transmission between a CPU and a GPU includes:
s101, acquiring a first data set required to be transmitted by a CPU, wherein the first data set comprises a plurality of class data with the same class name;
s102, performing attribute merging on class data in the first data set to obtain a second data set;
s103, sequentially establishing address mapping relations between memory addresses of corresponding attribute values in the first data set in the CPU and GPU memory addresses based on the attribute value arrangement sequence in the second data set;
and S104, transmitting the first data set to the GPU for storage based on the address mapping relation.
In this embodiment, recall first that in the prior art the general flow of a GPU program comprises four steps (illustrated by the sketch after this list):
1. Memory address calculation: memory space is first allocated on the GPU, and the mapping between CPU memory addresses and GPU memory addresses is determined;
2. Data transmission: the CPU transmits the data to the GPU over the transfer bus according to the address mapping;
3. Data acquisition: the GPU compute cores access GPU memory through thread bundles to obtain the data;
4. Computation and result return: the GPU compute cores compute on the data obtained by the thread bundles and transmit the result back to the CPU.
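By way of illustration only, the following is a minimal sketch of these four steps using the CUDA runtime API; the kernel name compute, the element type, and the sizes are illustrative assumptions, not part of the patent.

```cpp
// Hedged sketch of the prior-art four-step GPU program flow (CUDA).
// All names and sizes here are illustrative assumptions.
#include <cuda_runtime.h>
#include <vector>

__global__ void compute(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] * 2.0f;  // placeholder computation
}

int main() {
    const int n = 1 << 20;
    std::vector<float> host_in(n, 1.0f), host_out(n);

    // Step 1: memory address calculation - allocate GPU memory; the
    // returned device pointers fix the CPU-to-GPU address mapping.
    float *dev_in = nullptr, *dev_out = nullptr;
    cudaMalloc(&dev_in, n * sizeof(float));
    cudaMalloc(&dev_out, n * sizeof(float));

    // Step 2: data transmission over the transfer bus.
    cudaMemcpy(dev_in, host_in.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    // Step 3: data acquisition - thread bundles in each compute unit
    // fetch the data from global memory as the kernel runs.
    compute<<<(n + 255) / 256, 256>>>(dev_in, dev_out, n);

    // Step 4: the computation result is transmitted back to CPU memory.
    cudaMemcpy(host_out.data(), dev_out, n * sizeof(float), cudaMemcpyDeviceToHost);

    cudaFree(dev_in);
    cudaFree(dev_out);
    return 0;
}
```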
Referring to FIG. 5, the prior-art hardware architecture of the CPU and GPU shows that transferring data from the CPU to the GPU essentially means copying the data directly from CPU memory to GPU memory over a transfer bus. (A GPU contains several kinds of memory; the GPU memory referred to throughout is the GPU's global memory.)
First, the CPU sends a memory allocation instruction to the GPU to allocate memory for the data to be transmitted, and then calculates the mapping between the data's CPU memory addresses and the GPU memory addresses; this mapping is used in the next step of the transfer. Using the address mapping just calculated, the CPU then transmits the data to the GPU over the data transfer bus and stores it in GPU memory.
After the data has been transferred from CPU memory to GPU memory, the GPU compute cores can request access to global memory to obtain the data for the next stage of computation. As is well known, the large-scale computing capability of a GPU comes from its many streaming multiprocessors (SMs); the SM is the integrated data-processing unit of the GPU, and a single SM can support hundreds of concurrently executing threads.
The thread bundle (warp) is the most basic execution unit of an SM; each thread bundle consists of 32 consecutive threads. All threads in the same thread bundle execute in Single Instruction, Multiple Threads (SIMT) fashion, meaning they must all execute the same instruction at the same time, each thread computing separately on its own private class data.
For a memory access instruction (one that needs data from GPU memory), every thread in the bundle has its own memory address, and each thread submits a memory access transaction against its address. Global memory is the slowest and largest memory in the GPU, and every SM can access it.
In GPU programs designed with today's common object-oriented methods, data items with the same attributes are usually defined as a class, whose name is the class name; each class has many attributes. For example, an image class may be defined with class name image and attributes such as length, width, and height.
Suppose there is a class named S with four attributes a, b, c, and d, each attribute occupying one byte of storage, and a batch of data consists of 32 objects of S. Under the existing CPU-GPU data transmission method, the storage structure of this batch on the GPU hardware is shown in fig. 6: memory addresses are in bytes, each cell is 1 byte, each S occupies 4 bytes, and the 32 objects occupy 128 bytes.
Combined with the GPU memory access principle above, the basic unit of GPU memory access is the thread bundle (when one thread in a bundle needs to access memory, all the other threads in the bundle take part in the access as well, whether they need data or not), and each such access is called a memory access transaction. With 32 threads per bundle and each thread reading 1 byte, one memory access transaction covers 32 bytes.
As shown in fig. 7, suppose a thread bundle needs the attribute a of every object for a computation. The bundle can access only 32 bytes of GPU memory at a time, while the a values are scattered across the 128-byte range at addresses 0 through 127, so the bundle must issue four memory access transactions to gather the complete data. A sketch of this layout and access pattern follows.
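The layout and access pattern of figs. 6 and 7 can be sketched as follows; the kernel name read_a_aos and the single-warp launch are illustrative assumptions, not the patent's code.

```cpp
// Array-of-structures layout from the example: 32 objects of class S,
// four 1-byte attributes (a, b, c, d) each, 128 bytes in total.
struct S { char a, b, c, d; };  // sizeof(S) == 4

// One warp of 32 threads, each reading attribute a of one object.
// The 32 addresses are 4 bytes apart and span bytes 0..127, so the
// warp touches four 32-byte segments: four memory access transactions.
__global__ void read_a_aos(const S* objs, char* out) {
    int tid = threadIdx.x;   // 0..31, a single thread bundle (warp)
    out[tid] = objs[tid].a;  // strided, uncoalesced access
}
```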
In this embodiment, when the CPU transmits data to the GPU, the storage layout is changed by attribute merging. Still taking the data of class S as the example, the improved storage structure is shown in fig. 8: the attributes of the 32 objects are merged attribute by attribute, a, b, c, and d. Referring to fig. 9, with the new storage structure the thread bundle needs only one memory access transaction to obtain the complete data. Where the original structure required four memory access transactions, the improved one requires a single transaction, saving GPU memory access resources and accelerating the whole computation. A sketch of the merged layout follows.
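A corresponding sketch of the merged (structure-of-arrays) layout of figs. 8 and 9, under the same illustrative assumptions:

```cpp
// Structure-of-arrays layout after attribute merging: the 32 values of
// a are contiguous, followed by all b values, then c, then d.
struct SoA {
    char a[32];
    char b[32];
    char c[32];
    char d[32];
};

// The warp's 32 reads of attribute a now hit 32 consecutive bytes,
// i.e. one 32-byte segment: a single memory access transaction.
__global__ void read_a_soa(const SoA* data, char* out) {
    int tid = threadIdx.x;    // 0..31, a single thread bundle (warp)
    out[tid] = data->a[tid];  // coalesced access
}
```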
In addition, the GPU has a caching mechanism. The GPU's L2 cache (second-level cache) is a storage device that is faster but smaller than global memory. When an SM needs to read data, it first looks for the data in the L2 cache; on a hit it reads the data directly, and on a miss it must fetch the data from global memory. The L2 cache works on the principle of spatial locality: when an SM accesses a datum in global memory, that datum and its neighbors are loaded into the L2 cache, since a request for one datum is often followed by requests for its neighbors.
Because fetching data from the L2 cache is far faster than fetching it from memory, the GPU consults the L2 cache first: if the target data is cached it is obtained immediately; otherwise the GPU accesses memory and then places the fetched data in the cache for possible reuse.
Consequently, with data laid out in GPU memory by the existing CPU-GPU transmission method, every memory access transaction of the thread bundle also pulls unneeded data into the L2 cache: attributes b, c, and d are loaded alongside attribute a, and this useless data occupies a large share of the L2 cache, causing substantial cache waste and low cache utilization.
With the data transmission method of this embodiment, the data in GPU memory is stored in its attribute-merged form, and the improved layout no longer loads the unused attributes b, c, and d into the L2 cache, greatly reducing L2 cache waste and improving cache utilization.
In summary, the data transmission method between the CPU and the GPU of this embodiment performs attribute merging on a batch of data belonging to the same class, changes the storage structure of the data in the GPU, improves the global memory bandwidth of the GPU, reduces the L2 cache waste of the GPU, and improves the cache utilization rate.
Referring to fig. 2, preferably, performing attribute merging on the class data in the first data set to obtain the second data set includes the following steps (a host-side sketch follows the list):
s201, acquiring an attribute list of class data, wherein the attribute list comprises a plurality of attribute names of the class data;
s202, sequentially extracting attribute names in the attribute list, and sequentially extracting attribute values of corresponding attribute names in the first data set based on the attribute names;
s203, arranging the extracted attribute values in sequence to obtain a second data set.
Preferably, the GPU stores the attribute values in the first data set in the storage space of the GPU in sequence according to the order of the attribute values in the second data set.
Referring to fig. 3, preferably, before sequentially establishing the address mapping relation between the CPU memory addresses of the corresponding attribute values in the first data set and the GPU memory addresses based on the attribute value arrangement order in the second data set, the method further includes the following steps (a sketch follows the list):
s301, calculating the size of a memory occupied by the first data set;
s302, based on the memory size, the GPU allocates a memory space for storing the first data set.
In some embodiments, an address mapping relationship is stored, the address mapping relationship being indexed based on a class name of class data in the first data set.
Referring to fig. 4, preferably, after acquiring the first data set that the CPU needs to transmit, the method further includes the following steps (a sketch of the class-name lookup follows the list):
s401, acquiring a class name of class data in a first data set;
s402, searching whether the class name has a corresponding stored address mapping relation;
and S403, if the class name has the corresponding stored address mapping relation, transmitting the first data set to the GPU for storage based on the stored address mapping relation.
S404, if the class name does not have the corresponding stored address mapping relation, the step of carrying out attribute combination on the class data in the first data set to obtain a second data set is carried out.
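The lookup of S401-S404 can be sketched as a class-name-indexed cache; the Mapping type and the commented-out helper calls are hypothetical stand-ins for the patent's address mapping and transmission steps.

```cpp
// Sketch of S401-S404: cache address mappings, indexed by class name.
#include <string>
#include <unordered_map>

struct Mapping { /* CPU-address-to-GPU-address relation (assumed) */ };

std::unordered_map<std::string, Mapping> mapping_cache;  // class name as index

void transmit_first_data_set(const std::string& class_name) {
    auto it = mapping_cache.find(class_name);  // S402: look for a stored mapping
    if (it != mapping_cache.end()) {
        // S403: a stored mapping exists - transmit directly with it,
        // skipping recomputation of the address mapping.
        // transfer_with(it->second);                  // hypothetical helper
    } else {
        // S404: no stored mapping - perform attribute merging (S102),
        // build the mapping, store it for reuse, then transmit.
        // Mapping m = build_mapping(/* merged second data set */);
        // mapping_cache.emplace(class_name, m);
        // transfer_with(m);
    }
}
```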
In this embodiment, when data is first transmitted from the CPU to the GPU, the address mapping between the data's CPU memory and GPU memory is retained. When data of the same class is transmitted later, the address mapping need not be calculated again; the transfer proceeds directly according to the existing mapping. Recording the CPU-to-GPU memory address mapping thus avoids recomputing it every time the CPU transmits data to the GPU and saves computing resources.
In addition, because the address mapping is known in advance, the data can be transmitted in parallel by multiple processes, accelerating the transfer. Once all threads have acquired their corresponding data, the computation starts, and after it finishes the GPU transmits the result back to the CPU.
Referring to fig. 10, in some embodiments, there is also disclosed a computer device comprising a CPU, a GPU and an address management module, the address management module comprising:
the data reading unit is used for acquiring a first data set required to be transmitted by the CPU, and the first data set comprises a plurality of class data with the same class name;
the attribute merging unit is used for performing attribute merging on the class data in the first data set to obtain a second data set;
the address mapping unit is used for sequentially establishing the address mapping relation between the memory address of the corresponding attribute value in the first data set in the CPU and the memory address of the GPU based on the attribute value arrangement sequence in the second data set;
and the data transmission unit is used for transmitting the first data set to the GPU for storage based on the address mapping relation.
In this embodiment, the address management module is used to change the data structure when the CPU transmits data to the GPU, so that the storage structure of the data transmitted to the GPU is also changed, thereby greatly improving the memory bandwidth of the GPU and reducing the waste of the GPU cache.
The above are embodiments of the present application. The embodiments and the specific parameters therein serve only to clearly illustrate the verification process of the application and are not intended to limit its scope of patent protection, which is defined by the claims; all equivalent structural changes made using the contents of the specification and drawings of this application likewise fall within its scope of protection.

Claims (7)

  1. A method for data transmission between a CPU and a GPU, comprising:
    acquiring a first data set required to be transmitted by a CPU, wherein the first data set comprises a plurality of class data with the same class name;
    performing attribute merging on class data in the first data set to obtain a second data set;
    based on the attribute value arrangement sequence in the second data set, sequentially establishing an address mapping relation between the memory address of the corresponding attribute value in the first data set in the CPU and the memory address of the GPU;
    transmitting the first data set to the GPU for storage based on the address mapping relation;
    performing attribute merging on the class data in the first data set to obtain a second data set includes:
    acquiring an attribute list of the class data, wherein the attribute list comprises a plurality of attribute names of the class data;
    sequentially extracting attribute names in the attribute list, and sequentially extracting attribute values of corresponding attribute names in the first data set based on the attribute names;
    and arranging the extracted attribute values in sequence to obtain the second data set.
  2. The method according to claim 1, wherein the method comprises:
    and the GPU stores the attribute values in the first data set in a storage space of a GPU memory in sequence according to the attribute value arrangement sequence in the second data set.
  3. The method according to claim 1, wherein before sequentially establishing an address mapping relationship between a memory address of the corresponding attribute value in the first data set in the CPU and a memory address of the GPU based on the sequence of the attribute values in the second data set, the method further comprises:
    calculating the size of a memory required to be occupied by the first data set;
    based on the memory size, the GPU allocates memory space for storing the first set of data.
  4. The method according to claim 1, wherein the method further comprises:
    and storing the address mapping relation, wherein the address mapping relation is based on the class name of the class data in the first data set as an index.
  5. The method according to claim 4, wherein after acquiring the first data set that the CPU needs to transmit, the method further comprises:
    acquiring a class name of class data in the first data set;
    searching whether the class name has a corresponding stored address mapping relation;
    and if the class name has a corresponding stored address mapping relation, transmitting the first data set to the GPU for storage based on the stored address mapping relation.
  6. The method according to claim 5, wherein the method comprises:
    and if the class name does not have the corresponding stored address mapping relation, performing attribute merging on the class data in the first data set to obtain a second data set.
  7. A computer device comprising a CPU, a GPU and an address management module, the address management module comprising:
    the data reading unit is used for acquiring a first data set required to be transmitted by the CPU, and the first data set comprises a plurality of class data with the same class name;
    the attribute merging unit is used for performing attribute merging on the class data in the first data set to obtain a second data set;
    the address mapping unit is used for sequentially establishing the address mapping relation between the memory address of the corresponding attribute value in the first data set in the CPU and the memory address of the GPU on the basis of the attribute value arrangement sequence in the second data set;
    a data transmission unit, configured to transmit the first data set to the GPU for storage based on the address mapping relationship;
    performing attribute merging on the class data in the first data set to obtain a second data set includes:
    acquiring an attribute list of the class data, wherein the attribute list comprises a plurality of attribute names of the class data;
    sequentially extracting attribute names in the attribute list, and sequentially extracting attribute values of corresponding attribute names in the first data set based on the attribute names;
    and arranging the extracted attribute values in sequence to obtain the second data set.
CN202211134216.8A 2022-09-19 2022-09-19 Data transmission method between CPU and GPU and computer equipment Active CN115237605B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211134216.8A CN115237605B (en) 2022-09-19 2022-09-19 Data transmission method between CPU and GPU and computer equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211134216.8A CN115237605B (en) 2022-09-19 2022-09-19 Data transmission method between CPU and GPU and computer equipment

Publications (2)

Publication Number Publication Date
CN115237605A CN115237605A (en) 2022-10-25
CN115237605B (en) 2023-03-28

Family

ID=83681551

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211134216.8A Active CN115237605B (en) 2022-09-19 2022-09-19 Data transmission method between CPU and GPU and computer equipment

Country Status (1)

Country Link
CN (1) CN115237605B (en)

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8473900B2 (en) * 2009-07-01 2013-06-25 Advanced Micro Devices, Inc. Combining classes referenced by immutable classes into a single synthetic class
US20140129806A1 (en) * 2012-11-08 2014-05-08 Advanced Micro Devices, Inc. Load/store picker
CN103019949B (en) * 2012-12-27 2015-08-19 华为技术有限公司 A kind of distribution method and device writing merging Attribute Memory space
CN104156268B (en) * 2014-07-08 2017-07-07 四川大学 The load distribution of MapReduce and thread structure optimization method on a kind of GPU
CN109919166B (en) * 2017-12-12 2021-04-09 杭州海康威视数字技术股份有限公司 Method and device for acquiring classification information of attributes
CN109902059B (en) * 2019-02-28 2021-06-29 苏州浪潮智能科技有限公司 Data transmission method between CPU and GPU
CN109992385B (en) * 2019-03-19 2021-05-14 四川大学 GPU internal energy consumption optimization method based on task balance scheduling
CN114546491A (en) * 2021-11-04 2022-05-27 北京壁仞科技开发有限公司 Data operation method, data operation device and data processor
CN114265849B (en) * 2022-02-28 2022-06-10 杭州广立微电子股份有限公司 Data aggregation method and system

Also Published As

Publication number Publication date
CN115237605A (en) 2022-10-25

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant