CN117742954A - Data processing apparatus and data storage method for large language model - Google Patents

Data processing apparatus and data storage method for large language model Download PDF

Info

Publication number
CN117742954A
CN117742954A (application CN202311759179.4A)
Authority
CN
China
Prior art keywords
memory
data
memory data
stored
memories
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311759179.4A
Other languages
Chinese (zh)
Inventor
武正辉
刘月吉
何永占
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202311759179.4A priority Critical patent/CN117742954A/en
Publication of CN117742954A publication Critical patent/CN117742954A/en
Pending legal-status Critical Current

Landscapes

  • Image Generation (AREA)

Abstract

The present disclosure provides a data processing apparatus and a data storage method for a large language model, and relates to the technical field of computers and artificial intelligence, in particular to the technical fields of data storage and large language models. The data processing apparatus includes: a plurality of graphics processing units; a storage unit electrically connected to the control unit and including a plurality of memories; and a control unit electrically connected to the plurality of graphics processing units and the storage unit, the control unit being configured to sequentially allocate the plurality of memories to the plurality of graphics processing units according to a data state of the data to be stored, so as to store video memory data from the plurality of graphics processing units using the plurality of memories.

Description

Data processing apparatus and data storage method for large language model
Technical Field
The present disclosure relates to the field of computer technology and artificial intelligence, and in particular, to the field of data storage and large language model technology, and more particularly, to a data processing apparatus and a data storage method for a large language model.
Background
As users' demands on the data processing accuracy of artificial intelligence models increase, the number of parameters in these models keeps growing. Data processing devices deployed for artificial intelligence model training and inference generate large amounts of memory data and video memory data; there is therefore a need for a device that can meet the ever-increasing data storage requirements.
Disclosure of Invention
The present disclosure provides a data processing apparatus and a data storage method for a large language model.
According to an aspect of the present disclosure, there is provided a data processing apparatus including: a plurality of graphics processing units; a storage unit electrically connected to the control unit and including a plurality of memories; and a control unit electrically connected to the plurality of graphics processing units and the storage unit, the control unit being configured to sequentially allocate the plurality of memories to the plurality of graphics processing units according to a data state of the data to be stored, so as to store video memory data from the plurality of graphics processing units using the plurality of memories.
According to another aspect of the present disclosure, there is provided a data storage method including: in response to received video memory data to be stored from a graphics processing unit, determining a third target memory for the video memory data to be stored from a plurality of memories in a storage unit according to the identification of the video memory data to be stored; and storing the video memory data to be stored into the third target memory.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 schematically illustrates a schematic diagram of a data processing apparatus according to an embodiment of the present disclosure;
FIG. 2 schematically illustrates a schematic diagram of a data processing apparatus according to another embodiment of the present disclosure;
FIG. 3 schematically illustrates a schematic diagram of a data processing apparatus according to another embodiment of the present disclosure;
FIG. 4 schematically illustrates a schematic diagram of a data storage method for storing video memory data according to an embodiment of the present disclosure;
FIG. 5 schematically illustrates a schematic diagram of a data storage method for storing video memory data according to another embodiment of the present disclosure;
FIG. 6 schematically illustrates a schematic diagram of a data storage method for storing both memory data and video memory data according to an embodiment of the present disclosure;
FIG. 7 schematically illustrates a schematic diagram of a data storage method for storing both memory data and video memory data according to another embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
With the increasing number of parameters in artificial intelligence models, the storage requirements of the high-performance computing (HPC) clusters that support these models for data processing are also increasing. However, the number of memories in related examples is limited, and if the memory space is expanded by adding memories, a problem arises: because the memories and the CPU (Central Processing Unit) are usually integrated on the same PCB (Printed Circuit Board), the communication lines between the memories and the CPU become excessively long, which affects data transmission and destabilizes the system.
Meanwhile, the training and/or inference of an artificial intelligence model generates a large amount of video memory data from the GPU (Graphics Processing Unit), and the lack of an effective scheme for expanding the storage space for video memory data creates a bottleneck in the processing capacity of data processing devices in artificial intelligence model training and/or inference scenarios.
In view of this, the embodiments of the present disclosure provide a data processing apparatus in which a control unit is electrically connected to a plurality of graphics processing units and a storage unit, the control unit being configured to sequentially allocate a plurality of memories to the plurality of graphics processing units according to a data state of the data to be stored. This achieves the technical effect of storing video memory data from the plurality of graphics processing units using a plurality of memories, and at least partially solves the difficulty of expanding storage space in related examples, so as to satisfy the storage requirement for large amounts of video memory data in artificial intelligence model training and inference scenarios.
Fig. 1 schematically shows a schematic diagram of a data processing apparatus according to an embodiment of the present disclosure.
As shown in Fig. 1, the apparatus 100 may include a plurality of graphics processing units "GPU0" 130_0 to "GPUn" 130_n, a storage unit 110, and a control unit 120. The storage unit 110 includes a plurality of memories, namely "memory 0" 110_0, "memory 1" 110_1, …, "memory m" 110_m.
According to embodiments of the present disclosure, the plurality of graphics processing units "GPU0"130_0 to "GPUn"130_n may be used to process image data in a plurality of convolutional layers of a convolutional neural network model. The plurality of graphics processing units may correspond to a plurality of convolutional layers, each graphics processing unit for processing image data in a respective convolutional layer. Each convolution layer may also be provided with a plurality of graphics processing units for processing image data in the convolution layer.
According to an embodiment of the present disclosure, the control unit 120 may be a CXL (Compute Express Link) switch. CXL is an open protocol for high-bandwidth, low-latency device interconnection.
According to an embodiment of the disclosure, the control unit 120 may be electrically connected to the plurality of graphics processing units and to the storage unit 110 via PCIe (Peripheral Component Interconnect Express) buses, so as to transmit the video memory data of the graphics processing units "GPU0" 130_0 to "GPUn" 130_n to the memories in the storage unit 110 for storage based on the CXL protocol, or to read video memory data from those memories based on the CXL protocol so that the graphics processing units can process it.
According to an embodiment of the present disclosure, the control unit 120 may sequentially allocate the plurality of memories to the plurality of graphics processing units according to the state of the data to be stored, so as to store video memory data from the plurality of graphics processing units using the plurality of memories.
According to embodiments of the present disclosure, the state of the data to be stored may include its source. For example, when the data to be stored Data_1 comes from "GPU0" 130_0, "memory 0" 110_0 in the storage unit 110 can be determined as the memory for storing Data_1 by configuring the correspondence between memories and graphics processing units.
According to the embodiment of the disclosure, the control unit is electrically connected to the plurality of graphics processing units and the storage unit and is configured to sequentially allocate the plurality of memories to the plurality of graphics processing units according to the data state of the data to be stored. This achieves the technical effect of storing video memory data from the plurality of graphics processing units using a plurality of memories, at least partially solves the difficulty of expanding storage space in related examples, and satisfies the storage requirement for large amounts of video memory data in artificial intelligence model training and inference scenarios.
It should be understood that the numbers of memories, GPUs, and control units in Fig. 1 are merely illustrative. There may be any number of memories, GPUs, and control units, as required by the implementation.
In the technical scheme of the disclosure, the collection, storage, use, processing, transmission, provision, disclosure, and application of users' personal information all comply with relevant laws and regulations, necessary security measures are taken, and public order and good customs are not violated.
In the technical scheme of the disclosure, the authorization or consent of the user is obtained before the personal information of the user is obtained or acquired.
According to the embodiment of the disclosure, based on the CXL protocol, not only can large amounts of video memory data be stored, but large amounts of memory data can be stored as well.
Fig. 2 schematically shows a schematic view of a data processing device according to another embodiment of the present disclosure.
As shown in Fig. 2, this embodiment 200 may include CPU_0 2101 and CPU_1 2102, which are electrically connected via a PCIe bus.
CPU_0 2101 and CXL switch_0 2201 are electrically connected through a PCIe bus, and CXL switch_0 2201 is electrically connected to the first storage unit 2301, GPU0 2400, and GPU1 2401, each through a PCIe bus. The first storage unit 2301 includes m+1 memories, namely memory 0 to memory m. CXL switch_0 2201 is configured to allocate available memory among the plurality of memories to CPU_0 2101, so as to store memory data from CPU_0 2101 in the available memory.
CPU_1 2102 and CXL switch_1 2202 are electrically connected via a PCIe bus, and CXL switch_1 2202 is electrically connected via PCIe buses to the second storage unit 2302, GPUi 240i, and GPUI 240I, respectively. The second storage unit 2302 includes n+1 memories, namely memory 0 to memory n.
According to an embodiment of the present disclosure, CXL switch_1 2202 is configured to allocate available memory among the plurality of memories to CPU_1 2102, so as to store memory data from CPU_1 2102 in the available memory.
According to embodiments of the present disclosure, available memory may refer to memory that is not allocated to a graphics processing unit. For example: in the first storage unit 2301, memories 1 to m are allocated to GPU0 2400 and GPU1 2401, respectively. The remaining memory 0 may be allocated as available memory to CPU_0 2101 for storing memory data from CPU_0 2101.
According to the embodiment of the disclosure, based on the CXL protocol, the control unit can allocate storage space for memory data and video memory data at the same time, expanding both the memory space and the video memory space of the device, which can improve the data transmission speed and processing efficiency of the artificial intelligence model during training or inference.
In the application scenario of a high-performance computing cluster, whether for memory data or video memory data, an abnormal read or store of any piece of data can affect the normal execution of model training or inference. Therefore, to keep the data reading and storing processes orderly, the correspondence between the memories and the CPUs and GPUs can be configured in advance.
According to an embodiment of the present disclosure, the plurality of memories includes a first memory for the first central processing unit and a second memory for the plurality of graphics processing units.
According to an embodiment of the disclosure, the control unit is further configured to: allocate the first memory to the first central processing unit according to the correspondence between the first memory and the first central processing unit, so as to store memory data using the first memory; and allocate the second memory to the plurality of graphics processing units according to the correspondence between the second memory and the plurality of graphics processing units, so as to store video memory data using the second memory.
For example: among memories 0 to m in the first storage unit 2301, memory 0 may be allocated to CPU_0 2101 for storing memory data from CPU_0 2101. Then, memories 1 to j are allocated to GPU0 2400 for storing the video memory data from GPU0 2400, and memories j+1 to m are allocated to GPU1 2401 for storing the video memory data from GPU1 2401, where j represents an integer greater than 1 and less than m.
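The correspondence in this example amounts to a small lookup table from each processing unit to its preassigned memories. The following Python sketch is purely illustrative; the function names (build_memory_map, route_target) and the concrete values of m and j are assumptions for demonstration, not part of the disclosed apparatus.

```python
# Hypothetical sketch of the preconfigured correspondence described above:
# memory 0 for CPU_0, memories 1..j for GPU0, memories j+1..m for GPU1.

def build_memory_map(m: int, j: int) -> dict:
    """Statically partition memories 0..m among CPU_0, GPU0, and GPU1."""
    return {
        "CPU_0": [0],
        "GPU0": list(range(1, j + 1)),
        "GPU1": list(range(j + 1, m + 1)),
    }

def route_target(source: str, memory_map: dict) -> list:
    """Return the memories preassigned to the unit that produced the data."""
    return memory_map[source]

memory_map = build_memory_map(m=8, j=4)
assert route_target("CPU_0", memory_map) == [0]
assert route_target("GPU0", memory_map) == [1, 2, 3, 4]
assert route_target("GPU1", memory_map) == [5, 6, 7, 8]
```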
According to the embodiment of the disclosure, configuring the correspondence between the memories and the CPUs and GPUs realizes orderly reading and storing of data during model training or inference, thereby ensuring the stability of data processing.
For artificial intelligence models, especially models that process multimodal data such as images and text, memory data and video memory data place different demands on storage space. Therefore, while video memory data and memory data share the storage space in the storage unit, the storage space can be allocated dynamically based on these different demands.
According to an embodiment of the present disclosure, the control unit is further configured to determine a first target memory from the plurality of memories according to a resource requirement of the memory data, and allocate the first target memory to the first central processing unit to store the memory data using the first target memory.
According to embodiments of the present disclosure, the resource requirement of memory data characterizes its demand for storage space. A memory may be determined as the memory for storing the memory data when its available storage space is greater than the storage space required by the memory data.
For example: the memory data to be stored from CPU_0 2101 may require 70 bits of storage space, while in the first storage unit 2301 the available storage space of memory 0 is 50 bits and that of memory 1 is 100 bits. Since the available storage space of memory 1 exceeds the storage space required by the memory data to be stored, memory 1 can be determined as the memory for storing the memory data.
Similarly, for the video memory data, the control unit is configured to: determine a second target memory from the plurality of memories according to the resource requirement of the video memory data; and allocate the second target memory to the plurality of graphics processing units, so as to store video memory data from the plurality of graphics processing units using the plurality of memories.
According to embodiments of the present disclosure, the resource requirement of video memory data characterizes its demand for storage space. A memory may be determined as the memory for storing the video memory data when its available storage space is greater than the storage space required by the video memory data.
For example: the video memory data to be stored from GPU0 2400 may require 80 bits of storage space, while in the first storage unit 2301 the available storage space of memory 0 is 50 bits and that of memory m is 100 bits. Since the available storage space of memory m exceeds the storage space required by the video memory data to be stored, memory m can be determined as the memory for storing the video memory data.
According to the embodiments of the present disclosure, the memory for storing the data may also be determined jointly, based on both the preconfigured correspondence between the memories and the CPUs and GPUs and the resource requirement of the data to be stored.
For example: based on the preconfigured correspondence between the memories and the CPUs and GPUs, in the first storage unit 2301, memory 0 and memory 1 are allocated for storing memory data from CPU_0 2101, while memories 2 to m are allocated for storing video memory data from GPU0 2400 and GPU1 2401.
For example: the memory data to be stored from CPU_0 2101 may require 70 bits of storage space; the available storage space of memory 0 is 50 bits, that of memory 1 is 80 bits, and that of memory m is 100 bits. Although memory m has more available space, it is reserved for video memory data; thus, memory 1 is determined as the memory for storing the memory data.
For example: the memory data to be stored from CPU_0 2101 may require 70 bits of storage space, while the available storage space of memory 0 is 50 bits and that of memory 1 is 30 bits. In this case, no single memory has enough available space to satisfy the demand of the memory data, so memory 0 and memory 1 can together be used as the memories for storing the memory data.
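A short sketch can make this demand-based selection concrete: prefer a single memory whose available space covers the demand, and otherwise combine memories until the demand is met, as in the 50-bit plus 30-bit example above. The names and the selection order below are assumptions; the patent does not specify a particular algorithm.

```python
# Illustrative demand-based memory selection (assumed names).
def select_memories(demand_bits: int, free_bits: dict) -> list:
    """Pick target memories whose available space satisfies demand_bits."""
    # Case 1: a single memory has enough available space.
    for mem_id, free in sorted(free_bits.items()):
        if free >= demand_bits:
            return [mem_id]
    # Case 2: combine memories (largest free space first) until it suffices.
    chosen, total = [], 0
    for mem_id, free in sorted(free_bits.items(), key=lambda kv: -kv[1]):
        chosen.append(mem_id)
        total += free
        if total >= demand_bits:
            return chosen
    raise MemoryError("storage unit cannot satisfy the demand")

# 70-bit data; memory 1 alone suffices (80 bits free).
assert select_memories(70, {0: 50, 1: 80}) == [1]
# 70-bit data; no single memory suffices, so memories 0 and 1 are combined.
assert select_memories(70, {0: 50, 1: 30}) == [0, 1]
```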
According to the embodiment of the disclosure, dynamic allocation of storage space based on the different demands of memory data and video memory data increases the storage space for video memory data and for memory data at the same time, so that the training and/or inference requirements of models with huge parameter scales can be met.
For models with smaller parameter scales, the memory data's demand for storage space is relatively stable during model training and/or inference, but the video memory data not only requires larger storage space but also places demands on the bandwidth of the PCIe bus. Thus, for such models, only the video memory space of the GPUs may be expanded based on the CXL protocol.
Fig. 3 schematically illustrates a schematic diagram of a data processing apparatus according to another embodiment of the present disclosure, which meets the demands of smaller-parameter-scale models for video memory space and bus bandwidth.
As shown in Fig. 3, this embodiment 300 may include CPU_0 3101 and CPU_1 3102, which are electrically connected through a PCIe bus.
CPU_0 3101 is electrically connected to GPU0 3200 and GPU1 3201, respectively, via PCIe buses. GPU0 3200 and GPU1 3201 are electrically connected to CXL switch_0 3301 via PCIe buses. CXL switch_0 3301 and the third storage unit 3401 are electrically connected via a PCIe bus. The third storage unit 3401 may include m+1 memories, namely memory 0 to memory m.
For example: GPU0 3200 may be configured with at least two communication ports, one for communicating with CPU_0 3101 over the PCIe bus and the other for communicating with CXL switch_0 3301 over the PCIe bus.
According to embodiments of the present disclosure, the storage space of memory 0 to memory m in the third storage unit 3401 may be allocated to GPU0 3200 and GPU1 3201 as needed.
For example: the memory data from GPU0 3200 requires more than 70 bits of memory and the memory data from GPU1 3201 requires less than 50 bits of memory. The available memory space of each of the memories 0 to 3 is 100 bits. The available memory space of each of the memories 4 to m is 50 bits. Accordingly, memories 0-3 may be allocated to GPU0 3200 for storing video memory data from GPU0 3200. Memory 4-memory m may be allocated to GPU1 3201 for storing video memory data from GPU1 3201.
CPU_1 3102 is electrically connected to GPUt 320t and GPUT 320T, respectively, via PCIe buses. GPUt 320t and GPUT 320T are electrically connected to CXL switch_1 3302 via PCIe buses. CXL switch_1 3302 and the fourth storage unit 3402 are electrically connected by a PCIe bus. The fourth storage unit 3402 may include n+1 memories, namely memory 0 to memory n.
According to embodiments of the present disclosure, the storage space of memories 0 to n in the fourth storage unit 3402 may be allocated to GPUt 320t, …, GPUT 320T as needed. Dynamic allocation can also be realized by having GPUt 320t, …, GPUT 320T share the storage space, so as to improve its utilization.
For example: the memory space requirement of the video memory data from the GPUt 320t is 30 bits, the available memory space of the memory n in the fourth memory unit 3402 is 50 bits, and the available memory space of the memory n is 30 bits for storing the video memory data from the GPUt 320t. At this time, the available memory space of the memory n is changed to 20 bits. The memory data from the GPUT 320T has a 10bit demand for memory space, and the currently available memory space of the memory n meets the memory demand and can be used for storing the memory data from the GPUT 320T. The sharing of the storage space of the memory n by the GPUT 320T and the GPUt 320T is realized.
According to the embodiment of the disclosure, the control unit is electrically connected to the GPUs based on the CXL protocol, so the bandwidth of the control unit's interconnect bus only needs to meet the transmission requirements of the GPUs' video memory data. This achieves the technical effect of multiple GPUs sharing the storage space of a memory, facilitates dynamic allocation of the shared storage space, and improves storage space utilization.
Based on the foregoing data processing apparatus, the embodiment of the present disclosure further provides a data storage method, including: in response to received video memory data to be stored from a graphics processing unit, determining a third target memory for the video memory data to be stored from a plurality of memories in a storage unit according to the identification of the video memory data to be stored; and storing the video memory data to be stored into the third target memory.
Fig. 4 schematically illustrates a schematic diagram of a data storage method for storing video memory data according to an embodiment of the present disclosure.
As shown in Fig. 4, in embodiment 400, memory 1 410_1 is determined from the storage unit 410 as the third target memory for the video memory data to be stored, based on the video memory data 401 from the graphics processing unit and its identification 402, and the video memory data 401 is stored into memory 1 410_1.
In accordance with embodiments of the present disclosure, the identification 402 of the video memory data may be used to identify its source. For example: the identification of video memory data from GPUa is IDa.
According to embodiments of the present disclosure, the correspondence between graphics processing units and memories may be configured; for example, the memory corresponding to GPUa may be memory 1.
Thus, based on the identification 402 "IDa" of the video memory data, the target memory T_1 403 for storing the video memory data 401 can be determined to be memory 1.
It should be noted that the data storage method shown in Fig. 4 is applicable to any of the data processing apparatuses of Figs. 1 to 3 described above.
According to the embodiment of the disclosure, the memory for storing the video memory data to be stored is determined from the plurality of memories based on the identification of the video memory data, so that the video memory data can be efficiently stored.
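The identification-based lookup of Fig. 4 reduces to a table keyed by the data's source identification. The sketch below uses assumed names (ID_TO_MEMORY, store_video_memory_data) and a dict-of-bytearrays stand-in for the storage unit; none of these are interfaces defined by the patent.

```python
# Illustrative identification-based routing for video memory data (Fig. 4).
ID_TO_MEMORY = {"IDa": "memory 1"}   # e.g. video memory data from GPUa carries IDa

def store_video_memory_data(data: bytes, data_id: str, storage_unit: dict) -> str:
    """Determine the third target memory from the identification and store the data."""
    target = ID_TO_MEMORY[data_id]        # third target memory for this source
    storage_unit[target].extend(data)     # persist the video memory data
    return target

storage_unit = {"memory 1": bytearray()}
assert store_video_memory_data(b"\x01\x02", "IDa", storage_unit) == "memory 1"
```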
Although ordered storage can be realized based on the correspondence between the memories and the plurality of graphics processing units, video memory data generated by different graphics processing units differ in length and hence in storage space requirements, so correspondence-based storage may reduce the effective utilization of storage space.
Therefore, the data storage method provided by the embodiment of the present disclosure further includes the following operation: determining the third target memory from the plurality of memories in the storage unit according to the resource requirement of the video memory data to be stored, where the resource requirement of the video memory data to be stored characterizes its demand for storage space.
Fig. 5 schematically illustrates a schematic diagram of a data storage method for storing video memory data according to another embodiment of the present disclosure.
As shown in Fig. 5, in embodiment 500, memory 2 510_2 is determined from the storage unit 510 as the third target memory for the video memory data to be stored, according to the video memory data 501 from the graphics processing unit and its resource requirement 502, and the video memory data 501 is stored into memory 2 510_2.
According to an embodiment of the present disclosure, the resource requirement 502 of the video memory data characterizes its demand for storage space. For example: if the length of the video memory data is 50 bits, then its demand for storage space is at least 50 bits.
For example: the available memory space of the memory 0_0 in the memory unit 510 is 30 bits, and the available memory space of the memory 2_2 in the memory unit 510 is 60 bits. Since 60 bits are larger than the length of the video memory data, the memory 2 510_2 can be determined as the target memory T 2 503。
For example: the available memory space of the memory 0_0 in the memory unit 510 is 30 bits, and the available memory space of the memory 2_2 in the memory unit 510 is 30 bits. Since the sum of the available storage spaces of the memory 0_0 and the memory 2 510_2 can satisfy the resource requirement of the video memory data, the memory 0_0 and the memory 2 510_2 can be determined as the target memory T 2 503。
According to the embodiment of the disclosure, allocating according to the resource requirement of the video memory data allows multiple GPUs to share a memory's storage space and improves storage space utilization.
It should be noted that the data storage method shown in Fig. 5 is applicable to any of the data processing apparatuses of Figs. 1 to 3 described above.
For training and/or inference scenarios of models with large parameter scales, not only does the video memory data need large amounts of storage space and communication channels, but so does the memory data.
Therefore, the data storage method provided by the embodiment of the present disclosure further includes the following operations: in response to received memory data to be stored from a central processing unit, determining a fourth target memory from the plurality of memories in the storage unit according to the identification of the memory data to be stored; and storing the memory data to be stored into the fourth target memory.
Fig. 6 schematically illustrates a schematic diagram of a data storage method for storing both memory data and video memory data according to an embodiment of the present disclosure.
As shown in Fig. 6, memory 1 610_1 is determined from the storage unit 610 as the target memory T_3 603 for storing the video memory data 601, based on the identification 602 of the video memory data. Memory 2 610_2 may be determined from the storage unit 610 as the target memory T_4 606 for storing the memory data 604, based on the identification 605 of the memory data.
For example: the identification 602 of the video memory data may be used to identify its source; the identification of video memory data from GPUb is IDb.
According to embodiments of the present disclosure, the correspondence between the graphics processing unit and the memory may be configured, for example: the memory corresponding to GPUb may be memory 1.
Therefore, memory 1 610_1 can be determined as the target memory T_3 603 for storing the video memory data 601.
For example: the identification 605 of the memory data may be used to identify its source; the memory data from CPU0 is identified as ID_0.
According to embodiments of the present disclosure, the correspondence between the central processing unit and the memory may be configured, for example: the memory corresponding to CPU0 may be memory 2.
Thus, memory 2 610_2 can be determined as the target memory T_4 606 for storing the memory data 604.
According to embodiments of the present disclosure, the CPU and the GPU typically do not share a memory space; therefore, when the correspondence between the memories and the CPUs and GPUs is preconfigured, memory allocated to a CPU is typically not reallocated to a GPU.
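Because the CPU and GPU memories are kept disjoint, the Fig. 6 routing can be pictured as two separate correspondence tables consulted by identification. The table contents and names below are assumptions for illustration only.

```python
# Illustrative disjoint routing of memory data and video memory data (Fig. 6).
CPU_TABLE = {"ID_0": "memory 2"}   # memory data from CPU0 -> fourth target memory
GPU_TABLE = {"IDb": "memory 1"}    # video memory data from GPUb -> third target memory

def route(data_id: str) -> str:
    """Look up the target memory; the CPU and GPU tables never overlap."""
    if data_id in CPU_TABLE:
        return CPU_TABLE[data_id]
    return GPU_TABLE[data_id]

assert route("ID_0") == "memory 2"
assert route("IDb") == "memory 1"
```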
It should be noted that the data storage method shown in Fig. 6 is applicable to the data processing apparatuses shown in Figs. 1 and 2 described above.
For memory data, in order to improve storage space utilization, memories may also be allocated according to the resource requirement of the memory data. Accordingly, the data storage method provided by the embodiment of the present disclosure further includes: determining the fourth target memory from available memories in the storage unit according to the resource requirement of the memory data to be stored, where the resource requirement of the memory data to be stored characterizes its demand for storage space.
Fig. 7 schematically illustrates a schematic diagram of a data storage method for storing both memory data and video memory data according to another embodiment of the present disclosure.
As shown in Fig. 7, memory 0 710_0 is determined from the storage unit 710 as the target memory T_5 703 for storing the video memory data 701, according to the resource requirement 702 of the video memory data. Memory 3 710_3 is determined from the storage unit 710 as the target memory T_6 706 for storing the memory data 704, according to the resource requirement 705 of the memory data.
For example: the length of the display data may be 175 bits and the length of the memory data may be 70 bits. In the memory unit 710, the available memory space of the memory 0_0 is 200 bits, the available memory space of the memory 1_710_1 is 30 bits, the available memory space of the memory 2_710_2 is 50 bits, and the available memory space of the memory 3_3 is 100 bits.
Since the available storage space of memory 0 710_0 is larger than the length of the video memory data, memory 0 can be determined as the target memory T_5 703 for storing the video memory data.
Since the available storage space of memory 3 710_3, 100 bits, is larger than the length of the memory data, memory 3 can be determined as the target memory T_6 706 for storing the memory data.
It should be noted that the data storage method shown in Fig. 7 is applicable to the data processing apparatuses shown in Figs. 1 and 2 described above.
According to the embodiment of the disclosure, in the face of the needs of different model training and/or inference scenarios, the storage space for memory data and video memory data can be expanded simultaneously, alleviating problems such as low training efficiency and low inference precision caused by limited storage space in artificial intelligence model training and/or inference scenarios.
Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuit systems, field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor and which may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a background component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such background, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), wide Area Networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps recited in the present disclosure may be performed in parallel or sequentially or in a different order, provided that the desired results of the technical solutions of the present disclosure are achieved, and are not limited herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (10)

1. A data processing apparatus comprising:
a plurality of graphics processing units;
a storage unit electrically connected with the control unit and comprising a plurality of memories; and
a control unit electrically connected with the plurality of graphics processing units and the storage unit and configured to sequentially allocate the plurality of memories to the plurality of graphics processing units according to the data state of the data to be stored, so as to store the video memory data from the plurality of graphics processing units by utilizing the plurality of memories.
2. The apparatus of claim 1, wherein the apparatus further comprises:
the first central processing unit is electrically connected with the control unit;
wherein the control unit is further configured to allocate available memory of the plurality of memories to the first central processing unit to store memory data from the first central processing unit with the available memory.
3. The device of claim 2, wherein the plurality of memories comprises a first memory for the first central processing unit and a second memory for the plurality of graphics processing units; the control unit is further configured to:
allocating the first memory to the first central processing unit according to the correspondence between the first memory and the first central processing unit, so as to store the memory data by using the first memory; and
allocating the second memory to the plurality of graphics processing units according to the correspondence between the second memory and the plurality of graphics processing units, so as to store the video memory data by using the second memory.
4. The device of claim 2, wherein the control unit is further configured to:
determining a first target memory from the plurality of memories according to the resource requirements of the memory data, wherein the resource requirements of the memory data represent the requirements of the memory data on storage space; and
allocating the first target memory to the first central processing unit, so as to store the memory data by using the first target memory.
5. The apparatus of claim 1, further comprising:
a second central processing unit electrically connected with the plurality of graphics processing units, so as to process data from the second central processing unit by utilizing the plurality of graphics processing units.
6. The device of any of claims 1-5, wherein the control unit is configured to:
determining a second target memory from the memories according to the resource requirements of the video memory data, wherein the resource requirements of the video memory data represent the requirements of the video memory data on the storage space; and
allocating the second target memory to the plurality of graphics processing units, so as to store the video memory data from the plurality of graphics processing units by utilizing the plurality of memories.
7. A data storage method, comprising:
in response to received video memory data to be stored from a graphics processing unit, determining a third target memory for the video memory data to be stored from a plurality of memories in a storage unit according to the identification of the video memory data to be stored; and
storing the video memory data to be stored into the third target memory.
8. The method of claim 7, further comprising:
determining the third target memory from the plurality of memories in the storage unit according to the resource requirement of the video memory data to be stored,
wherein the resource requirement of the video memory data to be stored characterizes its demand for storage space.
9. The method of claim 7, further comprising:
in response to received memory data to be stored from a central processing unit, determining a fourth target memory from the plurality of memories in the storage unit according to the identification of the memory data to be stored; and
storing the memory data to be stored into the fourth target memory.
10. The method of claim 9, further comprising:
determining the fourth target memory from available memories among the plurality of memories in the storage unit according to the resource requirement of the memory data to be stored,
wherein the resource requirement of the memory data to be stored characterizes its demand for storage space.
CN202311759179.4A 2023-12-20 2023-12-20 Data processing apparatus and data storage method for large language model Pending CN117742954A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311759179.4A CN117742954A (en) 2023-12-20 2023-12-20 Data processing apparatus and data storage method for large language model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311759179.4A CN117742954A (en) 2023-12-20 2023-12-20 Data processing apparatus and data storage method for large language model

Publications (1)

Publication Number Publication Date
CN117742954A 2024-03-22

Family

ID=90280834

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311759179.4A Pending CN117742954A (en) 2023-12-20 2023-12-20 Data processing apparatus and data storage method for large language model

Country Status (1)

Country Link
CN (1) CN117742954A (en)


Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination