CN112231269A - Data processing method of multiprocessor system and multiprocessor system - Google Patents


Info

Publication number
CN112231269A
CN112231269A (application CN202011056600.1A)
Authority
CN
China
Prior art keywords
memory
processor
mapper
bit width
data
Prior art date
Legal status
Pending
Application number
CN202011056600.1A
Other languages
Chinese (zh)
Inventor
赖振楠
Current Assignee
Hosin Global Electronics Co Ltd
Original Assignee
Hosin Global Electronics Co Ltd
Priority date
Filing date
Publication date
Application filed by Hosin Global Electronics Co Ltd filed Critical Hosin Global Electronics Co Ltd
Priority to CN202011056600.1A
Publication of CN112231269A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00Digital computers in general; Data processing equipment in general
    • G06F15/16Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
    • G06F15/163Interprocessor communication
    • G06F15/167Interprocessor communication using a common memory, e.g. mailbox

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

The invention provides a data processing method for a multiprocessor system, and the multiprocessor system itself. The multiprocessor system comprises a central processing unit, a memory, at least one processor whose bit width differs from that of the memory, and at least one memory mapper, each memory mapper being connected to a corresponding processor. The method comprises the following steps: each memory mapper converts data from the memory into preset-bit-width data, where the preset bit width matches the bit width of the processor connected to that memory mapper; and the processor reads and processes the preset-bit-width data generated by its corresponding memory mapper. Because the memory mapper converts the memory's data into data matching the processor's bit width, the processors need not have the same bit width as the memory. This greatly improves the flexibility of the multiprocessor system, avoids transferring data between processors, and reduces the cost of the system.

Description

Data processing method of multiprocessor system and multiprocessor system
Technical Field
The present invention relates to the field of computer systems, and more particularly, to a data processing method for a multiprocessor system and the multiprocessor system.
Background
A multiprocessor system comprises two or more processors of similar capability that can exchange data with one another and that share a common memory, I/O devices, controllers, and peripherals. The whole hardware system is controlled by a single operating system, which exploits parallelism between processors and programs at the level of jobs, tasks, programs, arrays, and individual operations, thereby improving data processing speed. Multiprocessor systems are now widely used in products such as artificial intelligence, multimedia, and voice communication.
However, because the processors must share the memory, they all need to have the same bit width as the memory. This greatly limits the choice of processors, increases the cost of the multiprocessor system, and makes it inflexible to deploy.
Disclosure of Invention
The technical problem addressed by the embodiments of the present invention is that a conventional multiprocessor system must use processors with the same bit width as its memory, which makes the system costly and inflexible to apply. The embodiments provide a data processing method for a multiprocessor system, and a corresponding multiprocessor system, to solve this problem.
In an embodiment of the present invention, a data processing method for a multiprocessor system is provided, where the multiprocessor system includes a central processing unit, a memory, at least one processor with a bit width different from that of the memory, and at least one memory mapper, and each memory mapper is connected to a corresponding processor, and the method includes:
each memory mapper converts the data of the memory into preset bit width data, and the preset bit width data corresponds to the bit width of a processor connected with the memory mapper;
and the processor reads and processes the preset bit width data generated by the corresponding memory mapper.
Preferably, the memory is PCM, NRAM, MRAM, ReRAM or FeRAM, each memory mapper includes a register buffer, and the register buffer and a processor to which the memory mapper is connected have the same bit width; each memory mapper converts the data of the memory into data with a preset bit width, and the method comprises the following steps:
each memory mapper sequentially writes the narrow bit width data in the memory into different bits of the register buffer according to a clock signal;
and the register buffer outputs data of all bits at the same time to form the preset bit width data.
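The serial-to-parallel conversion in the two steps above can be sketched in a few lines. This is a minimal illustrative model rather than the claimed hardware; the 8-bit memory words and 16-bit processor width in the example are assumptions for demonstration only:

```python
def widen(narrow_words, narrow_bits, wide_bits):
    """Assemble consecutive narrow words read from the memory into one word
    of the processor's bit width, mimicking the register buffer: each clock
    cycle latches one narrow word into the next group of bits, and the full
    register is then driven out at once."""
    assert wide_bits % narrow_bits == 0, "widths assumed to divide evenly"
    slices = wide_bits // narrow_bits
    mask = (1 << narrow_bits) - 1
    reg = 0
    for i, word in enumerate(narrow_words[:slices]):
        # clock i: write the narrow word into bits [i*narrow_bits, (i+1)*narrow_bits)
        reg |= (word & mask) << (i * narrow_bits)
    return reg  # all bits output simultaneously as the preset-bit-width word


# Two 8-bit memory words become one 16-bit processor word.
assert widen([0x34, 0x12], narrow_bits=8, wide_bits=16) == 0x1234
```

The example fills the register starting from the low-order bits; real hardware could equally fill from the high-order end, which is a design choice the patent leaves open.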
Preferably, the memory is a persistent storage-class memory, which includes a memory interface, a control chip, a DRAM chipset, and a flash memory integrated on the same substrate; the memory interface, the DRAM chipset, and the flash memory are each connected to the control chip, and the memory interface is connected to the central processing unit via a memory bus; the method further comprises the following steps:
when receiving a first read-write request of the central processing unit, the control chip acquires an instruction corresponding to the first read-write request from the DRAM chipset and returns the instruction corresponding to the first read-write request to the central processing unit through a memory interface;
and when the instructions in the DRAM chipset meet preset conditions, the control chip acquires a subsequent instruction set of the instructions in the DRAM chipset from the flash memory and moves the subsequent instruction set to the DRAM chipset.
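The control chip's two duties above (serving CPU reads from DRAM and refilling DRAM from flash when a preset condition is met) can be modeled as follows. The threshold-based "preset condition", the class name, and the batch size are illustrative assumptions, not values from the patent:

```python
from collections import deque

class ControlChip:
    """Toy model of the control chip's instruction staging: serve CPU reads
    from the DRAM chipset and, when the pending instructions drop below a
    threshold (one possible "preset condition"), refill from flash."""

    def __init__(self, flash_instructions, threshold=2, capacity=4):
        self.flash = deque(flash_instructions)  # large, slow backing store
        self.dram = deque()                     # small, fast staging area
        self.threshold = threshold
        self.capacity = capacity
        self._refill()

    def _refill(self):
        # move the subsequent instruction set from flash into the DRAM chipset
        while self.flash and len(self.dram) < self.capacity:
            self.dram.append(self.flash.popleft())

    def read(self):
        """Serve one CPU read request, then prefetch if running low."""
        instruction = self.dram.popleft()
        if len(self.dram) < self.threshold:
            self._refill()
        return instruction


chip = ControlChip(range(10))
# the CPU sees the full flash-sized instruction stream, in order
assert [chip.read() for _ in range(10)] == list(range(10))
```

The point of the sketch is that the CPU only ever talks to the small DRAM staging area, yet observes the capacity of the much larger flash.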
Preferably, the processor comprises a graphics processor, the memory mapper comprises a first memory mapper, the persistent storage class memory comprises GDDRs respectively connected to the first memory mapper, and the memory interface, the DRAM chipset, the control chip, the flash memory, the GDDR, the first memory mapper are integrated into the same substrate, the first memory mapper is connected to the graphics processor via a GDDR bus;
the reading and processing of the preset bit width data generated by the corresponding memory mapper by the processor includes: when receiving a second read-write request of the graphics processor, the first memory mapper acquires a graphics processing instruction corresponding to the second read-write request from the GDDR and returns the graphics processing instruction corresponding to the second read-write request to the graphics processor;
each memory mapper converts the data of the memory into data with a preset bit width, and the method comprises the following steps: and when the graphic processing instruction in the GDDR meets a preset condition, the first memory mapper acquires a subsequent graphic processing instruction set of the graphic processing instruction in the GDDR from the flash memory, converts the subsequent graphic processing instruction set into preset bit width data corresponding to the bit width of the graphic processor and then moves the preset bit width data to the GDDR.
Preferably, the processor includes an AI processor, the memory mapper includes a second memory mapper, the persistent storage class memory includes HBMs respectively connected to the second memory mapper, and the memory interface, the DRAM chipset, the control chip, the flash memory, the HBM, and the second memory mapper are integrated into the same substrate, and the second memory mapper is connected to the AI processor via an HBM bus;
the reading and processing of the preset bit width data generated by the corresponding memory mapper by the processor includes: when receiving a third read/write request of the AI processor, the second memory mapper acquires an AI instruction corresponding to the third read/write request from the HBM and returns the AI instruction corresponding to the third read/write request to the AI processor;
each memory mapper converts the data of the memory into data with a preset bit width, and the method comprises the following steps: and when the AI instruction in the HBM meets a preset condition, the second memory mapper acquires a subsequent AI instruction set of the AI instruction in the HBM from the flash memory, converts the subsequent AI instruction set into preset bit width data corresponding to the bit width of the AI processor and then moves the preset bit width data to the HBM.
The embodiment of the invention also provides a multiprocessor system, which comprises a central processing unit, a memory, at least one processor with different bit widths from the memory and at least one memory mapper, wherein each memory mapper is respectively connected with a corresponding processor; wherein:
the memory mapper converts data from the memory into preset bit width data, and the preset bit width data corresponds to the bit width of a processor connected with the memory mapper;
and the processor is used for reading and processing the preset bit width data generated by the corresponding memory mapper.
Preferably, the memory is PCM, NRAM, MRAM, ReRAM or FeRAM, each memory mapper includes a register buffer, and the register buffer and a processor to which the memory mapper is connected have the same bit width;
and the memory mapper sequentially writes the narrow bit width data in the memory into different bits of the register buffer according to a clock signal, and simultaneously outputs data of all bits through the register buffer to form the preset bit width data.
Preferably, the memory is a persistent storage-class memory, which includes a memory interface, a control chip, a DRAM chipset, and a flash memory integrated on the same substrate; the memory interface, the DRAM chipset, and the flash memory are each connected to the control chip, and the memory interface is connected to the central processing unit via a memory bus;
when receiving a first read-write request of the central processing unit, the control chip acquires an instruction corresponding to the first read-write request from the DRAM chipset and returns the instruction corresponding to the first read-write request to the central processing unit through a memory interface;
and when the instructions in the DRAM chipset meet preset conditions, the control chip acquires a subsequent instruction set of the instructions in the DRAM chipset from the flash memory and moves the subsequent instruction set to the DRAM chipset.
Preferably, the processor comprises a graphics processor, the memory mapper comprises a first memory mapper, the persistent storage class memory comprises GDDRs respectively connected to the first memory mapper, and the memory interface, the DRAM chipset, the control chip, the flash memory, the GDDR, the first memory mapper are integrated into the same substrate, the first memory mapper is connected to the graphics processor via a GDDR bus;
when receiving a second read-write request of the graphics processor, the first memory mapper acquires a graphics processing instruction corresponding to the second read-write request from the GDDR and returns the graphics processing instruction corresponding to the second read-write request to the graphics processor;
and when the graphic processing instruction in the GDDR meets a preset condition, the first memory mapper acquires a subsequent graphic processing instruction set of the graphic processing instruction in the GDDR from the flash memory, converts the subsequent graphic processing instruction set into preset bit width data corresponding to the bit width of the graphic processor and then moves the preset bit width data to the GDDR.
Preferably, the processor includes an AI processor, the memory mapper includes a second memory mapper, the persistent storage class memory includes HBMs respectively connected to the second memory mapper, and the memory interface, the DRAM chipset, the control chip, the flash memory, the HBM, and the second memory mapper are integrated into the same substrate, and the second memory mapper is connected to the AI processor via an HBM bus;
when receiving a third read/write request of the AI processor, the second memory mapper acquires an AI instruction corresponding to the third read/write request from the HBM and returns the AI instruction corresponding to the third read/write request to the AI processor;
and when the AI instruction in the HBM meets a preset condition, the second memory mapper acquires a subsequent AI instruction set of the AI instruction in the HBM from the flash memory, converts the subsequent AI instruction set into preset bit width data corresponding to the bit width of the AI processor and then moves the preset bit width data to the HBM.
According to the data processing method and the multiprocessor system of the embodiment of the invention, the data of the memory is converted into the preset bit width data corresponding to the bit width of the processor through the memory mapper, so that the processor does not need to have the same bit width as the memory, the flexibility of the multiprocessor system is greatly improved, the data transfer among the processors is avoided, and the cost of the multiprocessor system is saved.
Drawings
FIG. 1 is a schematic diagram of a multiprocessor system provided in a first embodiment of the present invention;
FIG. 2 is a schematic diagram of a register buffer in the multiprocessor system of FIG. 1 for data conversion;
FIG. 3 is a diagram of a multiprocessor system provided in a second embodiment of the present invention;
FIG. 4 is a schematic diagram of a memory in the multiprocessor system of FIG. 3;
fig. 5 is a flowchart illustrating a data processing method of a multiprocessor system according to a first embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Fig. 1 is a schematic diagram of a multiprocessor system according to a first embodiment of the present invention. The multiprocessor system may be the control system of a product such as an artificial-intelligence, multimedia, or voice-communication device, or a server in a cloud storage system. The multiprocessor system of this embodiment includes a central processing unit (CPU) 11, a memory 12, a memory mapper 16, and a processor 17, with the CPU 11 and the memory 12 each connected to a memory bus. Furthermore, like an existing computer system, the multiprocessor system may further include a direct memory access (DMA) controller 13, a PCIe bridge 14, and a hard disk 15 connected to the PCIe bridge 14 through a PCIe bus, where the hard disk 15 may be a hard disk drive (HDD), a solid-state drive (SSD), or the like. The multiprocessor system may run an embedded operating system, and the DMA controller 13 may move data from the hard disk 15 to the memory 12, or from the memory 12 to the hard disk 15, according to instructions from the central processing unit 11.
The central processing unit 11 and the memory 12 have the same bit width, so the central processing unit 11 can directly access data in the memory 12 through the memory bus. In this embodiment, the processor 17 is connected to the memory bus via the memory mapper 16. The processor 17 may be a graphics processor, an AI processor, or the like, whose bit width differs from that of the memory 12 (i.e., from that of the central processing unit 11), so the processor 17 cannot directly access data in the memory 12.
In this embodiment, the memory mapper 16 may convert the data from the memory 12 into the preset bit width data, where the preset bit width data corresponds to a bit width of the processor 17 connected to the memory mapper 16, and the processor 17 reads and processes the preset bit width data generated by the memory mapper 16. Meanwhile, the memory mapper 16 may also convert the data processed by the processor 17 into data corresponding to the bit width of the memory 12, so that the operation result of the processor 17 may be written back to the memory 12.
In practical applications, the multiprocessor system may include a plurality of memory mappers 16 and a plurality of processors 17 having different bit widths from the memory, and each processor 17 is connected to the memory bus via one memory mapper 16, and the memory mapper 16 performs bit width conversion on the data, so that each processor 17 can process the data in the memory 12 separately.
According to the multiprocessor system, the data of the memory is converted into the preset bit width data corresponding to the bit width of the processor through the memory mapper, so that the processor does not need to have the same bit width as the memory, the flexibility of the multiprocessor system is greatly improved, and the cost of the multiprocessor system is saved.
In this embodiment, the memory 12 may be PCM, NRAM, MRAM, ReRAM, FeRAM, or the like. As shown in fig. 2, each memory mapper 16 includes a register buffer 161, and the register buffer 161 and the processor 17 connected to the memory mapper 16 have the same bit width.
The memory mapper 16 may sequentially write the narrow-bit-width data in the memory 12 to different bits of the register buffer 161 according to the system clock signal CLK, and output data of all bits through the register buffer 161 to form preset-bit-width data, so that the processor 17 may process the data in the memory 12.
Of course, in practical applications, the memory mapper 16 may further include a buffer whose bit width is the same as that of the processor 17. The register buffer 161 writes each assembled word into this buffer, and the processor 17 reads the data from the buffer, which improves data processing efficiency.
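The combination of register buffer plus same-width output buffer can be sketched as a small streaming model. The class and method names, and the use of a FIFO for the buffer, are illustrative assumptions about one way this variant could behave:

```python
from collections import deque

class BufferedMapper:
    """Sketch of the buffered variant: the register buffer assembles each
    wide word slice by slice on successive clock edges, then hands the
    finished word to a FIFO of the processor's bit width, so the processor
    reads completed words without waiting on the assembly."""

    def __init__(self, narrow_bits, wide_bits):
        self.narrow_bits, self.wide_bits = narrow_bits, wide_bits
        self.reg = 0          # register buffer being filled
        self.filled = 0       # number of bits latched so far
        self.fifo = deque()   # output buffer, same bit width as the processor

    def clock_in(self, word):
        # one clock edge: latch a narrow memory word into the next bit slice
        self.reg |= (word & ((1 << self.narrow_bits) - 1)) << self.filled
        self.filled += self.narrow_bits
        if self.filled == self.wide_bits:
            self.fifo.append(self.reg)  # word complete: move it to the buffer
            self.reg, self.filled = 0, 0

    def processor_read(self):
        return self.fifo.popleft()


mapper = BufferedMapper(narrow_bits=8, wide_bits=16)
mapper.clock_in(0xCD)
mapper.clock_in(0xAB)
assert mapper.processor_read() == 0xABCD
```

Decoupling assembly from consumption in this way is what lets the conversion overlap with the processor's own work.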
Fig. 3 to 4 are schematic diagrams of a multiprocessor system according to a second embodiment of the present invention, and similarly, the multiprocessor system may be a control system in a product such as artificial intelligence, multimedia, and voice communication, or may be a server in a cloud storage system.
Similarly, the multiprocessor system of the embodiment includes a central processing unit 31, a memory 32, a DMA controller 33, a PCIe bridge 34, a hard disk 35 connected to the PCIe bridge 34 through a PCIe bus, a memory mapper, and a processor 37, and the central processing unit 31, the memory 32, the DMA controller 33, and the PCIe bridge 34 are respectively connected to the memory bus.
In this embodiment, the memory 32 is a persistent storage-class memory, which includes a memory interface, a control chip 321, a DRAM chipset 322, and a flash memory 328 integrated on the same substrate. The memory interface, the DRAM chipset 322, and the flash memory 328 are each connected to the control chip 321, and the memory interface is connected to the central processing unit 31 via the memory bus. The storage capacity of the flash memory 328 is much greater than that of the DRAM chipset 322.
In this embodiment, in response to a read/write request from the central processing unit 31 connected to the memory interface, the control chip 321 may fetch an instruction set from the DRAM chipset 322 and transmit it to the central processing unit 31 through the memory interface, and may write execution-result data from the central processing unit 31 back into the DRAM chipset 322, thereby implementing data interaction between the central processing unit 31 and the DRAM chipset 322. Specifically, the central processing unit 31 fetches and executes instruction sets from the DRAM chipset 322 according to its program pointer.
The control chip 321 also manages data movement between the DRAM chipset 322 and the flash memory 328. Specifically, when the instruction set (instruction codes and data) remaining for the central processing unit 31 to read in the DRAM chipset 322 meets a preset condition (for example, fewer than a preset number of instructions remain unread), the control chip 321 fetches the subsequent instruction set (instruction codes and data) from the flash memory 328 and stores it in the DRAM chipset 322 for subsequent access by the central processing unit 31.
In this way, the instruction set staged in the DRAM chipset 322 is updated automatically according to the operating state of each central processing unit 31, so that the DRAM chipset 322 appears to have a storage capacity close to that of the flash memory 328 and the central processing unit 31 can remain in a highly efficient operating state. This makes the design suitable for fields with heavy demands on computing resources, such as cloud computing, and can greatly improve the operating efficiency of the system.
In another embodiment of the present invention, the processor 37 includes a graphics processor 371, and the memory mapper includes a first memory mapper 323, which may be implemented as a control chip. Accordingly, the persistent storage-class memory includes a GDDR (Graphics Double Data Rate) memory 324 connected to the first memory mapper 323; the memory interface, the DRAM chipset 322, the control chip 321, the flash memory 328, the GDDR 324, and the first memory mapper 323 are integrated on the same substrate (i.e., the memory mapper is integrated into the memory 32), and the first memory mapper 323 is connected to the graphics processor 371 through a GDDR bus.
In this embodiment, when receiving a second read/write request from the graphics processor 371, the first memory mapper 323 fetches the graphics processing instruction corresponding to that request from the GDDR 324 and returns it to the graphics processor 371 through the GDDR interface. The second read/write request may be, for example, an image or data display instruction; by executing the graphics processing instructions returned by the first memory mapper 323, the graphics processor 371 outputs images or data to a display device such as a monitor.
When the graphics processing instructions in the GDDR 324 meet a preset condition (for example, fewer than a preset number of instructions remain for the graphics processor 371 to read), the first memory mapper 323 further fetches the subsequent graphics processing instruction set from the flash memory 328, converts it into preset-bit-width data matching the bit width of the graphics processor 371, and moves it into the GDDR 324 for subsequent access by the graphics processor 371.
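The first memory mapper thus combines the two mechanisms described earlier: threshold-triggered prefetch from flash, plus bit-width conversion on the way into GDDR. A combined sketch follows; all names, widths, and thresholds are illustrative assumptions rather than values taken from the patent:

```python
def narrow_to_wide(words, narrow_bits, wide_bits):
    # pack consecutive narrow flash words, lowest bits first, into wide words
    per = wide_bits // narrow_bits
    return [sum(w << (j * narrow_bits) for j, w in enumerate(words[i:i + per]))
            for i in range(0, len(words), per)]

class FirstMemoryMapper:
    """Toy model of the first memory mapper: serve graphics-processor read
    requests from GDDR and, when GDDR runs low, fetch the subsequent
    instruction set from flash, convert it to the GPU's bit width, and
    move it into GDDR."""

    def __init__(self, flash_words, flash_bits=8, gpu_bits=32, threshold=2):
        self.flash = list(flash_words)  # narrow words still in flash
        self.gddr = []                  # wide words staged for the GPU
        self.flash_bits, self.gpu_bits = flash_bits, gpu_bits
        self.threshold = threshold
        self._prefetch()

    def _prefetch(self, batch=4):
        # take up to `batch` wide words' worth of narrow words from flash
        per = self.gpu_bits // self.flash_bits
        take, self.flash = self.flash[:batch * per], self.flash[batch * per:]
        self.gddr.extend(narrow_to_wide(take, self.flash_bits, self.gpu_bits))

    def read(self):
        """Serve one GPU read request, prefetching again when GDDR runs low."""
        instruction = self.gddr.pop(0)
        if len(self.gddr) < self.threshold:
            self._prefetch()
        return instruction


# four 8-bit flash words are converted into one 32-bit GPU instruction
mapper = FirstMemoryMapper([0x78, 0x56, 0x34, 0x12])
assert mapper.read() == 0x12345678
```

The second memory mapper of the HBM embodiment below follows the same pattern with the AI processor's bit width substituted.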
In another embodiment of the present invention, the processor 37 includes an AI processor 372, and the memory mapper includes a second memory mapper 325, which may likewise be implemented as a control chip. Accordingly, the persistent storage-class memory includes a high-bandwidth memory (HBM) 326 connected to the second memory mapper 325; the memory interface, the DRAM chipset 322, the control chip 321, the flash memory 328, the HBM 326, and the second memory mapper 325 are integrated on the same substrate (i.e., the memory mapper is integrated into the memory 32), and the second memory mapper 325 is connected to the AI processor 372 via an HBM bus.
In this embodiment, when receiving a third read/write request from the AI processor 372, the second memory mapper 325 fetches the AI instruction corresponding to that request from the HBM 326 and returns it to the AI processor 372.
When the AI instructions in the HBM 326 meet a preset condition (for example, fewer than a preset number of instructions remain for the AI processor 372 to read), the second memory mapper 325 fetches the subsequent AI instruction set from the flash memory 328, converts it into preset-bit-width data matching the bit width of the AI processor 372, and moves it into the HBM 326 for subsequent access by the AI processor 372.
As shown in fig. 5, the present invention further provides a data processing method for a multiprocessor system. The method may be applied to the control system of a product such as an artificial-intelligence, multimedia, or voice-communication device, or to a server in a cloud storage system. Referring to fig. 1, the multiprocessor system includes a central processing unit, a memory, at least one processor whose bit width differs from that of the memory, and at least one memory mapper, each memory mapper being connected to a corresponding processor. The central processing unit and the memory have the same bit width, so the central processing unit can directly access data in the memory through the memory bus. The processor may be a graphics processor, an AI processor, or the like, whose bit width differs from that of the memory (i.e., from that of the central processing unit), so the processor cannot directly access data in the memory.
The method of the embodiment comprises the following steps:
step S51: each memory mapper converts data stored in the memory into preset bit width data, and the preset bit width data corresponds to the bit width of a processor connected with the memory mapper.
Specifically, when the memory adopts PCM, NRAM, MRAM, ReRAM, FeRAM, or the like, each memory mapper includes a register buffer, and the register buffer and the processor to which the memory mapper is connected have the same bit width. The memory mapper can sequentially write the narrow-bit wide data in the memory into different bits of the register buffer according to the system clock signal CLK, and simultaneously output the data of all the bits through the register buffer to form preset bit wide data.
Of course, in practical applications, the memory mapper may further include a buffer whose bit width is the same as that of the processor. The register buffer writes each assembled word into this buffer, and the processor reads the data from the buffer, which improves data processing efficiency.
Step S52: and the processor reads and processes the preset bit width data generated by the corresponding memory mapper. The bit width of the data generated by the memory mapper is the same as the bit width of the processor, so the processor can directly process the data.
The method can also be applied to a multiprocessor system with persistent memory, i.e., the memory is a persistent storage-class memory comprising a memory interface, a control chip, a DRAM chipset, and a flash memory integrated on the same substrate, where the memory interface, the DRAM chipset, and the flash memory are each connected to the control chip, and the memory interface is connected to the central processing unit through a memory bus. The storage capacity of the flash memory is far larger than that of the DRAM chipset.
At this time, the multiprocessor system data processing method includes, in addition to the above-described steps S51 to S52:
when receiving a first read/write request from the central processing unit, the control chip acquires the instruction corresponding to the first read/write request from the DRAM chipset and returns it to the central processing unit through the memory interface;
when the instructions in the DRAM chipset meet a preset condition, the control chip acquires the subsequent instruction set from the flash memory, converts it into preset-bit-width data matching the bit width of the central processing unit, and moves it into the DRAM chipset.
The multiprocessor system may use an embedded operating system, i.e., the central processing unit performs overall operation control based on the embedded operating system, while the control chip automatically updates the instruction set in the DRAM chipset according to the operating state of each central processing unit. The DRAM chipset thus appears to have a storage capacity close to that of the flash memory, and the central processing unit can remain in a highly efficient operating state. This makes the method suitable for fields with heavy demands on computing resources, such as cloud computing, and can greatly improve the operating efficiency of the system.
In another embodiment of the data processing method of the multiprocessor system of the present invention, the processor includes a graphics processor. Accordingly, the memory mapper includes a first memory mapper, the persistent storage-class memory includes a GDDR connected to the first memory mapper, and the memory interface, the DRAM chipset, the control chip, the flash memory, the GDDR, and the first memory mapper are integrated on the same substrate, with the first memory mapper connected to the graphics processor via a GDDR bus.
The converting, by the memory mapper in the step S51, the data of the memory into the data with the preset bit width specifically includes: when the graphic processing instruction in the GDDR meets a preset condition, the first memory mapper acquires a subsequent graphic processing instruction set of the graphic processing instruction in the GDDR from the flash memory, converts the subsequent graphic processing instruction set into preset bit width data corresponding to the bit width of the graphic processor and then moves the preset bit width data to the GDDR.
The reading and processing, by the processor in the step S52, of the preset bit width data generated by the corresponding memory mapper specifically includes: when receiving a second read-write request from the graphics processor, the first memory mapper acquires the graphics processing instruction corresponding to the second read-write request from the GDDR and returns it to the graphics processor.
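The narrow-to-wide conversion a memory mapper performs can be sketched as a serial-in, parallel-out register buffer of the kind described in the register-buffer embodiment: one narrow word is clocked into the next bit positions on each clock, and the data of all bits is then output at once to form the processor-width word. The widths used below (8-bit narrow words, a 32-bit processor bus) are illustrative assumptions, not values taken from the patent.

```python
# Behavioral sketch of a serial-in, parallel-out register buffer that
# widens narrow memory words to a processor's bus width. The widths and
# the clocked-write interface are illustrative assumptions.

class RegisterBuffer:
    def __init__(self, narrow_bits, wide_bits):
        assert wide_bits % narrow_bits == 0
        self.narrow_bits = narrow_bits
        self.slots = wide_bits // narrow_bits  # narrow words per wide word
        self.filled = 0
        self.value = 0

    def clock_in(self, word):
        """On each clock, write one narrow word into the next bit positions."""
        mask = (1 << self.narrow_bits) - 1
        self.value |= (word & mask) << (self.filled * self.narrow_bits)
        self.filled += 1

    def output(self):
        """Output the data of all bits at once, then clear the buffer."""
        assert self.filled == self.slots, "buffer not yet full"
        wide, self.value, self.filled = self.value, 0, 0
        return wide
```

With these assumed widths, four sequential 8-bit writes produce one 32-bit word on the output side, which is the mapper's "preset bit width data" for a 32-bit processor.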
In another embodiment of the data processing method of the multiprocessor system according to the present invention, the processor includes an AI processor, the memory mapper includes a second memory mapper, and the persistent storage class memory includes HBMs respectively connected to the second memory mapper; the memory interface, the DRAM chipset, the control chip, the flash memory, the HBM, and the second memory mapper are integrated into the same substrate, and the second memory mapper is connected to the AI processor via an HBM bus.
At this time, the converting, by the memory mapper in the step S51, of the data of the memory into the data with the preset bit width specifically includes: when the AI instruction in the HBM meets a preset condition, the second memory mapper acquires a subsequent AI instruction set of the AI instruction in the HBM from the flash memory, converts the subsequent AI instruction set into preset bit width data corresponding to the bit width of the AI processor, and then moves the preset bit width data to the HBM.
The reading and processing, by the processor in the step S52, of the preset bit width data generated by the corresponding memory mapper specifically includes: when receiving a third read-write request from the AI processor, the second memory mapper acquires the AI instruction corresponding to the third read-write request from the HBM and returns it to the AI processor.
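A point common to the GDDR and HBM embodiments is that each memory mapper repacks the same flash-resident instruction stream to the bit width of its own processor. The following minimal sketch illustrates that idea with two mappers of different widths sharing one flash image; the widths, the index-based request interface, and all names are assumptions made for the example, not the patent's design.

```python
# Sketch of per-processor width matching: two memory mappers repack one
# shared flash-resident instruction stream to different bus widths.

def repack(words, src_bits, dst_bits):
    """Concatenate src_bits-wide words and re-slice them into dst_bits-wide words."""
    total = 0
    for i, w in enumerate(words):
        total |= (w & ((1 << src_bits) - 1)) << (i * src_bits)
    n_bits = len(words) * src_bits
    return [(total >> off) & ((1 << dst_bits) - 1)
            for off in range(0, n_bits, dst_bits)]

class MemoryMapper:
    def __init__(self, flash_words, flash_bits, proc_bits):
        # Stage the flash image locally, already converted to the
        # connected processor's bit width.
        self.local = repack(flash_words, flash_bits, proc_bits)

    def read(self, index):
        # Read-write request path: return the staged instruction
        # matching the request (simplified to an index lookup).
        return self.local[index]

flash = [0xAA, 0xBB, 0xCC, 0xDD]          # 8-bit words in shared flash
gpu_mapper = MemoryMapper(flash, 8, 16)   # mapper for a 16-bit consumer
ai_mapper = MemoryMapper(flash, 8, 32)    # mapper for a 32-bit consumer
```

Each mapper thus presents the same underlying content at a different preset bit width, which is what lets processors of unequal bus widths share one memory in this architecture.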
The above description covers only preferred embodiments of the present invention, but the scope of the present invention is not limited thereto; any change or substitution that can readily be conceived by those skilled in the art within the technical scope disclosed by the present invention shall fall within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A data processing method for a multiprocessor system, wherein the multiprocessor system includes a central processing unit, a memory, at least one processor having a bit width different from that of the memory, and at least one memory mapper, and each memory mapper is connected to a corresponding processor, the method comprising:
each memory mapper converts the data of the memory into preset bit width data, and the preset bit width data corresponds to the bit width of a processor connected with the memory mapper;
and the processor reads and processes the preset bit width data generated by the corresponding memory mapper.
2. The multiprocessor system data processing method of claim 1, wherein the memory is PCM, NRAM, MRAM, ReRAM or FeRAM, each of the memory mappers includes a register buffer, and the register buffer has the same bit width as the processor to which the memory mapper is connected; and the converting, by each memory mapper, of the data of the memory into data with a preset bit width includes:
each memory mapper sequentially writes the narrow bit width data in the memory into different bits of the register buffer according to a clock signal;
and the register buffer outputs data of all bits at the same time to form the preset bit width data.
3. The multiprocessor system data processing method of claim 1, wherein the memory is a persistent storage class memory, the persistent storage class memory includes a memory interface, a control chip, a DRAM chipset, and a flash memory integrated into the same substrate, the memory interface, the DRAM chipset, and the flash memory are respectively connected to the control chip, and the memory interface is connected to the central processing unit via a memory bus; the method further comprises:
when receiving a first read-write request from the central processing unit, the control chip acquires an instruction corresponding to the first read-write request from the DRAM chipset and returns the instruction corresponding to the first read-write request to the central processing unit through the memory interface; and
when the instructions in the DRAM chipset meet a preset condition, the control chip acquires a subsequent instruction set of the instructions in the DRAM chipset from the flash memory and moves the subsequent instruction set to the DRAM chipset.
4. The multiprocessor system data processing method of claim 3, wherein the processor comprises a graphics processor, the memory mapper comprises a first memory mapper, the persistent storage class memory comprises GDDRs respectively connected to the first memory mapper, the memory interface, the DRAM chipset, the control chip, the flash memory, the GDDR, and the first memory mapper are integrated into the same substrate, and the first memory mapper is connected to the graphics processor via a GDDR bus;
the reading and processing, by the processor, of the preset bit width data generated by the corresponding memory mapper includes: when receiving a second read-write request from the graphics processor, the first memory mapper acquires a graphics processing instruction corresponding to the second read-write request from the GDDR and returns the graphics processing instruction corresponding to the second read-write request to the graphics processor; and
the converting, by each memory mapper, of the data of the memory into data with a preset bit width includes: when the graphics processing instruction in the GDDR meets a preset condition, the first memory mapper acquires a subsequent graphics processing instruction set of the graphics processing instruction in the GDDR from the flash memory, converts the subsequent graphics processing instruction set into preset bit width data corresponding to the bit width of the graphics processor, and then moves the preset bit width data to the GDDR.
5. The multiprocessor system data processing method of claim 3, wherein the processor comprises an AI processor, the memory mapper comprises a second memory mapper, the persistent storage class memory comprises HBMs respectively connected to the second memory mapper, the memory interface, the DRAM chipset, the control chip, the flash memory, the HBM, and the second memory mapper are integrated into the same substrate, and the second memory mapper is connected to the AI processor via an HBM bus;
the reading and processing, by the processor, of the preset bit width data generated by the corresponding memory mapper includes: when receiving a third read-write request from the AI processor, the second memory mapper acquires an AI instruction corresponding to the third read-write request from the HBM and returns the AI instruction corresponding to the third read-write request to the AI processor; and
the converting, by each memory mapper, of the data of the memory into data with a preset bit width includes: when the AI instruction in the HBM meets a preset condition, the second memory mapper acquires a subsequent AI instruction set of the AI instruction in the HBM from the flash memory, converts the subsequent AI instruction set into preset bit width data corresponding to the bit width of the AI processor, and then moves the preset bit width data to the HBM.
6. A multiprocessor system, characterized in that the multiprocessor system comprises a central processing unit, a memory, at least one processor having a bit width different from that of the memory, and at least one memory mapper, each memory mapper being connected to a corresponding processor; wherein:
the memory mapper is used for converting the data of the memory into preset bit width data, and the preset bit width data corresponds to the bit width of the processor connected to the memory mapper;
and the processor is used for reading and processing the preset bit width data generated by the corresponding memory mapper.
7. The multiprocessor system of claim 6, wherein the memory is PCM, NRAM, MRAM, ReRAM, or FeRAM, each of the memory mappers includes a register buffer, and the register buffer has the same bit width as the processor to which the memory mapper is connected;
and the memory mapper sequentially writes the narrow bit width data in the memory into different bits of the register buffer according to a clock signal, and then outputs the data of all bits at the same time through the register buffer to form the preset bit width data.
8. The multiprocessor system according to claim 6, wherein the memory is a persistent storage class memory, the persistent storage class memory includes a memory interface, a control chip, a DRAM chipset, and a flash memory integrated into the same substrate, and the memory interface, the DRAM chipset, and the flash memory are respectively connected to the control chip, and the memory interface is connected to the central processing unit via a memory bus;
when receiving a first read-write request from the central processing unit, the control chip acquires an instruction corresponding to the first read-write request from the DRAM chipset and returns the instruction corresponding to the first read-write request to the central processing unit through the memory interface;
and when the instructions in the DRAM chipset meet preset conditions, the control chip acquires a subsequent instruction set of the instructions in the DRAM chipset from the flash memory and moves the subsequent instruction set to the DRAM chipset.
9. The multiprocessor system of claim 8, wherein the processor comprises a graphics processor, the memory mapper comprises a first memory mapper, the persistent storage class memory comprises GDDRs respectively connected to the first memory mapper, the memory interface, the DRAM chipset, the control chip, the flash memory, the GDDR, and the first memory mapper are integrated into the same substrate, and the first memory mapper is connected to the graphics processor via a GDDR bus;
when receiving a second read-write request of the graphics processor, the first memory mapper acquires a graphics processing instruction corresponding to the second read-write request from the GDDR and returns the graphics processing instruction corresponding to the second read-write request to the graphics processor;
and when the graphics processing instruction in the GDDR meets a preset condition, the first memory mapper acquires a subsequent graphics processing instruction set of the graphics processing instruction in the GDDR from the flash memory, converts the subsequent graphics processing instruction set into preset bit width data corresponding to the bit width of the graphics processor, and then moves the preset bit width data to the GDDR.
10. The multiprocessor system of claim 8, wherein the processor comprises an AI processor, the memory mapper comprises a second memory mapper, the persistent storage class memory comprises HBMs respectively coupled to the second memory mapper, the memory interface, the DRAM chipset, the control chip, the flash memory, the HBM, and the second memory mapper are integrated into the same substrate, and the second memory mapper is coupled to the AI processor via an HBM bus;
when receiving a third read/write request of the AI processor, the second memory mapper acquires an AI instruction corresponding to the third read/write request from the HBM and returns the AI instruction corresponding to the third read/write request to the AI processor;
and when the AI instruction in the HBM meets a preset condition, the second memory mapper acquires a subsequent AI instruction set of the AI instruction in the HBM from the flash memory, converts the subsequent AI instruction set into preset bit width data corresponding to the bit width of the AI processor and then moves the preset bit width data to the HBM.
CN202011056600.1A 2020-09-29 2020-09-29 Data processing method of multiprocessor system and multiprocessor system Pending CN112231269A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011056600.1A CN112231269A (en) 2020-09-29 2020-09-29 Data processing method of multiprocessor system and multiprocessor system

Publications (1)

Publication Number Publication Date
CN112231269A true CN112231269A (en) 2021-01-15

Family

ID=74120885

Country Status (1)

Country Link
CN (1) CN112231269A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101226481A (en) * 2008-02-02 2008-07-23 上海华为技术有限公司 Method, device and system for loading field programmable gate array
CN102110072A (en) * 2009-12-29 2011-06-29 中兴通讯股份有限公司 Complete mutual access method and system for multiple processors
CN104750557A (en) * 2013-12-27 2015-07-01 华为技术有限公司 Method and device for managing memories
CN107943727A (en) * 2017-12-08 2018-04-20 深圳市德赛微电子技术有限公司 A kind of high efficient DMA controller
CN110941395A (en) * 2019-11-15 2020-03-31 深圳宏芯宇电子股份有限公司 Dynamic random access memory, memory management method, system and storage medium

Similar Documents

Publication Publication Date Title
KR102541302B1 (en) Flash-integrated high bandwidth memory appliance
US11733870B2 (en) Near-memory compute module
TWI699646B (en) Memory device, memory addressing method, and article comprising non-transitory storage medium
US10824574B2 (en) Multi-port storage device multi-socket memory access system
JP2013206474A (en) Memory device and method of operating the same
CN110941395B (en) Dynamic random access memory, memory management method, system and storage medium
US10102884B2 (en) Distributed serialized data buffer and a memory module for a cascadable and extended memory subsystem
US20140040541A1 (en) Method of managing dynamic memory reallocation and device performing the method
JP2021086611A (en) Energy-efficient compute-near-memory binary neural network circuits
JP2021043975A (en) Interface circuit, memory device, and operation method for the same
JP2018152112A (en) Memory device and method of operating the same
US20240103755A1 (en) Data processing system and method for accessing heterogeneous memory system including processing unit
US7743195B2 (en) Interrupt mailbox in host memory
CN113994314A (en) Extended memory interface
US10853255B2 (en) Apparatus and method of optimizing memory transactions to persistent memory using an architectural data mover
CN112231269A (en) Data processing method of multiprocessor system and multiprocessor system
US20220283962A1 (en) Storage controller managing completion timing, and operating method thereof
CN111177027B (en) Dynamic random access memory, memory management method, system and storage medium
US20140331006A1 (en) Semiconductor memory devices
US11093432B1 (en) Multi-channel DIMMs
CN115114186A (en) Techniques for near data acceleration for multi-core architectures
US10942672B2 (en) Data transfer method and apparatus for differential data granularities
US20200327049A1 (en) Method and system for memory expansion with low overhead latency
CN113490915A (en) Expanding memory operations
US6401151B1 (en) Method for configuring bus architecture through software control

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination