Disclosure of Invention
Therefore, the technical problem to be solved by the present invention is to overcome a problem in the prior art: because a separate memory is configured for each CPU, data must be copied from the main memory to the memory of a video memory device during accelerated power computation, and the computation result must be copied from the memory of the video memory device back to the main memory after the computation is completed.
According to a first aspect, an embodiment of the present invention provides a power heterogeneous computing system, including a main processor, a GPU, an accelerator, and a unified memory, wherein the main processor, the GPU, and the accelerator are integrated on a circuit board card, are all connected to the unified memory, and access the unified memory in a unified manner according to a preset access mechanism.
In one embodiment, the main processor, the GPU, and the accelerator are each provided with a corresponding data cache chip.
In one embodiment, the main processor comprises a RISC-V processor.
In one embodiment, the main processor comprises an X86 processor.
In one embodiment, the main processor comprises an ARM processor.
In one embodiment, mapping information from a virtual address to a physical address is set in the unified memory.
The technical solutions of the present invention have the following advantages:
The present invention provides a power heterogeneous computing system, including a main processor, a GPU, an accelerator, and a unified memory, wherein the main processor, the GPU, and the accelerator are integrated on a circuit board card, are all connected to the unified memory, and access the unified memory in a unified manner according to a preset access mechanism. By storing the data of the main processor, the GPU, and the accelerator in the unified memory, the present invention avoids the time overhead of copying data back and forth between separate memories, thereby reducing the processing time and power consumption of the system and significantly improving computing efficiency.
Detailed Description
The technical solutions of the present invention will be described clearly and completely with reference to the accompanying drawings, and it should be understood that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In the description of the present invention, it should be noted that the terms "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", "outer", etc., indicate orientations or positional relationships based on those shown in the drawings, and are used only for convenience and simplicity of description; they do not indicate or imply that the device or element referred to must have a particular orientation or be constructed and operated in a particular orientation, and thus should not be construed as limiting the present invention. Furthermore, the terms "first," "second," and "third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
In the description of the present invention, it should be noted that, unless otherwise explicitly specified or limited, the terms "mounted" and "connected" are to be construed broadly, e.g., as a fixed connection, a removable connection, or an integral connection; as a mechanical or an electrical connection; as a direct connection between two elements or an indirect connection through an intermediate medium; as internal communication between two elements; or as a wireless or a wired connection. The specific meanings of the above terms in the present invention can be understood by those skilled in the art on a case-by-case basis.
In addition, the technical features involved in the different embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
In the field of power data computation, a separate memory is usually configured for each CPU. Because of this separate configuration, data must be copied from the main memory to the memory of a video memory device during accelerated power computation, and the computation result must be copied from the memory of the video memory device back to the main memory after the computation is completed.
In view of this, an embodiment of the present invention provides a power heterogeneous computing system, as shown in fig. 1, including a main processor 110, a GPU 111, an accelerator 112, and a unified memory 113 integrated on a circuit board card 11, wherein the main processor 110, the GPU 111, and the accelerator 112 are all connected to the unified memory 113 and access the unified memory 113 in a unified manner according to a preset access mechanism.
The circuit board card may be a PCB. The main processor preferably adopts a RISC-V processor: because RISC-V is an open-source instruction set architecture (ISA) based on reduced instruction set computing (RISC) principles, it can improve the processing speed of power heterogeneous computation and reduce system power consumption. The main processor runs the computer's factory-standard operating software. The GPU and the accelerator serve as auxiliary processors of the main processor; when the main processor encounters large-scale computations, the GPU and the accelerator can assist in completing the computation instructions. For example, when the computer runs power simulation software, the main processor handles and loads the data that the simulation software can process normally, while large-scale data of the system to be simulated is distributed to the GPU and the accelerator according to a preset distribution rule.
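The preset distribution rule is not specified further in the text. As a purely illustrative assumption, a simple size threshold could route large workloads to the auxiliary units; all names and the threshold value below are hypothetical:

```python
# Hypothetical sketch of a "preset distribution rule": the patent does not
# define one, so a size threshold is assumed here for illustration only.
GPU_THRESHOLD = 1_000  # assumed cutoff above which work goes to auxiliary units


def dispatch(workload_size: int) -> str:
    """Route a workload to the main processor or to the GPU/accelerator."""
    if workload_size > GPU_THRESHOLD:
        return "gpu_or_accelerator"  # large-scale simulation data
    return "main_processor"          # ordinary software workload


# Small workloads stay on the main processor; large ones are offloaded.
assert dispatch(8) == "main_processor"
assert dispatch(1_000_001) == "gpu_or_accelerator"
```

In practice such a rule could also weigh data type (e.g. dense matrices to the GPU) rather than size alone; the threshold form is the simplest sketch.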
The unified memory stores the data of the main processor, the GPU, and the accelerator in a single place. The main processor, the GPU, and the accelerator can then access the unified memory according to the preset access mechanism, without copying their cached data to a dedicated memory, waiting for the computation to finish, and then fetching the result back from that memory.
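The copy-free access pattern above can be modeled with a small sketch in which every processing unit holds a reference to the same shared memory. The class and method names are illustrative assumptions, not taken from the patent:

```python
class UnifiedMemory:
    """Single address space shared by all processing units (a dict stands in
    for the physical unified memory)."""
    def __init__(self):
        self._cells = {}

    def write(self, addr, value):
        self._cells[addr] = value

    def read(self, addr):
        return self._cells[addr]


class ProcessingUnit:
    """Main processor, GPU, or accelerator: each references the SAME
    UnifiedMemory, so no host-to-device copy step exists."""
    def __init__(self, name, memory):
        self.name = name
        self.memory = memory

    def square_in_place(self, addr):
        # Compute directly on the shared data and leave the result in place.
        self.memory.write(addr, self.memory.read(addr) ** 2)


mem = UnifiedMemory()
cpu = ProcessingUnit("main_processor", mem)
gpu = ProcessingUnit("gpu", mem)

mem.write(0x10, 7)
gpu.square_in_place(0x10)
# The result is immediately visible to the main processor: no copy-back step.
assert cpu.memory.read(0x10) == 49
```

Contrast this with the prior-art model, where the GPU would own a separate memory and the operand and result would each cross a copy boundary.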
In one implementation of the power heterogeneous computing system of the embodiment of the present invention, as shown in fig. 1, the main processor, the GPU, and the accelerator are each provided with a corresponding data cache chip: a first cache chip 1101, a second cache chip 1111, and a third cache chip 1121, respectively. The first cache chip caches and updates the data of the main processor and promptly updates the corresponding data in the unified memory according to the preset access mechanism. The second cache chip does the same for the data of the GPU, and the third cache chip does the same for the data of the accelerator.
In another implementation of the power heterogeneous computing system of the embodiment of the present invention, the main processor may instead comprise an X86 processor or an ARM processor. X86 is the name of the microprocessor architecture first developed and manufactured by Intel. X86 and ARM processors can also complete the computer's data computations, but the RISC-V processor offers higher processing speed than either. Therefore, the embodiments of the present invention prefer the RISC-V processor as the main processor.
In the embodiment of the present invention, the GPU provides the high-speed matrix operations required by the computer system, including the computations of deep learning models and the like.
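The dense matrix multiplication at the heart of a deep learning model is the canonical example of such an operation. A minimal reference sketch (the GPU would execute the same computation in parallel hardware):

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows; the kind of dense
    operation offloaded to the GPU in the system described above."""
    rows, inner, cols = len(a), len(b), len(b[0])
    assert len(a[0]) == inner, "inner dimensions must match"
    return [
        [sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for i in range(rows)
    ]


# A 2x2 example: each output cell is a dot product of a row and a column.
assert matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]) == [[19, 22], [43, 50]]
```

With the unified memory described above, the operand matrices and the result would all reside in the one shared address space rather than being staged into device memory.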
In one implementation of the power heterogeneous computing system of the embodiment of the present invention, mapping information from virtual addresses to physical addresses is stored in the unified memory.
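Such mapping information is typically organized as a page table. The patent states only that the mapping resides in the unified memory; the 4 KiB page size and flat table layout below are illustrative assumptions:

```python
PAGE_SIZE = 4096  # assumed 4 KiB pages; the patent does not specify a size


def translate(page_table, vaddr):
    """Translate a virtual address via a virtual-page -> physical-page map,
    as the mapping information stored in the unified memory would allow."""
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    ppn = page_table[vpn]  # raises KeyError for an unmapped page
    return ppn * PAGE_SIZE + offset


table = {0: 5, 1: 9}  # virtual page number -> physical page number
# Virtual address 4100 is offset 4 into virtual page 1, which maps to
# physical page 9, i.e. physical address 9 * 4096 + 4.
assert translate(table, 4100) == 9 * 4096 + 4
```

Keeping this mapping in the unified memory lets the main processor, GPU, and accelerator share one view of the address space, consistent with the unified access mechanism described above.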
It should be understood that the above examples are provided only for clarity of illustration and are not intended to limit the embodiments. Other variations and modifications will be apparent to those skilled in the art in light of the above description. It is neither necessary nor possible to list all embodiments exhaustively here, and obvious variations or modifications derived therefrom remain within the protection scope of the present invention.