CN117687779B - Complex electric wave propagation prediction rapid calculation method based on heterogeneous multi-core calculation platform

Complex electric wave propagation prediction rapid calculation method based on heterogeneous multi-core calculation platform

Info

Publication number
CN117687779B
CN117687779B
Authority
CN
China
Prior art keywords
calculation
wave propagation
cpu
propagation prediction
computing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311623108.1A
Other languages
Chinese (zh)
Other versions
CN117687779A (en)
Inventor
饶艳
刘万先
陈新月
柳尧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Chengquan Information Technology Co ltd
Original Assignee
Shandong Chengquan Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong Chengquan Information Technology Co ltd filed Critical Shandong Chengquan Information Technology Co ltd
Priority to CN202311623108.1A
Publication of CN117687779A
Application granted
Publication of CN117687779B
Legal status: Active


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/5038Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00Digital computers in general; Data processing equipment in general
    • G06F15/16Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
    • G06F15/161Computing infrastructure, e.g. computer clusters, blade chassis or hardware partitioning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5011Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • G06F9/5022Mechanisms to release resources
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/5044Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering hardware capabilities

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Mathematical Physics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a complex electric wave propagation prediction rapid calculation method based on a heterogeneous multi-core computing platform, and relates to the technical field of wireless communication. Compared with the prior art, the method addresses the problem that a CPU alone cannot meet the real-time requirements of calculation when facing large-scale electric wave propagation prediction tasks in complex environments. Different types of computing tasks are assigned to different cores, so that the capabilities of each are better exploited and computing efficiency is greatly improved; large-scale compute-intensive tasks are parallelized through heterogeneous multi-core computing, which greatly shortens computation time and reduces computing cost, while reasonable task allocation avoids wasted resources such as idle computing capacity and repeated computation. The allocation and use of computing resources can be optimized for different application scenarios, so as to adapt to different computing requirements and improve the overall performance of the system.

Description

Complex electric wave propagation prediction rapid calculation method based on heterogeneous multi-core calculation platform
Technical Field
The invention relates to the technical field of wireless communication, in particular to a complex electric wave propagation prediction rapid calculation method based on a heterogeneous multi-core calculation platform.
Background
In traditional electric wave propagation prediction schemes, the propagation scenario is relatively simple and the computing tasks are relatively few; the calculation can generally be completed on a single node, and the computing power of a CPU is sufficient to support it. However, as the electromagnetic environment becomes increasingly complex, the traditional single-machine, single-node approach relying on CPU computing power struggles to meet the real-time requirements of large-scale, complex electric wave propagation prediction tasks. In recent years, with the growing demands of computer applications on processor performance and the continuing diversification of computing workloads, processors have developed rapidly: chips have moved from single-core to multi-core designs with stronger parallel computing capability, and performance has improved markedly. Nevertheless, owing to the limitations of the conventional CPU architecture, processors face a serious power-consumption bottleneck, and when a computer system must handle large volumes of data and complex computing tasks, increasing the number of cores no longer brings a corresponding increase in computing capability. Traditional solutions still rely on fully exploiting the CPU's multiple cores to improve computing efficiency, but a large-scale electric wave propagation prediction task in a complex environment may involve hundreds or even thousands of computing tasks, at which point the CPU cannot meet the real-time requirements of the calculation. Against this background, heterogeneous systems interconnecting CPUs with acceleration devices have emerged to further increase computing power. In modern computer systems, acceleration devices such as GPUs typically provide massive parallel computing capability and, in specialized domains, computing acceleration capability. A heterogeneous multi-core computing architecture can combine different types of computing devices working cooperatively to achieve more efficient large-scale parallel computing.
In today's increasingly complex network and electromagnetic environments, the electric wave propagation environment covers the four domains of sea, land, air and space; the wave bands range from very long waves to millimeter waves; the propagation prediction models are diverse; and the computing tasks are large, so the execution efficiency of a complex propagation prediction scheme is particularly critical. Considering that large-scale electric wave propagation prediction computing tasks are mutually independent, that predictions and calculations at different spatial positions have no dependencies on one another, and that the propagation prediction models have simple inputs and outputs and small data volumes, the tasks are well suited to parallelization; the data transmission delay is negligible compared with the improvement in overall computing efficiency, so GPU acceleration devices are well suited to meeting the requirement for fast calculation.
To solve the above problems, the invention provides a complex electric wave propagation prediction rapid calculation method based on a heterogeneous multi-core computing platform, which aims to combine the advantages of heterogeneous multi-core computing platforms such as the CPU and the GPU and, by exploiting the high-performance heterogeneous multi-core architecture and a CPU+GPU heterogeneous multi-core programming model, to realize fast calculation of complex electric wave propagation prediction, thereby improving overall computing efficiency and achieving millisecond-level real-time interaction.
Disclosure of Invention
The object of the invention is to provide a complex electric wave propagation prediction rapid calculation method based on a heterogeneous multi-core computing platform, so as to solve the problem identified in the background art:
in the face of large-scale electric wave propagation prediction tasks in complex environments, a CPU alone cannot meet the real-time requirements of the calculation.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
a complex electric wave propagation prediction rapid calculation method based on a heterogeneous multi-core calculation platform comprises the following steps:
S1: completing, in the CPU, the preparation work for electric wave propagation prediction, and setting the values of the parameters required by the propagation prediction calculation in response to a user parameter-setting request;
S2: executing, in the CPU, the code of the electric wave propagation prediction process;
S3: accelerating the electric wave propagation prediction computing tasks in parallel on the device;
S4: in the CPU, continuing to complete the subsequent serial calculation of the electric wave propagation prediction based on the results of the propagation prediction computing tasks, until the propagation prediction calculation is finished and the index result data are returned;
S5: in the CPU, calling a memory reclamation interface to release the host-side memory space;
S6: in the CPU, calling a memory reclamation interface to release the memory space allocated for the device side.
Preferably, in S2, the code is executed serially in sequence.
Preferably, S3 specifically comprises the following steps:
S3.1: for the electric wave propagation prediction calculation tasks that can be offloaded to the GPU for acceleration, allocating host-side memory in the CPU and initializing the relevant input and output data;
S3.2: calling a memory allocation interface in the CPU to allocate device-side memory for the relevant input and output data;
S3.3: calling a data transmission interface in the CPU to copy the relevant input and output data from the host side to the device side;
S3.4: setting, in the CPU, the corresponding grid and block dimensions according to the operation task completed by each kernel function, and launching the corresponding kernel function;
S3.5: executing each kernel function in the GPU and completing the specified compute-intensive operation tasks through parallel computation;
S3.6: completing the processing of the kernel function calculation results in the GPU and storing the results in the target format;
S3.7: after the GPU-accelerated operation is finished, calling a data transmission interface in the CPU to copy the calculation result data from the device side back to the host side.
Preferably, the block is a thread block composed of a plurality of threads which cooperate with each other;
the grid is a thread grid formed by a plurality of thread blocks;
the kernel function is a CUDA kernel function and runs on a grid.
Compared with the prior art, the invention provides a complex electric wave propagation prediction rapid calculation method based on a heterogeneous multi-core calculation platform, which has the following beneficial effects:
According to the invention, different types of computing tasks are assigned to different cores, so that the capabilities of each are better exploited and computing efficiency is greatly improved; large-scale compute-intensive tasks are parallelized through heterogeneous multi-core computing, which greatly shortens computation time and reduces computing cost, while reasonable task allocation avoids wasted resources such as idle computing capacity and repeated computation. The allocation and use of computing resources can be optimized for different application scenarios, so as to adapt to different computing requirements and improve the overall performance of the system.
Drawings
FIG. 1 is a flow chart of the method described in Embodiment 1 of the present invention;
FIG. 2 is a schematic block diagram described in Embodiment 1 of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. It is apparent that the described embodiments are only some, rather than all, of the embodiments of the present invention.
According to the complex electric wave propagation prediction rapid calculation method based on a heterogeneous multi-core computing platform of the invention, the advantages of heterogeneous multi-core computing platforms such as the CPU and the GPU are combined; the high-performance heterogeneous multi-core architecture and a CPU+GPU heterogeneous multi-core programming model are used to realize fast calculation of complex electric wave propagation prediction, so that the overall computing efficiency is improved and millisecond-level real-time interaction is achieved. Specifically, the method includes the following.
The following terms are used herein:
CPU: the operation and control core of a computer system and the final execution unit for information processing and program execution. A CPU mainly comprises an arithmetic unit, a control unit, registers, caches, and the buses that carry data, control and status information between them. The arithmetic unit mainly performs arithmetic, shift and other operations, as well as address calculation and conversion; the storage units mainly hold the data and instructions generated during operation; the control unit decodes instructions and issues the control signals for the operations required to complete each instruction. Because the CPU architecture devotes a large amount of space to the storage and control units, the computing units occupy only a comparatively small portion, so the CPU is very limited in massively parallel computing capability and is better suited to logic control.
GPU: a programmable microprocessor oriented to image operations. Modern GPUs are typically built on highly parallel internal hardware structures custom-designed for graphics processing. A GPU meets the parallel computing requirements of computer graphics, has hundreds or even thousands of cores, and can process a large number of computing tasks simultaneously; it is suited to compute-intensive parallel workloads such as image processing, machine learning and scientific computing, and offers high parallelism, high throughput and low power consumption.
CUDA: the parallel computing platform and programming model invented by NVIDIA, an extension of the popular C programming language that can exploit heterogeneous computing systems comprising a CPU and massively parallel GPUs. For a CUDA programmer, the computing system consists of a host, which is a conventional CPU, and one or more devices, which are processors with a large number of arithmetic units. In GPU-accelerated applications, the logical, control-flow part of the workload runs on the CPU, while the computationally intensive part of the application runs in parallel on thousands of GPU cores. Using CUDA, developers can program in popular languages (e.g., C/C++/Fortran/Python/MATLAB); a minimal illustration of this host/device split is given after these term definitions.
Host: refers to the CPU and its memory.
Device: refers to a GPU and its memory.
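The following example is not part of the patent (the kernel, the array size and the scale factor are arbitrary); it is a minimal sketch of the host/device split described above: a __global__ kernel runs on the device, while the host allocates memory, launches the kernel and collects the result.

    #include <cuda_runtime.h>
    #include <cstdio>
    #include <vector>

    // Device code: each GPU thread scales one array element in parallel.
    __global__ void scaleKernel(const float* in, float* out, int n, float factor) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) out[i] = in[i] * factor;
    }

    int main() {
        const int n = 1 << 20;
        std::vector<float> hIn(n, 1.0f), hOut(n);
        float *dIn = nullptr, *dOut = nullptr;

        // Host code: allocate device memory and copy the inputs to the device.
        cudaMalloc((void**)&dIn, n * sizeof(float));
        cudaMalloc((void**)&dOut, n * sizeof(float));
        cudaMemcpy(dIn, hIn.data(), n * sizeof(float), cudaMemcpyHostToDevice);

        // Launch the kernel on a grid of thread blocks: the GPU performs the
        // compute-intensive part while the CPU controls the program flow.
        int block = 256;
        int grid = (n + block - 1) / block;
        scaleKernel<<<grid, block>>>(dIn, dOut, n, 2.0f);

        // Copy the result back to the host and release the device memory.
        cudaMemcpy(hOut.data(), dOut, n * sizeof(float), cudaMemcpyDeviceToHost);
        cudaFree(dIn);
        cudaFree(dOut);
        printf("hOut[0] = %.1f\n", hOut[0]);
        return 0;
    }

Compiled with nvcc, the code in main() runs on the CPU, while scaleKernel executes across the GPU cores, which is exactly the division of labour used by the method described below.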
Embodiment 1:
Referring to FIGS. 1-2, the complex electric wave propagation prediction rapid calculation method based on a heterogeneous multi-core computing platform of the present invention includes the following steps.
s1: completing preparation work of electric wave propagation prediction in a CPU, and setting values of parameters required by electric wave propagation prediction calculation in response to a user parameter setting request; the method comprises the following steps:
First, the preparation work for electric wave propagation prediction is completed in the CPU in response to the user's parameter-setting requests concerning the scenario, stations, model and so on. The specific hardware configuration is as follows: the GPU device is an NVIDIA GeForce GTX 1650 graphics card with 4 GB of video memory and 896 CUDA cores; the CPU is an Intel(R) Core(TM) i5-10500H processor with 6 cores, 12 logical processors and a base frequency of 2.50 GHz; the operating system is 64-bit Windows 10 Home Chinese Edition. The values of the parameters required for predicting the propagation from the transmitter to a number of receivers are then set. Taking the ITU-R P.528 model as an example, the electric wave propagation prediction is calculated at 20-degree azimuth intervals over a 360-degree azimuth range, 2-degree elevation intervals over a 90-degree elevation range, and 4-km distance steps over a 320-km distance range.
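To give a rough sense of the scale of this example (an illustrative sketch only: the embodiment does not state whether the interval endpoints are included, so the counts below rest on an assumed sampling convention), the number of independent prediction points can be counted as follows.

    // Hypothetical enumeration of the sampling grid in the example above.
    // Assumed endpoints: azimuths 0..340 deg, elevations 0..90 deg, ranges 4..320 km.
    const int nAzimuth   = 360 / 20;                        // 18 azimuth samples
    const int nElevation = 90 / 2 + 1;                      // 46 elevation samples
    const int nRange     = 320 / 4;                         // 80 range samples
    const int nPoints    = nAzimuth * nElevation * nRange;  // 66240 independent predictions

Each of these points is an independent prediction task, which is what makes the workload suitable for the one-thread-per-point GPU mapping used in S3.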
S2: executing, in the CPU, the code of the electric wave propagation prediction process. Specifically:
The code executed in the CPU is the logically complex code of the propagation prediction process that must run serially in sequence, for example the sequential code of the transmitter before signal transmission or of the receiver during signal reception, including the transmitter's signal generation code and the receiver's signal reception and feedback code.
S3: accelerating the electric wave propagation prediction computing tasks in parallel on the device. Specifically:
During transmission of the signal from the transmitter to the receivers, the radio wave propagation prediction calculations between the transmitter and each receiver point, such as the signal transmission loss from the transmitter to each receiver point or the signal strength received at each point, are computed in parallel on the device. First, host-side memory is allocated in the CPU and the relevant input and output data are initialized; then the memory allocation interface cudaMalloc() is called to allocate device-side memory for the relevant data, and the data transmission interface cudaMemcpy() is called to copy the data from the host side to the device side. In CUDA, a program contains both host code and device code, which run on the CPU and the GPU respectively; the host and the device can communicate with each other and copy data between them.
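The following host-side sketch illustrates these allocation and transfer steps. It is for illustration only: the one-value-per-point data layout, the array names, and the host vectors hAzimuth, hElevation and hRange (assumed to have been prepared in S1) are not taken from the patent; nPoints is the point count from the sketch in S1.

    // S3.1-S3.3 (sketch): allocate device memory for the inputs and for the output
    // (one predicted loss value per receiver point), then copy the inputs over.
    float *dAzimuth = nullptr, *dElevation = nullptr, *dRange = nullptr, *dLoss = nullptr;
    const size_t bytes = nPoints * sizeof(float);

    cudaMalloc((void**)&dAzimuth,   bytes);
    cudaMalloc((void**)&dElevation, bytes);
    cudaMalloc((void**)&dRange,     bytes);
    cudaMalloc((void**)&dLoss,      bytes);

    cudaMemcpy(dAzimuth,   hAzimuth.data(),   bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dElevation, hElevation.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dRange,     hRange.data(),     bytes, cudaMemcpyHostToDevice);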
According to the characteristics of the operation task completed by each kernel function, the corresponding grid and block dimensions are set and the corresponding kernel function is launched; each kernel function is executed in the GPU, completing the specified compute-intensive operation tasks through parallel computation, and the processing of the kernel function results is completed in the GPU with the results stored in the required format. The CUDA thread model comprises, from small to large, thread, block and grid: a thread is the basic unit of parallelism; a block is a thread block composed of a number of cooperating threads; and a grid is a thread grid composed of a number of thread blocks. A kernel function is a CUDA kernel, i.e. a kernel program executed on the GPU, and it runs on a particular grid.
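A corresponding kernel sketch is shown below. It is illustrative only: one thread handles one receiver point, a free-space path-loss formula stands in for the actual propagation prediction model (which the embodiment does not list), and the kernel name and the 3000 MHz frequency are assumptions.

    // Sketch: each thread computes the predicted loss for one receiver point.
    __global__ void propagationKernel(const float* rangeKm, float freqMHz,
                                      float* lossDb, int nPoints) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= nPoints) return;
        // Free-space path loss, used here only as a placeholder for the real
        // propagation prediction model evaluated at receiver point i.
        lossDb[i] = 32.45f + 20.0f * log10f(rangeKm[i]) + 20.0f * log10f(freqMHz);
    }

    // S3.4 (sketch): one thread per receiver point, 256 threads per block.
    dim3 block(256);
    dim3 grid((nPoints + block.x - 1) / block.x);
    propagationKernel<<<grid, block>>>(dRange, 3000.0f, dLoss, nPoints);
    cudaDeviceSynchronize();   // wait until all kernel threads have finished

The grid size is rounded up so that every receiver point is covered by exactly one thread, matching the independent, one-point-per-thread decomposition described above.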
After the GPU-accelerated operation is finished, the data transmission interface cudaMemcpy() is called to copy the calculation result data from the device side back to the host side.
In this example, the electric wave propagation prediction takes about 830 seconds on the CPU, the data copy from the CPU to the GPU takes about 8.507 ns, the electric wave propagation prediction on the GPU takes about 105 ms, and the copy of the result data from the GPU back to the CPU takes about 9.280 ns. The amount of data copied between the CPU and the GPU is small, so the time consumed by the copies is comparatively negligible, which verifies the feasibility of the method.
S4: in the CPU, continuing to complete the subsequent serial calculation of the electric wave propagation prediction based on the results of the propagation prediction computing tasks, until the propagation prediction calculation is finished and the index result data are returned. Specifically:
In the CPU, based on the results of the radio wave propagation prediction computing tasks, the subsequent serial calculation of the radio wave propagation prediction between the transmitter and each receiver is completed, until the prediction calculation between the transmitter and each receiver is finished and index result data such as the field strength and the basic transmission loss between the transmitter and each receiver are returned.
S5: in the CPU, calling a memory reclamation interface to release the host-side memory space. Specifically:
That is, the CPU-side memory space is released by calling a memory reclamation interface, where the reclamation may be performed either synchronously or asynchronously.
S6: in the CPU, calling a memory reclamation interface to release the memory space allocated for the device side. Specifically:
The memory reclamation interface cudaFree() is called, under either a synchronous or an asynchronous reclamation strategy, to release the memory space allocated on the GPU side.
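A sketch of this release step follows. The synchronous calls use cudaFree(); the commented-out asynchronous alternative uses the stream-ordered cudaFreeAsync(), which requires CUDA 11.2 or later, is normally paired with cudaMallocAsync() and a CUDA stream, and is shown only as one possible asynchronous strategy.

    // S6 (sketch): synchronous release of the device-side memory from the earlier sketches.
    cudaFree(dAzimuth);
    cudaFree(dElevation);
    cudaFree(dRange);
    cudaFree(dLoss);

    // Possible asynchronous (stream-ordered) alternative, CUDA 11.2+:
    //   cudaFreeAsync(dLoss, stream);
    //   cudaStreamSynchronize(stream);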
Referring to FIG. 2, the heterogeneous multi-core computing system in this embodiment consists of a host, i.e. a conventional CPU, which is the operation and control core of the computer system and the final execution unit for information processing and program execution, and one or more acceleration devices; the acceleration device is a GPU processor with a large number of computing units. In heterogeneous multi-core computing, processors with different architectures cooperate to complete the computing tasks: the CPU is responsible for the overall program flow control of the propagation model, while the GPU is responsible for the specific, small but compute-intensive portion of the work. After every GPU thread has completed its computing task, the results computed by the GPU are copied back to the CPU side, completing one electric wave propagation prediction computing task.
In the specific propagation model used for the electric wave propagation prediction, the compute-intensive operations can be parallelized and optimized as required, further accelerating the calculation at the propagation model level. For example, a large-scale tensor multiplication can be implemented with a row-parallel or column-parallel design and sent to the GPU in blocks for parallel computation, which speeds up the operation. As another example, for ray tracing, the tracing of all rays can be mapped onto the CUDA programming model, with the parallel computing capability of the GPU used to trace each ray.
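As one illustration of such a row- and column-parallel decomposition (a sketch only; the patent gives no kernel code, and the matrix names dA, dB and dC and the dimensions M, N and K are assumed to have been allocated and defined elsewhere), each CUDA thread can compute one element of the product, so entire rows and columns of the output are processed in parallel.

    // Sketch: thread (r, c) computes one element of C = A * B, so the rows and
    // columns of the output matrix are processed in parallel across the grid.
    __global__ void matMulKernel(const float* A, const float* B, float* C,
                                 int M, int N, int K) {
        int c = blockIdx.x * blockDim.x + threadIdx.x;   // column index
        int r = blockIdx.y * blockDim.y + threadIdx.y;   // row index
        if (r >= M || c >= N) return;
        float acc = 0.0f;
        for (int k = 0; k < K; ++k)
            acc += A[r * K + k] * B[k * N + c];
        C[r * N + c] = acc;
    }

    // Launch with a 2-D grid so that the M x N output is covered by 16 x 16 blocks.
    dim3 block(16, 16);
    dim3 grid((N + block.x - 1) / block.x, (M + block.y - 1) / block.y);
    matMulKernel<<<grid, block>>>(dA, dB, dC, M, N, K);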
The foregoing is only a preferred embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any equivalent substitution or modification made, within the technical scope disclosed by the present invention, by a person skilled in the art according to the technical solution and the inventive concept of the present invention shall fall within the scope of protection of the present invention.

Claims (3)

1. A complex electric wave propagation prediction rapid calculation method based on a heterogeneous multi-core computing platform, characterized by comprising the following steps:
S1: completing, in the CPU, the preparation work for electric wave propagation prediction, and setting the values of the parameters required by the propagation prediction calculation in response to a user parameter-setting request;
S2: executing, in the CPU, the code of the electric wave propagation prediction process;
S3: accelerating the electric wave propagation prediction computing tasks in parallel on the device;
S4: in the CPU, continuing to complete the subsequent serial calculation of the electric wave propagation prediction based on the results of the propagation prediction computing tasks, until the propagation prediction calculation is finished and the index result data are returned;
S5: in the CPU, calling a memory reclamation interface to release the host-side memory space;
S6: in the CPU, calling a memory reclamation interface to release the memory space allocated for the device side;
wherein S3 specifically comprises the following steps:
S3.1: for the electric wave propagation prediction calculation tasks that can be offloaded to the GPU for acceleration, allocating host-side memory in the CPU and initializing the relevant input and output data;
S3.2: calling a memory allocation interface in the CPU to allocate device-side memory for the relevant input and output data;
S3.3: calling a data transmission interface in the CPU to copy the relevant input and output data from the host side to the device side;
S3.4: setting, in the CPU, the corresponding grid and block dimensions according to the operation task completed by each kernel function, and launching the corresponding kernel function;
S3.5: executing each kernel function in the GPU and completing the specified compute-intensive operation tasks through parallel computation;
S3.6: completing the processing of the kernel function calculation results in the GPU and storing the results in the target format;
S3.7: after the GPU-accelerated operation is finished, calling a data transmission interface in the CPU to copy the calculation result data from the device side back to the host side.
2. The complex electric wave propagation prediction rapid calculation method based on a heterogeneous multi-core computing platform according to claim 1, wherein in S2 the code is executed serially in sequence.
3. The complex electric wave propagation prediction rapid calculation method based on a heterogeneous multi-core computing platform according to claim 1, wherein the block is a thread block composed of a plurality of threads which cooperate with each other;
the grid is a thread grid formed by a plurality of thread blocks;
and the kernel function is a CUDA kernel function and runs on a grid.
CN202311623108.1A 2023-11-30 2023-11-30 Complex electric wave propagation prediction rapid calculation method based on heterogeneous multi-core calculation platform Active CN117687779B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311623108.1A CN117687779B (en) 2023-11-30 2023-11-30 Complex electric wave propagation prediction rapid calculation method based on heterogeneous multi-core calculation platform

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311623108.1A CN117687779B (en) 2023-11-30 2023-11-30 Complex electric wave propagation prediction rapid calculation method based on heterogeneous multi-core calculation platform

Publications (2)

Publication Number Publication Date
CN117687779A CN117687779A (en) 2024-03-12
CN117687779B true CN117687779B (en) 2024-04-26

Family

ID=90125660

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311623108.1A Active CN117687779B (en) 2023-11-30 2023-11-30 Complex electric wave propagation prediction rapid calculation method based on heterogeneous multi-core calculation platform

Country Status (1)

Country Link
CN (1) CN117687779B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2014139851A (en) * 2014-05-08 2014-07-31 Fujitsu Ltd Multi-core processor system, control method of multi-core processor system and control program of multi-core processor system
CN106874113A (en) * 2017-01-19 2017-06-20 国电南瑞科技股份有限公司 A kind of many GPU heterogeneous schemas static security analysis computational methods of CPU+
CN108460195A (en) * 2018-02-08 2018-08-28 国家海洋环境预报中心 Tsunami mathematical calculation model is based on rapid implementation method parallel GPU
CN109857543A (en) * 2018-12-21 2019-06-07 中国地质大学(北京) A kind of streamline simulation accelerated method calculated based on the more GPU of multinode
CN112986944A (en) * 2021-03-04 2021-06-18 西安电子科技大学 CUDA heterogeneous parallel acceleration-based radar MTI and MTD implementation method
CN114356550A (en) * 2021-12-10 2022-04-15 武汉大学 Three-level parallel middleware-oriented automatic computing resource allocation method and system


Also Published As

Publication number Publication date
CN117687779A (en) 2024-03-12

Similar Documents

Publication Publication Date Title
CN109213723B (en) Processor, method, apparatus, and non-transitory machine-readable medium for dataflow graph processing
CA3107337C (en) Accelerating dataflow signal processing applications across heterogeneous cpu/gpu systems
JP2021057011A (en) Apparatus and method for real time graphics processing using local and cloud-based graphics processing resource
JP2966085B2 (en) Microprocessor having last-in first-out stack, microprocessor system, and method of operating last-in first-out stack
CN109993684A (en) Compression in machine learning and deep learning processing
US9753726B2 (en) Computer for amdahl-compliant algorithms like matrix inversion
US11816061B2 (en) Dynamic allocation of arithmetic logic units for vectorized operations
Song et al. Bridging the semantic gaps of GPU acceleration for scale-out CNN-based big data processing: Think big, see small
Huang et al. IECA: An in-execution configuration CNN accelerator with 30.55 GOPS/mm² area efficiency
Fowers et al. Inside Project Brainwave's Cloud-Scale, Real-Time AI Processor
CN112446471B (en) Convolution acceleration method based on heterogeneous many-core processor
CN117687779B (en) Complex electric wave propagation prediction rapid calculation method based on heterogeneous multi-core calculation platform
CN110704193B (en) Method and device for realizing multi-core software architecture suitable for vector processing
CN111008042B (en) Efficient general processor execution method and system based on heterogeneous pipeline
CN116090519A (en) Compiling method of convolution operator and related product
CN113377534A (en) High-performance sparse matrix vector multiplication calculation method based on CSR format
Zhou et al. Reconfigurable instruction-based multicore parallel convolution and its application in real-time template matching
Chen et al. Parallel Computing Optimization for Ground-based TT&C Network Situational Data Processing
CN114595813B (en) Heterogeneous acceleration processor and data computing method
US11714649B2 (en) RISC-V-based 3D interconnected multi-core processor architecture and working method thereof
Ligon et al. Evaluating multigauge architectures for computer vision
Ohmura et al. Computation-communication overlap of linpack on a GPU-accelerated PC cluster
Li Embedded AI Accelerator Chips
Yan et al. A reconfigurable processor architecture combining multi-core and reconfigurable processing units
CN116775283A (en) GPGPU resource allocation management method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant