CN107908477A - Data processing method and device for radio astronomy data - Google Patents


Info

Publication number
CN107908477A
CN107908477A CN201711148902.XA CN201711148902A CN107908477A CN 107908477 A CN107908477 A CN 107908477A CN 201711148902 A CN201711148902 A CN 201711148902A CN 107908477 A CN107908477 A CN 107908477A
Authority
CN
China
Prior art keywords
data
thread
data processing
instructed
calculating
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201711148902.XA
Other languages
Chinese (zh)
Inventor
王超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur Beijing Electronic Information Industry Co Ltd
Original Assignee
Zhengzhou Yunhai Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2017-11-17
Filing date
2017-11-17
Publication date
2018-04-13
Application filed by Zhengzhou Yunhai Information Technology Co Ltd
Priority to CN201711148902.XA
Publication of CN107908477A
Current legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Devices For Executing Special Programs (AREA)

Abstract

The present invention provides a data processing method and device for radio astronomy data. The data processing method includes an outermost loop process, a middle loop process and an innermost loop process, and further comprises the following steps: distributing the computation of each iteration of the outermost loop process to different threads; and having each thread use vectorization instructions. By allocating the computation of each loop iteration through multithreaded task scheduling (schedule), the embodiments of the present invention improve the balance of the computational load across threads, optimize deGridding effectively, and greatly improve performance.

Description

Data processing method and device for radio astronomy data
Technical field
The invention belongs to the field of computers, and in particular relates to a data processing method and device for radio astronomy data.
Background
The Square Kilometre Array (SKA) is an international astronomy project. It aims to build the world's largest aperture-synthesis radio telescope, with 3000 parabolic dish antennas of 15 m diameter and 250 groups of mid-frequency and low-frequency aperture arrays, spread over more than 3000 km, with a total collecting area of about 1 square kilometre. Its sensitivity is expected to be about 50 times that of the largest current radio telescope array (JVLA), and about 10000 times that of China's largest current single-dish radio telescope (FAST). According to plan, SKA will collect more than 12 Tb of data per second, and processing that volume would require almost the combined performance of all TOP500 supercomputers.
deGridding is the SKA data processing stage with the most complex calculation steps and the longest run time; nearly 30% of the data in the whole project must be processed by this software. The deGridding compute kernel consists of three nested loops. The outermost loop is the dind loop, with nChan × nSamples iterations, where nSamples is the number of data samples and nChan is the number of spectral channels. The middle loop is the suppv loop, whose iteration count is the length of the X (or Y) axis of the convolution kernel. The innermost loop is the suppu loop, whose iteration count is the length of the Y (or X) axis of the convolution kernel. At present the serial version of deGridding cannot reach the desired speed; if deGridding can be optimized effectively, the investment in computing platforms for SKA data processing will be greatly reduced.
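The three-level loop structure described above can be sketched in C as follows. This is only an illustrative serial version, not code from the patent: the function name degrid_serial, the row-major layouts, and the per-sample start indices gind0 and cind0 are assumptions made for the example.

```c
#include <complex.h>
#include <stddef.h>

/* Illustrative serial degridding kernel (assumed layout, not the patent's code).
 * grid  : gSize x gSize complex grid, row-major
 * C     : convolution kernels, each sSize x sSize, row-major
 * gind0 : per-sample index of the first grid cell to read (hypothetical input)
 * cind0 : per-sample offset of the kernel to use inside C (hypothetical input)
 */
void degrid_serial(float complex *data, size_t dSize,
                   const float complex *grid, size_t gSize,
                   const float complex *C, size_t sSize,
                   const size_t *gind0, const size_t *cind0)
{
    /* Outermost loop: one iteration per (channel, sample) pair. */
    for (size_t dind = 0; dind < dSize; ++dind) {
        float complex acc = 0.0f;
        size_t gind = gind0[dind];
        size_t cind = cind0[dind];
        /* Middle loop: rows of the convolution kernel. */
        for (size_t suppv = 0; suppv < sSize; ++suppv) {
            /* Innermost loop: columns of the convolution kernel. */
            for (size_t suppu = 0; suppu < sSize; ++suppu)
                acc += grid[gind + suppu] * C[cind + suppu];
            gind += gSize;   /* next grid row */
            cind += sSize;   /* next kernel row */
        }
        data[dind] = acc;
    }
}
```

Each outer iteration writes only data[dind], which is what makes the outermost loop a natural candidate for distribution across threads.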
Summary of the invention
Embodiments of the present invention provide a data processing method and device for radio astronomy data to solve the above problem.
An embodiment of the present invention provides a data processing method for radio astronomy data. The data processing method includes an outermost loop process, a middle loop process and an innermost loop process, and further comprises the following steps: distributing the computation of each iteration of the outermost loop process to different threads; each thread uses vectorization instructions.
An embodiment of the present invention also provides a data processing device for radio astronomy data, used for the data processing of radio astronomy data, where the data processing includes an outermost loop process, a middle loop process and an innermost loop process. The data processing device includes:
a data allocation unit, configured to distribute the computation of each iteration of the outermost loop process to different threads in a computing unit; and the computing unit, in which each thread uses vectorization instructions when computing.
By allocating the computation of each loop iteration through multithreaded task scheduling (schedule), the embodiments of the present invention improve the balance of the computational load across threads. The simd directive and the _mm_prefetch instruction vectorize the core computation and place the data involved in the computation in cache in advance, improving the efficiency of reads and writes to memory. The AVX512 instruction set and the MCDRAM cache greatly increase the computing capability of the deGridding program. deGridding is thereby optimized effectively and its performance greatly improved, giving the method stronger practicality and a wider scope of application.
Brief description of the drawings
The drawings described here provide a further understanding of the present invention and form part of this application. The illustrative embodiments of the invention and their descriptions are used to explain the invention and do not unduly limit it. In the drawings:
Fig. 1 is a flow chart of the data processing method for radio astronomy data according to Embodiment 1 of the present invention;
Fig. 2 is a schematic diagram of the abstract representation of the vectorization process of Embodiment 1 of the present invention;
Fig. 3 is a schematic diagram of the specific implementation of the vectorization operation of Embodiment 1 of the present invention;
Fig. 4 is a structural diagram of the data processing device for radio astronomy data according to Embodiment 2 of the present invention.
Detailed description
The present invention is described in detail below with reference to the drawings and in conjunction with the embodiments. It should be noted that, provided they do not conflict, the embodiments in this application and the features in the embodiments may be combined with one another.
Fig. 1 is a flow chart of the data processing method for radio astronomy data according to Embodiment 1 of the present invention. The data processing method includes an outermost loop process, a middle loop process and an innermost loop process, and further comprises the following steps:
Step 102: distribute the computation of each iteration of the outermost loop process to different threads;
Step 104: each thread uses vectorization instructions.
In step 102, the schedule clause of the OpenMP parallel construct is used to distribute the computation to different threads, and the dynamic scheduling kind (dynamic) of the schedule clause is used to schedule the iterations dynamically.
Specifically, the schedule clause of the OpenMP parallel construct distributes the data, whose computation amounts to nChan × nSamples iterations, to different threads. Because the computational load in the loop is unbalanced, and to avoid threads waiting on one another, the dynamic scheduling kind (dynamic) of the schedule clause schedules iterations dynamically according to run-time state and system resources. Allocating the computation of each loop iteration through multithreaded task scheduling (schedule) effectively avoids data dependencies between threads and improves the balance of the computational load across threads. When tuning performance, a compromise must be made between optimizing memory use and optimizing load balance; the setting that gives the best result is found by measuring performance. An internal queue is used: when a thread becomes available, it is assigned a number of loop iterations given by the chunk size. A single node contains 64 cores with one thread per core, so without hyper-threading the maximum number of threads is set to 64. With np=8 and OMP_NUM_THREADS=8, a data sample volume of nChan × nSamples = 800,000 is divided into 64 blocks (800000/64 = 12500 per thread).
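As a rough illustration of the scheduling described above, the following C sketch distributes an outer loop with uneven per-iteration cost using schedule(dynamic, chunk). The function names block_size and process_samples and the artificial workload are hypothetical; only the OpenMP clause itself comes from the description. Compiled without OpenMP support, the pragma is ignored and the loop simply runs serially.

```c
#include <stddef.h>

/* Number of iterations per block when `total` iterations are split into
 * `nblocks` blocks (rounded up). Hypothetical helper for the example. */
long block_size(long total, long nblocks)
{
    return (total + nblocks - 1) / nblocks;
}

/* Outer loop with uneven per-iteration cost, handed out in chunks.
 * Each idle thread takes the next `chunk` iterations from the run-time's
 * internal queue, so slow iterations do not stall the other threads. */
void process_samples(float *out, long n, long chunk)
{
    #pragma omp parallel for schedule(dynamic, chunk)
    for (long i = 0; i < n; ++i) {
        float acc = 0.0f;
        /* Artificial workload: the cost depends on i, mimicking load imbalance. */
        for (long k = 0; k <= i % 7; ++k)
            acc += 1.0f;
        out[i] = acc;   /* each iteration writes only its own element */
    }
}
```

With the figures quoted in the text, block_size(800000, 64) gives 12500 iterations per block. In practice the chunk size is a tuning parameter: smaller chunks balance load better, larger chunks reduce scheduling overhead.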
In step 104, provided that the dependences between the variables to be vectorized are handled correctly, #pragma simd is used to vectorize the loop effectively. On a machine that supports the 512-bit vector instruction set extension, the compiler generates the corresponding instructions to vectorize the loop portion of the program. Fig. 2 is an abstract representation of the vectorization process, in which a single operation processes a whole vector, providing data parallelism that is more efficient than scalar processing. VL in the figure denotes the vector length; a vector contains several scalars of the same data type (such as int or float). Fig. 3 shows the specific implementation of the vectorization operation. When vectorlength(8) is specified, one vector loop iteration is in theory equivalent to 8 scalar loop iterations. Because the value type contains a real part and an imaginary part, each value occupies 8 bytes (two 4-byte floats), so each vector operation covers (4 × 8) × 16 = 512 bits and the theoretical number of vector loop iterations is sSize/16. A private array of size 16 must be created to hold the grid values mapped through the convolution kernel, that is, the result of the multiply-add of grid and C. The program can then make full use of the vectorization unit to accelerate the computation, and the result is stored in data_local. In addition, unless the directive #pragma nounroll is added to prevent the compiler's loop-unrolling optimization, the compiler will unroll the loop, so the actual number of loop iterations may be smaller.
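The private 16-element accumulator idea can be sketched as below. This is a simplified illustration under stated assumptions, not the patent's code: the real and imaginary parts are kept in separate float arrays (the patent describes combined complex values), the portable `#pragma omp simd` stands in for the Intel-specific `#pragma simd`, and the names row_mac and VLEN are hypothetical.

```c
#include <stddef.h>

#define VLEN 16   /* 16 floats x 32 bits = 512 bits, one full vector register */

/* Complex multiply-accumulate over one kernel row of length n.
 * Adds sum_i g[i] * c[i] to (*out_re, *out_im). */
void row_mac(const float *g_re, const float *g_im,
             const float *c_re, const float *c_im,
             size_t n, float *out_re, float *out_im)
{
    /* Private partial sums, one per vector lane. */
    float acc_re[VLEN] = {0.0f};
    float acc_im[VLEN] = {0.0f};
    size_t full = n - n % VLEN;   /* part of the row covered by whole vectors */

    for (size_t base = 0; base < full; base += VLEN) {
        #pragma omp simd
        for (int k = 0; k < VLEN; ++k) {
            size_t i = base + (size_t)k;
            acc_re[k] += g_re[i] * c_re[i] - g_im[i] * c_im[i];
            acc_im[k] += g_re[i] * c_im[i] + g_im[i] * c_re[i];
        }
    }

    /* Reduce the lanes, then handle the scalar remainder. */
    float re = 0.0f, im = 0.0f;
    for (int k = 0; k < VLEN; ++k) {
        re += acc_re[k];
        im += acc_im[k];
    }
    for (size_t i = full; i < n; ++i) {
        re += g_re[i] * c_re[i] - g_im[i] * c_im[i];
        im += g_re[i] * c_im[i] + g_im[i] * c_re[i];
    }
    *out_re += re;
    *out_im += im;
}
```

Keeping one partial sum per lane removes the loop-carried dependence on a single accumulator, which is what lets the compiler issue full-width vector multiply-adds.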
Further, the OpenMP simd directive is used to group the computation in the loop across threads;
each thread is scheduled to execute several data chunks and uses the simd directive to execute the loop that follows.
That is, the OpenMP simd directive groups the computation in the for-loop across threads; each thread executes several data chunks according to OpenMP run-time scheduling, executes the loop that follows using the simd directive, and is thereby allowed to use vectorization instructions to accelerate the loop.
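One possible way to combine the two levels is sketched below: threads take chunks of the outermost loop, and each thread vectorizes its innermost loop with an OpenMP simd reduction. The function name degrid_parallel_simd and the index arrays gind0 and cind0 are assumptions for illustration, and the placement of the pragmas is one reasonable reading of the description rather than the patent's actual code.

```c
#include <complex.h>

/* Threaded outer loop with a vectorized inner loop (illustrative sketch).
 * grid is gSize x gSize and each kernel in C is sSize x sSize, row-major;
 * gind0/cind0 give the per-sample start offsets (hypothetical inputs). */
void degrid_parallel_simd(float complex *data, long dSize,
                          const float complex *grid, long gSize,
                          const float complex *C, long sSize,
                          const long *gind0, const long *cind0)
{
    /* Thread level: iterations are handed out dynamically to idle threads. */
    #pragma omp parallel for schedule(dynamic)
    for (long dind = 0; dind < dSize; ++dind) {
        float re = 0.0f, im = 0.0f;
        for (long suppv = 0; suppv < sSize; ++suppv) {
            const float complex *g = grid + gind0[dind] + suppv * gSize;
            const float complex *c = C + cind0[dind] + suppv * sSize;
            /* Vector level: each thread vectorizes its own inner loop. */
            #pragma omp simd reduction(+:re, im)
            for (long suppu = 0; suppu < sSize; ++suppu) {
                float gr = crealf(g[suppu]), gi = cimagf(g[suppu]);
                float cr = crealf(c[suppu]), ci = cimagf(c[suppu]);
                re += gr * cr - gi * ci;
                im += gr * ci + gi * cr;
            }
        }
        data[dind] = re + im * I;   /* each thread writes only its own sample */
    }
}
```

The accumulators re and im are declared inside the parallel loop, so they are private to each thread and no synchronization is needed.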
Further, the data processing method for radio astronomy data may also include:
inserting prefetch instructions through the compiler so that the data involved in the computation is stored in cache in advance.
Specifically, the SSE intrinsic _mm_prefetch is used to optimize memory copies by reading data into cache before it is actually accessed. The function signature is void _mm_prefetch(char const *a, int sel); it corresponds to the PREFETCH instructions and tells the processor to load the cache line containing address a into a faster cache level, with sel giving the type of prefetch operation. The prefetch instructions and corresponding hint types are listed in Table 1. NTA denotes a non-temporal prefetch, which reduces cache pollution; T0 prefetches data into all cache levels; T1 prefetches into the L2 and L3 caches but not the L1 cache; T2 prefetches only into the L3 cache. Because the program writes to or accesses each cache line multiple times, data is prefetched into all cache levels. In the specific implementation, grid and C are each prefetched with _mm_prefetch. Since grid and C can be viewed in 2D storage form, multiple rows of data are loaded into the faster cache, and all the prefetched elements are traversed during the corresponding multiplication. The values of PF3 and PF4 in _mm_prefetch were chosen from the experiment shown in Table 2: with PF4=2 and PF3=1 the single-thread data processing time is shortest, that is, prefetching 2 rows of grid and 1 row of C at a time gives the best performance.
Prefetch instruction    Hint type
PREFETCHNTA    _MM_HINT_NTA
PREFETCHT0    _MM_HINT_T0
PREFETCHT1    _MM_HINT_T1
PREFETCHT2    _MM_HINT_T2
Table 1
Table 2
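The row-ahead prefetching described above might look roughly like the following C sketch. It is not the code omitted from the patent text: it uses real floats instead of complex values, assumes a 64-byte cache line, and the names mac_prefetch, prefetch_row, pf_grid and pf_c are hypothetical stand-ins (pf_grid and pf_c playing the role the text gives to PF4 and PF3). Only _mm_prefetch and _MM_HINT_T0 are real SSE names.

```c
#include <stddef.h>

#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>
/* T0 hint: bring the line into all cache levels. */
#define PREFETCH_T0(p) _mm_prefetch((const char *)(p), _MM_HINT_T0)
#else
#define PREFETCH_T0(p) ((void)(p))   /* no-op on targets without SSE */
#endif

#define FLOATS_PER_LINE 16   /* assumed 64-byte cache line / 4-byte float */

/* Issue one prefetch per cache line of a row. */
static void prefetch_row(const float *row, size_t ncols)
{
    for (size_t c = 0; c < ncols; c += FLOATS_PER_LINE)
        PREFETCH_T0(row + c);
}

/* Multiply-accumulate two nrows x ncols row-major arrays, prefetching
 * grid pf_grid rows ahead and C pf_c rows ahead of the row in use. */
float mac_prefetch(const float *grid, const float *C,
                   size_t nrows, size_t ncols,
                   size_t pf_grid, size_t pf_c)
{
    float acc = 0.0f;
    for (size_t r = 0; r < nrows; ++r) {
        /* Only prefetch rows that exist, so no out-of-range pointer is formed. */
        if (r + pf_grid < nrows)
            prefetch_row(grid + (r + pf_grid) * ncols, ncols);
        if (r + pf_c < nrows)
            prefetch_row(C + (r + pf_c) * ncols, ncols);
        for (size_t c = 0; c < ncols; ++c)
            acc += grid[r * ncols + c] * C[r * ncols + c];
    }
    return acc;
}
```

A call such as mac_prefetch(grid, C, nrows, ncols, 2, 1) mirrors the PF4=2, PF3=1 setting reported as fastest; the best distances on other hardware would need to be measured.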
Further, MCDRAM is configured in cache mode, so that the MCDRAM serves as the last-level cache between the L2 cache and the DDR4 memory.
In addition, the embodiment of the present invention is compiled with the Intel AVX512 instruction set, which substantially extends the existing SIMD instruction sets, to improve the computing performance of the program; the VPU in the Intel Xeon Phi supports the 512-bit vector instruction set extension.
Therefore, by allocating the computation of each loop iteration through multithreaded task scheduling (schedule), the embodiments of the present invention avoid data dependencies between threads and improve the balance of each thread's computational load when processing astronomical sample data. The deGridding compute kernel is parallelized with OpenMP: the fork and join of threads are placed on the outermost loop, the total data volume is divided evenly by the number of OpenMP threads, and each thread writes its data to memory space private to that thread. simd is used to vectorize the loop effectively on machines that support the 512-bit vector instruction set extension; taking into account the 512-bit width of the Xeon Phi processor and the length occupied by a single array, MCDRAM is fully used to accelerate reads and writes. The hardware can prefetch several independent data streams at the same time: when an array is accessed through the expression a[j], the compiler inserts a software prefetch instruction that loads a[j+d] into cache, bringing the cache line at that address into a faster cache level. This improves the computing performance of the program and greatly reduces the investment in computing platforms for SKA data processing.
Fig. 4 is a structural diagram of the data processing device for radio astronomy data according to Embodiment 2 of the present invention.
As shown in Fig. 4, a data processing device for radio astronomy data according to an embodiment of the present invention is used for the data processing of radio astronomy data, where the data processing includes an outermost loop process, a middle loop process and an innermost loop process. The data processing device includes:
a data allocation unit 402, configured to distribute the computation of each iteration of the outermost loop process to different threads in the computing unit;
a computing unit 404, in which each thread uses vectorization instructions when computing.
Further, the data allocation unit 402 uses the schedule clause of the OpenMP parallel construct to distribute the computation to different threads, and uses the dynamic scheduling kind (dynamic) of the schedule clause to schedule the iterations dynamically.
Further, the computing unit 404 uses the OpenMP simd directive to group the computation in the loop across threads; each thread is scheduled to execute several data chunks and uses the simd directive to execute the loop that follows.
Further, the data processing device for radio astronomy data may also include a prefetch unit 406, configured to insert prefetch instructions through the compiler so that the data involved in the computation is stored in cache in advance.
The prefetch unit 406 is also configured to set MCDRAM to cache mode, so that the MCDRAM serves as the last-level cache between the L2 cache and the DDR4 memory.
By allocating the computation of each loop iteration through multithreaded task scheduling (schedule), the embodiments of the present invention improve the balance of the computational load across threads. The simd directive and the _mm_prefetch instruction vectorize the core computation and place the data involved in the computation in cache in advance, improving the efficiency of reads and writes to memory. The AVX512 instruction set and the MCDRAM cache greatly increase the computing capability of the deGridding program. deGridding is thereby optimized effectively and its performance greatly improved, giving the method stronger practicality and a wider scope of application.
The foregoing is only a preferred embodiment of the present invention and is not intended to limit the invention; for those skilled in the art, the invention may be modified and varied in many ways. Any modification, equivalent substitution or improvement made within the spirit and principles of the invention shall fall within the protection scope of the present invention.

Claims (10)

1. A data processing method for radio astronomy data, characterized in that the data processing method includes an outermost loop process, a middle loop process and an innermost loop process, and further comprises the following steps:
distributing the computation of each iteration of the outermost loop process to different threads;
each thread uses vectorization instructions.
2. The method according to claim 1, characterized in that the schedule clause of the OpenMP parallel construct is used to distribute the computation to different threads;
the dynamic scheduling kind (dynamic) of the schedule clause is used to schedule the iterations dynamically.
3. The method according to claim 2, characterized in that the OpenMP simd directive is used to group the computation in the loop across threads;
each thread is scheduled to execute several data chunks and uses the simd directive to execute the loop that follows.
4. The method according to any one of claims 1 to 3, characterized by further comprising:
inserting prefetch instructions through the compiler so that the data involved in the computation is stored in cache in advance.
5. The method according to claim 4, characterized in that MCDRAM is configured in cache mode, so that the MCDRAM serves as the last-level cache between the L2 cache and the DDR4 memory.
6. A data processing device for radio astronomy data, characterized in that it is used for the data processing of radio astronomy data, the data processing including an outermost loop process, a middle loop process and an innermost loop process, and the data processing device includes:
a data allocation unit, configured to distribute the computation of each iteration of the outermost loop process to different threads in a computing unit;
the computing unit, in which each thread uses vectorization instructions when computing.
7. The device according to claim 6, characterized in that the data allocation unit uses the schedule clause of the OpenMP parallel construct to distribute the computation to different threads, and uses the dynamic scheduling kind (dynamic) of the schedule clause to schedule the iterations dynamically.
8. The device according to claim 7, characterized in that the computing unit uses the OpenMP simd directive to group the computation in the loop across threads, and each thread is scheduled to execute several data chunks and uses the simd directive to execute the loop that follows.
9. The device according to any one of claims 6 to 8, characterized by further comprising: a prefetch unit, configured to insert prefetch instructions through the compiler so that the data involved in the computation is stored in cache in advance.
10. The device according to claim 9, characterized in that the prefetch unit is also configured to set MCDRAM to cache mode, so that the MCDRAM serves as the last-level cache between the L2 cache and the DDR4 memory.
CN201711148902.XA 2017-11-17 2017-11-17 Data processing method and device for radio astronomy data Pending CN107908477A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711148902.XA CN107908477A (en) 2017-11-17 2017-11-17 Data processing method and device for radio astronomy data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711148902.XA CN107908477A (en) 2017-11-17 2017-11-17 Data processing method and device for radio astronomy data

Publications (1)

Publication Number Publication Date
CN107908477A (en) 2018-04-13

Family

ID=61846296

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711148902.XA Data processing method and device for radio astronomy data 2017-11-17 2017-11-17 Pending CN107908477A (en)

Country Status (1)

Country Link
CN (1) CN107908477A (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090307655A1 (en) * 2008-06-10 2009-12-10 Keshav Kumar Pingali Programming Model and Software System for Exploiting Parallelism in Irregular Programs
CN104375838A (en) * 2014-11-27 2015-02-25 浪潮电子信息产业股份有限公司 OpenMP-based optimization method for astronomy software Gridding
CN104504257A (en) * 2014-12-12 2015-04-08 国家电网公司 Double parallel computing-based on-line Prony analysis method
CN105260175A (en) * 2015-09-16 2016-01-20 浪潮(北京)电子信息产业有限公司 Method for processing Gridding in astronomy software based on OpenMP
CN106020773A (en) * 2016-05-13 2016-10-12 中国人民解放军信息工程大学 Optimization Method of Finite Difference Algorithm in Heterogeneous Many-Core Architecture
CN106383961A (en) * 2016-09-29 2017-02-08 中国南方电网有限责任公司电网技术研究中心 Large vortex simulation algorithm optimization processing method under CPU + MIC heterogeneous platform
CN106598552A (en) * 2016-12-22 2017-04-26 郑州云海信息技术有限公司 Data point conversion method and device based on Gridding module
CN106897131A (en) * 2017-02-22 2017-06-27 郑州云海信息技术有限公司 Parallel computing method and device for astronomical software Gridding

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108509279A (en) * 2018-04-16 2018-09-07 郑州云海信息技术有限公司 Processing method, device and storage medium for radio astronomy data
CN108874547A (en) * 2018-06-27 2018-11-23 郑州云海信息技术有限公司 Data processing method and device for astronomy software Gridding
WO2020077565A1 (en) * 2018-10-17 2020-04-23 北京比特大陆科技有限公司 Data processing method and apparatus, electronic device, and computer readable storage medium
CN112740174A (en) * 2018-10-17 2021-04-30 北京比特大陆科技有限公司 Data processing method and device, electronic equipment and computer readable storage medium
CN112740174B (en) * 2018-10-17 2024-02-06 北京比特大陆科技有限公司 Data processing method, device, electronic equipment and computer readable storage medium
CN114661637A (en) * 2022-02-28 2022-06-24 中国科学院上海天文台 Data processing system and method for radio astronomical data intensive scientific operation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20190227

Address after: 100085 Beijing Haidian District Shangdi Information Road 2-1 C Building 1 Floor

Applicant after: INSPUR (BEIJING) ELECTRONIC INFORMATION INDUSTRY Co.,Ltd.

Address before: Room 1601, floor 16, 278 Xinyi Road, Zhengdong New District, Zhengzhou City, Henan Province

Applicant before: ZHENGZHOU YUNHAI INFORMATION TECHNOLOGY Co.,Ltd.

RJ01 Rejection of invention patent application after publication

Application publication date: 20180413