CN112835772A - Coarse-grained calculation acceleration ratio evaluation method and system under heterogeneous hardware environment - Google Patents

Info

Publication number
CN112835772A
Authority
CN
China
Prior art keywords
calculation
cpu
acceleration ratio
gpu
acceleration
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911164810.XA
Other languages
Chinese (zh)
Inventor
汤文莉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Institute of Industry Technology
Original Assignee
Nanjing Institute of Industry Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Institute of Industry Technology
Priority to CN201911164810.XA
Publication of CN112835772A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00: Error detection; Error correction; Monitoring
    • G06F11/30: Monitoring
    • G06F11/34: Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3409: Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment, for performance assessment

Abstract

A coarse-grained calculation acceleration ratio evaluation method and system in a heterogeneous hardware environment can evaluate, before the actual calculation is executed, whether an acceleration effect will be obtained, which avoids the expense of calculating first and measuring afterwards. The acceleration ratio can be evaluated automatically and in real time according to the calculation context, which is more accurate and more efficient than relying on experience and measurement. Dynamic scheduling of calculation is thereby achieved: a module that has an acceleration ratio runs on the GPU, while a module without one continues to run on the CPU, so that the computing capacity of the heterogeneous hardware is utilized to the maximum and the system achieves its best performance.

Description

Coarse-grained calculation acceleration ratio evaluation method and system under heterogeneous hardware environment
Technical Field
The invention belongs to the field of high-performance computing and relates to a method for evaluating the calculation acceleration ratio in a heterogeneous hardware computing environment, and more particularly to a method for quickly evaluating the acceleration ratio of the same code on CPU and GPU hardware in a mixed CPU and GPU computing scenario.
Background
With the development of high-performance computing technology, more and more coprocessors besides the CPU, such as GPUs, FPGAs and embedded accelerator cards, appear in computing equipment. These coprocessors can accelerate conventional CPU-based programs and thereby improve the overall computing performance of a business system. In the field of high-performance parallel computing, however, rewriting CPU code as, for example, GPU code does not necessarily accelerate the calculation. Because of factors such as the added overhead of data copying and the GPU computation scheduling mechanism, converting a multi-threaded program running on the CPU into a parallel program running on the GPU often makes it run slower rather than faster. Existing ways of estimating computing performance rely mainly on manual experience and on measuring actual test results after the code has been migrated. If the effect of migrating a program could be estimated in advance, a great deal of unnecessary work could be avoided.
Disclosure of Invention
In view of the above situation, the present invention provides an innovative coarse-grained calculation acceleration ratio evaluation method that solves the above problems well. Using this method, a system can rapidly judge whether there is an acceleration ratio when a given module is migrated from the CPU to the GPU for calculation, and can obtain an approximate value of that ratio.
The invention discloses a coarse-grained calculation acceleration ratio evaluation method and system in a heterogeneous hardware environment. Unlike existing acceleration ratio evaluation methods, it estimates the acceleration effect quickly and automatically through an algorithm, and the quantified acceleration ratio allows calculation to be scheduled more efficiently. It is a means of implementing the evaluation of calculation effects.
The method comprises the following steps:
Step 1, acquiring the basic attributes of the heterogeneous hardware and the calculation type of each calculation module.
Step 2, classifying the calculation module by calculation type, that is, by whether its calculated amount is in a linear relation or an exponential relation to the data amount; selecting the linear evaluation algorithm or the exponential evaluation algorithm accordingly; and estimating the acceleration ratio of the calculation module by combining the specific parameters of the calculation context.
The acceleration ratio in this step is calculated as $N = T(\mathrm{CPU}) / T(\mathrm{GPU})$, where $T(\mathrm{CPU})$ is the time the algorithm takes on the CPU and $T(\mathrm{GPU})$ is the time it takes on the GPU, with $T(\mathrm{GPU}) = S(\mathrm{InData} + \mathrm{OutData}) / V(\mathrm{PCIE}) + T(\mathrm{CPU}) / M$. Here $S(\mathrm{InData} + \mathrm{OutData})$ is the total amount of data IO, $V(\mathrm{PCIE})$ is the bus IO speed, and $M$ is the parallelism. The difference between the linear evaluation algorithm and the exponential evaluation algorithm lies mainly in the relationship between the calculated amount and the data amount. For example, let the calculation time of 100M of data on the CPU be t seconds: in a linear relationship, $T(\mathrm{CPU})$ grows in proportion to the data amount, whereas in an exponential relationship (generally a square relationship), $T(\mathrm{CPU})$ grows with the square of the data amount.
Step 3, acting on the acceleration ratio result: if the acceleration ratio is greater than 1, migrating the calculation from the CPU to the GPU has an acceleration effect, and the calculation can be migrated from the CPU to the GPU. If the acceleration ratio is less than or equal to 1, migrating the calculation from the CPU to the GPU has no acceleration effect.
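The three steps above can be sketched in a few lines of Python. This is only an illustrative sketch of the stated formula, not the patented implementation: the function names, the argument units (seconds and GB) and the sample numbers in the usage lines are assumptions made for the example.

```python
def gpu_time(t_cpu, io_gb, pcie_gb_per_s, parallelism):
    """T(GPU) = S(InData + OutData) / bus IO speed + T(CPU) / M."""
    transfer = io_gb / pcie_gb_per_s   # time spent moving data over the bus
    compute = t_cpu / parallelism      # CPU time divided by the parallelism M
    return transfer + compute


def acceleration_ratio(t_cpu, in_gb, out_gb, pcie_gb_per_s, parallelism):
    """N = T(CPU) / T(GPU)."""
    return t_cpu / gpu_time(t_cpu, in_gb + out_gb, pcie_gb_per_s, parallelism)


def should_migrate(ratio):
    """Step 3: migrate to the GPU only when the ratio exceeds 1."""
    return ratio > 1.0


# Hypothetical module: 2 s on the CPU, 1 GB in, 0.1 GB out, 12.8 GB/s bus, M = 64.
n = acceleration_ratio(2.0, 1.0, 0.1, 12.8, 64)
print(round(n, 2), should_migrate(n))
```

Because the transfer term does not shrink with the parallelism, a module with little CPU work per byte can end up with a ratio below 1 even when M is large.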
Compared with the prior art, the invention has the following beneficial effects:
1. Whether acceleration will be obtained can be evaluated before the actual calculation is carried out, which avoids the overhead of calculating and then measuring each time.
2. The acceleration ratio can be evaluated automatically and in real time according to the calculation context, which is more accurate and more efficient than relying on experience and measurement.
3. Dynamic scheduling of calculation is achieved: a module that has an acceleration ratio runs on the GPU, while a module without one continues to run on the CPU, so that the computing capability of the heterogeneous hardware is utilized to the maximum and the system achieves optimal performance.
Drawings
FIG. 1 is a schematic diagram of the linear evaluation model in the embodiment, with a parallelism of 64;
FIG. 2 is a schematic diagram of the linear evaluation model in the embodiment, with a parallelism of 64, showing where the acceleration ratio is 1;
FIG. 3 is a schematic diagram of the exponential evaluation model in the embodiment, with a data size of 1 GB and a parallelism of 64;
FIG. 4 is a schematic diagram of the exponential evaluation model in the embodiment, with a data size of 1 GB and a parallelism of 64, showing where the acceleration ratio is 1.
Detailed Description
In this embodiment, the high-performance computing server uses an Intel E5-2600 CPU and an NVIDIA GTX 1080Ti GPU. The motherboard follows the PCI-E 3.0 specification and the GPU uses a PCIE 8X slot; the theoretical transmission speed is 16 GB/s and the actually measured speed is 12.8 GB/s. The embodiment uses a database system in which every underlying operator is available in both a CPU and a GPU version, so the system must decide whether to use the CPU operator or the GPU operator according to the acceleration ratio calculated from the running context.
In this embodiment, SQL statement 1 is: select from Data_100G where id_int < 10, and SQL statement 2 is: select a.id_int from Data_100G a join Data_100M b on a.id_int. The main calculation of statement 1 is the where comparison operation, whose calculation amount is linear in the data amount. The main calculation of statement 2 is the join operation, whose calculation amount is in an exponential (square) relation to the data amount.
The method comprises the following specific implementation steps:
Step 1, the database system supporting CPU/GPU heterogeneous computing operators acquires the parameters of the current hardware environment, such as the PCIE bus transmission efficiency and the performance ratio of a single CPU core to a single GPU core. These parameters can be obtained through manual configuration or through automatic program statistics. In this embodiment, the actual transmission speed of the PCIE 3.0 bus is 12.8 GB/s, and the performance ratio of a single CPU core to a single GPU core is 1.
At the same time, the system obtains the calculation category of each calculation module (in a database system, each underlying database operator). For example, the filter operator belongs to the linear operator category and the join operator belongs to the exponential operator category. The category of each operator is set manually in advance according to its calculation characteristics, that is, the relationship between its data amount and its calculated amount.
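The inputs gathered in Step 1 can be pictured as a small configuration record plus a lookup table. The following Python sketch is hypothetical: the names `HardwareProfile`, `ComputeCategory`, `OPERATOR_CATEGORY` and `category_of` are invented for illustration and do not come from the patent; only the values (12.8 GB/s, a core performance ratio of 1, filter as linear, join as exponential) are taken from the embodiment.

```python
from dataclasses import dataclass
from enum import Enum


class ComputeCategory(Enum):
    LINEAR = "linear"            # calculated amount grows with the data amount
    EXPONENTIAL = "exponential"  # a square relation in the embodiment


@dataclass(frozen=True)
class HardwareProfile:
    pcie_gb_per_s: float       # measured bus speed, not the theoretical one
    cpu_gpu_core_ratio: float  # single-core CPU vs single-core GPU performance


# Categories are assigned by hand in advance, as the description states.
OPERATOR_CATEGORY = {
    "filter": ComputeCategory.LINEAR,
    "join": ComputeCategory.EXPONENTIAL,
}


def category_of(operator_name):
    """Look up the preset category of an operator (case-insensitive)."""
    try:
        return OPERATOR_CATEGORY[operator_name.lower()]
    except KeyError:
        raise ValueError(f"no category configured for {operator_name!r}") from None


# Values of the embodiment.
EMBODIMENT_HW = HardwareProfile(pcie_gb_per_s=12.8, cpu_gpu_core_ratio=1.0)
print(category_of("filter"), category_of("Join"), EMBODIMENT_HW)
```

Raising an error for an unconfigured operator is one possible design choice; a system could equally fall back to the CPU operator when no category is known.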
Step 2, a specific acceleration ratio evaluation model is generated according to the software and hardware context. In this embodiment, the size of the output data is 10% of the size of the input data D, so the total data IO is 1.1 × D. Substituting this and the parameters above gives the acceleration ratio $N = T / (1.1 \times D / 12.8 + T / m)$, where T is the calculation time on the CPU of an algorithm of the same computational complexity, D is the size of the input data in GB, m is the GPU parallelism (the parallelism M of the general formula), and t is the time in seconds that every 100M of data takes on the CPU.
The linear evaluation algorithm is then as follows. Because $T = t \times D / 0.1 = 10tD$, we have $N = 10tD / (1.1 \times D / 12.8 + 10tD / m) = 10t / (1.1 / 12.8 + 10t / m) = 1 / (1.1 / (128t) + 1 / m)$, where t ranges from 0.001 s to 10 s and m ranges from 64 to 1024.
The exponential evaluation algorithm is as follows. Because $T = t \times (D / 0.1)^2 = 100tD^2$, we have $N = 100tD^2 / (1.1 \times D / 12.8 + 100tD^2 / m) = 100t / (1.1 / (12.8D) + 100t / m) = 1 / (1.1 / (1280Dt) + 1 / m)$, where t ranges from 0.001 s to 10 s, m ranges from 64 to 1024, and D ranges from 0.1 GB to 16 GB.
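As a sanity check on the algebra in Step 2, the closed forms can be compared numerically with the general formula. The Python sketch below assumes the embodiment's constants (12.8 GB/s bus speed, output equal to 10% of input) and samples the stated parameter ranges; the function names are illustrative only.

```python
import math

PCIE_GB_PER_S = 12.8  # measured bus speed of the embodiment
OUT_FRACTION = 0.1    # output data is 10% of the input data


def general_ratio(t_cpu, d_gb, m):
    """N = T / (1.1 * D / 12.8 + T / m), with T the CPU time in seconds."""
    return t_cpu / ((1.0 + OUT_FRACTION) * d_gb / PCIE_GB_PER_S + t_cpu / m)


def linear_ratio(t, m):
    """Closed form for T = 10 * t * D; the data size D cancels out."""
    return 1.0 / (1.1 / (128.0 * t) + 1.0 / m)


def square_ratio(t, d_gb, m):
    """Closed form for T = 100 * t * D**2."""
    return 1.0 / (1.1 / (1280.0 * d_gb * t) + 1.0 / m)


# Sample the ranges given in the description: t in [0.001, 10], m in [64, 1024],
# D in [0.1, 16] GB.
for t in (0.001, 0.1, 10.0):
    for d in (0.1, 1.0, 16.0):
        for m in (64, 1024):
            assert math.isclose(general_ratio(10.0 * t * d, d, m), linear_ratio(t, m))
            assert math.isclose(general_ratio(100.0 * t * d * d, d, m), square_ratio(t, d, m))
print("closed forms agree with the general formula")
```

Both closed forms approach m as t grows, since the transfer term vanishes; this is the limit acceleration ratio visible in the model curves.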
Step 3, when the database system receives an SQL statement, it parses and decomposes the statement into an SQL physical execution plan, which consists of underlying SQL operators. Generating an SQL physical execution plan is a mature technology. In this embodiment, the main computing operator after SQL statement 1 is parsed is the filter operator, and the main computing operator after SQL statement 2 is parsed is the join operator.
Step 4, when the database system processes SQL statement 1, the filter operator is of the linear type, so the linear evaluation algorithm is adopted. The parallelism m is set to 64; this parameter can be adjusted according to specific conditions. As shown in FIG. 1, the limit acceleration ratio of the linear model in this embodiment is about 60; when judging whether there is acceleration, what matters is whether the acceleration ratio is greater than 1. In this embodiment, the filter operator of SQL statement 1 takes 0.005 seconds on the CPU per 100M of data (this figure is a statistical result, obtained by a general and mature statistical method). As shown in FIG. 2, the foregoing evaluation algorithm gives $N = 1 / (1.1 / (128 \times 0.005) + 1 / 64) \approx 0.576$. Since N < 1, the GPU operator has no acceleration effect in this case, and the database system therefore continues to use the filter CPU operator to complete the calculation.
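The filter example can be reproduced numerically, together with the break-even point at which the linear model reaches an acceleration ratio of 1. The sketch below is an illustration under the embodiment's constants; `linear_break_even_t` is a hypothetical helper obtained by solving N = 1 for t and is not described in the patent text.

```python
def linear_ratio(t, m, pcie_gb_per_s=12.8, io_factor=1.1):
    """Linear model: N = 1 / (io_factor / (10 * pcie * t) + 1 / m)."""
    return 1.0 / (io_factor / (10.0 * pcie_gb_per_s * t) + 1.0 / m)


def linear_break_even_t(m, pcie_gb_per_s=12.8, io_factor=1.1):
    """CPU seconds per 100M of data at which N = 1.

    Setting N = 1 gives io_factor / (10 * pcie * t) = 1 - 1 / m.
    """
    return io_factor / (10.0 * pcie_gb_per_s * (1.0 - 1.0 / m))


n_filter = linear_ratio(t=0.005, m=64)  # about 0.5766, i.e. below 1
t_star = linear_break_even_t(m=64)      # about 0.00873 s per 100M of data
print(f"N = {n_filter:.4f}, break-even t = {t_star:.5f} s")
print("use GPU operator" if n_filter > 1.0 else "keep CPU operator")
```

With these parameters, a linear operator would need to cost more than roughly 0.0087 s of CPU time per 100M of data before moving it to the GPU pays off, and the filter operator's 0.005 s falls short of that.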
Step 5, when the database system processes SQL statement 2, the join operator is of the exponential type, so the exponential evaluation algorithm is adopted. Because the video memory of the NVIDIA 1080Ti is 11 GB, the size of the data D sent to the GPU for each calculation is set to 1 GB, and the parallelism m is set to 64; these parameters can be adjusted according to specific conditions. As shown in FIG. 3, the limit acceleration ratio of the exponential model in this embodiment is between 60 and 70; when judging whether there is acceleration, what matters is whether the acceleration ratio is greater than 1. In this embodiment, the join operator of SQL statement 2 takes 0.2 seconds on the CPU per 100M of data (this figure is a statistical result, obtained by a general and mature statistical method). As shown in FIG. 4, the foregoing evaluation algorithm gives $N = 1 / (1.1 / (1280 \times 0.2) + 1 / 64) \approx 50.19$. Since N > 1, the GPU operator has a good acceleration effect, and the database system therefore uses the join GPU operator to complete the calculation.
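The join example and the resulting device choice can be checked the same way. This Python sketch uses the embodiment's numbers (D = 1 GB, m = 64, t = 0.2 s, 12.8 GB/s); the function `choose_device` and its string return values are assumptions introduced for illustration.

```python
def square_ratio(t, d_gb, m, pcie_gb_per_s=12.8, io_factor=1.1):
    """Exponential (square) model: N = 1 / (io_factor / (100 * pcie * D * t) + 1 / m)."""
    return 1.0 / (io_factor / (100.0 * pcie_gb_per_s * d_gb * t) + 1.0 / m)


def choose_device(ratio):
    """Pick the GPU operator only when the estimated ratio exceeds 1."""
    return "GPU" if ratio > 1.0 else "CPU"


n_join = square_ratio(t=0.2, d_gb=1.0, m=64)      # about 50.196
n_ceiling = square_ratio(t=10.0, d_gb=1.0, m=64)  # stays below m = 64
print(f"join: N = {n_join:.2f} -> {choose_device(n_join)}")
print(f"upper end of the t range: N = {n_ceiling:.2f}")
```

The ratio can never exceed the parallelism m, which is consistent with a limit between 60 and 70 for m = 64.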
In particular, the invention relates to a coarse-grained calculation acceleration ratio evaluation method in a heterogeneous hardware environment. Its technical form is not limited to a single algorithm; the method can also be embedded into a hardware system for implementation, and the basic principle remains the same.
Of course, the present invention may have other embodiments, such as application to big data processing systems or to AI hybrid computing execution scenarios. Those skilled in the art may make various changes and modifications according to the present invention without departing from its spirit and scope, and such changes and modifications should fall within the protection scope of the appended claims.

Claims (1)

1. A coarse-grained calculation acceleration ratio evaluation method and system in a heterogeneous hardware environment, characterized in that the method comprises the following steps:
step 1, acquiring the basic attributes of the heterogeneous hardware and the calculation type of each calculation module;
step 2, classifying the calculation module by calculation type, that is, by whether its calculated amount is in a linear relation or an exponential relation to the data amount; selecting the linear evaluation algorithm or the exponential evaluation algorithm accordingly; and estimating the acceleration ratio of the calculation module by combining the specific parameters of the calculation context;
the acceleration ratio in this step being calculated as $N = T(\mathrm{CPU}) / T(\mathrm{GPU})$, where $T(\mathrm{CPU})$ is the time the algorithm takes on the CPU and $T(\mathrm{GPU})$ is the time it takes on the GPU, with $T(\mathrm{GPU}) = S(\mathrm{InData} + \mathrm{OutData}) / V(\mathrm{PCIE}) + T(\mathrm{CPU}) / M$, in which $S(\mathrm{InData} + \mathrm{OutData})$ is the total amount of data IO, $V(\mathrm{PCIE})$ is the bus IO speed, and $M$ is the parallelism; the difference between the linear evaluation algorithm and the exponential evaluation algorithm lying mainly in the relationship between the calculated amount and the data amount; for example, if the calculation time of 100M of data on the CPU is t seconds, then in a linear relationship $T(\mathrm{CPU})$ grows in proportion to the data amount, whereas in an exponential relationship (generally a square relationship) $T(\mathrm{CPU})$ grows with the square of the data amount;
step 3, acting on the acceleration ratio result: if the acceleration ratio is greater than 1, migrating the calculation from the CPU to the GPU has an acceleration effect, and the calculation can be migrated from the CPU to the GPU; if the acceleration ratio is less than or equal to 1, migrating the calculation from the CPU to the GPU has no acceleration effect.
CN201911164810.XA 2019-11-25 2019-11-25 Coarse-grained calculation acceleration ratio evaluation method and system under heterogeneous hardware environment Pending CN112835772A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911164810.XA CN112835772A (en) 2019-11-25 2019-11-25 Coarse-grained calculation acceleration ratio evaluation method and system under heterogeneous hardware environment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911164810.XA CN112835772A (en) 2019-11-25 2019-11-25 Coarse-grained calculation acceleration ratio evaluation method and system under heterogeneous hardware environment

Publications (1)

Publication Number Publication Date
CN112835772A true CN112835772A (en) 2021-05-25

Family

ID=75922784

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911164810.XA Pending CN112835772A (en) 2019-11-25 2019-11-25 Coarse-grained calculation acceleration ratio evaluation method and system under heterogeneous hardware environment

Country Status (1)

Country Link
CN (1) CN112835772A (en)

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination