CN114490041A - Array calculation method, device, equipment, medium and computer program product - Google Patents

Array calculation method, device, equipment, medium and computer program product

Info

Publication number
CN114490041A
CN114490041A (application CN202111676480.XA)
Authority
CN
China
Prior art keywords: array, target, kernel function, calculation result, function
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111676480.XA
Other languages
Chinese (zh)
Inventor
朱鹏飞 (Zhu Pengfei)
赵源 (Zhao Yuan)
梁智 (Liang Zhi)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dawning Information Industry Beijing Co Ltd
Original Assignee
Dawning Information Industry Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dawning Information Industry Beijing Co Ltd
Priority to CN202111676480.XA
Publication of CN114490041A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5005 Allocation of resources to service a request
    • G06F 9/5027 Allocation of resources to service a request, the resource being a machine, e.g. CPUs, servers, terminals
    • G06F 9/44 Arrangements for executing specific programs
    • G06F 9/445 Program loading or initiating
    • G06F 9/44505 Configuring for program initiating, e.g. using registry, configuration files


Abstract

The present application relates to an array calculation method, apparatus, device, medium, and computer program product. The method comprises the following steps: calling a target kernel function, wherein the target kernel function defines a target local variable, the target local variable points to the calculation result of a first array, the target kernel function is used for calculating a second array, and the calculation of the second array depends on the calculation result of the first array; calling a pre-created target device function through the target kernel function, so that the target device function calculates the first array and obtains the calculation result of the first array; and, while the target kernel function is running, reading the calculation result of the first array pointed to by the target local variable, so that the target kernel function calculates the second array based on the calculation result of the first array. By adopting the method, the calculation delay can be reduced.

Description

Array computing method, device, equipment, medium and computer program product
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method, an apparatus, a device, a medium, and a computer program product for array computation.
Background
In a GPU computing platform, it often happens that the computation of one array depends on the computation result of another array. Such arrays are commonly referred to as "arrays with a dependency relationship". If two arrays with a dependency relationship are computed in parallel, a read-write conflict may occur: the depended-on array is read before its computation has finished, and the stale values are used to compute the dependent array, leading to a computation error.
In the related art, to avoid read-write conflicts when computing arrays with a dependency relationship, two kernel functions are generally created: after one kernel function finishes computing the depended-on array, the other kernel function computes the dependent array based on that result.
However, the approach provided by the related art suffers from a long computation delay.
Disclosure of Invention
In view of the foregoing, it is desirable to provide an array computing method, apparatus, device, medium, and computer program product capable of reducing computing latency.
In a first aspect, an embodiment of the present application provides an array computing method for use in a GPU computing platform, the method comprising:
calling a target kernel function, wherein the target kernel function defines a target local variable, the target local variable points to the calculation result of a first array, the target kernel function is used for calculating a second array, and the calculation of the second array depends on the calculation result of the first array; calling a pre-created target device function through the target kernel function, so that the target device function calculates the first array and obtains the calculation result of the first array; and, while the target kernel function is running, reading the calculation result of the first array pointed to by the target local variable, so that the target kernel function calculates the second array based on the calculation result of the first array.
In one embodiment, after the target kernel function calls the pre-created target device function so that the target device function calculates the first array and obtains the calculation result of the first array, the method further includes:
storing the calculation result of the first array in a thread register corresponding to the target local variable;
correspondingly, reading the calculation result of the first array pointed to by the target local variable includes:
reading the calculation result of the first array from the thread register corresponding to the target local variable.
Because registers can be read and written quickly, reading and writing data in a thread register takes less time than reading and writing data in memory as in the prior art, which helps to shorten the computation time.
In one embodiment, the method further comprises:
before calculating the second array, the target kernel function reads the second array from the memory; after computing the second array, the target kernel function stores the computed result of the second array in memory.
In one embodiment, the target kernel function defines a plurality of the target local variables, and each target local variable points to the calculation result of a different element of the first array.
Defining a plurality of target local variables in the target kernel function allows the technical solution provided by the application to adapt to different computation scenarios.
In one embodiment, the GPU computing platform includes a host side and a device side, and calls a target kernel function, including:
the host side sends a call instruction to the device side to call the target kernel function.
In one embodiment, the GPU computing platform is a ROCm computing platform.
In a second aspect, an embodiment of the present application provides an array computing apparatus for use in a GPU computing platform, the apparatus including:
the first calling module is used for calling a target kernel function, the target kernel function defines a target local variable, the target local variable points to the calculation result of the first array, the target kernel function is used for calculating the second array, and the calculation of the second array depends on the calculation result of the first array;
the second calling module is used for calling a pre-created target device function through the target kernel function, so that the target device function calculates the first array and obtains the calculation result of the first array;
and the calculation module is used for reading the calculation result of the first array pointed by the target local variable in the process of running the target kernel function so as to calculate the second array by the target kernel function based on the calculation result of the first array.
In one embodiment, the apparatus further comprises a first storage module;
the first storage module is used for storing the calculation result of the first array in a thread register corresponding to the target local variable;
correspondingly, the calculation module is specifically configured to: and reading the calculation result of the first array from the thread register corresponding to the target local variable.
In one embodiment, the device further comprises a reading module and a second storage module;
the reading module is configured to read the second array from a memory by the target kernel function before the second array is calculated;
the second storage module is configured to, after the calculation of the second array, store the calculation result of the second array in the memory by the target kernel function.
In one embodiment, the target kernel function defines a plurality of the target local variables, and each target local variable points to the calculation result of a different element of the first array.
In one embodiment, the GPU computing platform includes a host side and a device side, and the first calling module is specifically configured to: send, by the host side, a call instruction to the device side to call the target kernel function.
In one embodiment, the GPU computing platform is a ROCm computing platform.
In a third aspect, the present application also provides a computer device. The computer device comprises a memory storing a computer program and a processor that, when executing the computer program, implements the method of any embodiment of the first aspect.
In a fourth aspect, the present application further provides a computer-readable storage medium. The computer-readable storage medium stores a computer program which, when executed by a processor, implements the method of any embodiment of the first aspect.
In a fifth aspect, the present application further provides a computer program product. The computer program product comprises a computer program that, when executed by a processor, implements the method of any embodiment of the first aspect.
In the array computing method, apparatus, device, medium, and computer program product above, the GPU computing platform may call a target kernel function, wherein the target kernel function defines a target local variable, the target local variable points to the calculation result of a first array, the target kernel function is used for calculating a second array, and the calculation of the second array depends on the calculation result of the first array. The target kernel function then calls a pre-created target device function, and the target device function calculates the first array to obtain its calculation result. While the target kernel function is running, it reads the calculation result of the first array pointed to by the target local variable and calculates the second array based on that result. In this way, only one kernel function and one device function need to be created to calculate the first and second arrays, which have a dependency relationship. Unlike a kernel function, a device function starts directly, without being called by the host side of the GPU computing platform, so its start-up time is short, and it does not need to read memory during execution. Compared with the approach that requires two kernel functions to calculate arrays with a dependency relationship, replacing one kernel function with a device function reduces the execution delay of the function and thereby reduces the calculation delay of the arrays.
Drawings
FIG. 1 is a flow diagram of a method for array computation in one embodiment;
FIG. 2 is a flow diagram of a method for array computation in one embodiment;
FIG. 3 is a flow diagram of a method for array computation in one embodiment;
FIG. 4 is a flow diagram of a method for array computation in one embodiment;
FIG. 5 is a flow diagram of a method for array computation in one embodiment;
FIG. 6 is a block diagram of an array computing device in one embodiment;
FIG. 7 is a block diagram of an array computing device in one embodiment;
FIG. 8 is a diagram of an internal structure of a computer device in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
In order to facilitate understanding of the technical solutions provided in the embodiments of the present application, the background knowledge related to the technical solutions will be briefly described below.
1. Kernel function and device function
A GPU computing platform comprises a Host side (typically the CPU) and a Device side (typically the GPU).
Kernel functions and device functions are both functions used to perform computation in the GPU computing platform. A kernel function is called by the host side and executed by the device side, whereas a device function is both called and executed by the device side.
In other words, the execution process of a kernel function is: the host side sends a call instruction to the device side, and after receiving the call instruction, the device side executes the kernel function as instructed.
The execution process of a device function is: when some function already executing on the device side calls the device function, the device function is executed.
2. Read-write conflicts in arrays with dependencies
"Arrays with a dependency relationship" means that the computation of one array depends on the computation result of another array. For example, if computing the A array requires the computed values of the B array (say, A(i, j, k) = B(i, j, k) + B(i-1, j, k)), then the computation of the A array depends on the computation result of the B array; that is, the A array and the B array are arrays with a dependency relationship.
In general, two arrays having no dependency relationship may be calculated separately in parallel, for example, if the C array and the D array have no dependency relationship, the C array and the D array may be calculated separately in parallel.
However, for two arrays with a dependency relationship, parallel computation may lead to the following situation: the depended-on array is read before its computation has finished and is used to compute the dependent array. This situation is called a read-write conflict, and a read-write conflict causes a computation error.
For example, if the A array and the B array in the above example are computed in parallel, the B array may be read to compute the A array before the B array has finished being computed; this is a read-write conflict.
In practical applications, to avoid read-write conflicts when computing arrays with a dependency relationship (hereinafter the A array and the B array are used as examples), two kernel functions may be created, kernel1 and kernel2. Referring to fig. 1, kernel1 reads the B array from memory, computes it, and writes the computed B array back to memory; kernel2 reads the A array and the computed B array from memory, computes the A array using the computed B array, and writes the computed A array back to memory. That is, kernel1 and kernel2 compute the B array and the A array serially, which avoids the read-write conflict.
However, this prior art has two drawbacks. On one hand, memory must be read in two passes (kernel1 reads the B array; kernel2 reads the A array and the computed B array). On the other hand, starting a kernel function requires a call instruction from the host side, which takes a certain start-up time. Under the combined effect of these two factors, the delay of the array computation is long.
In view of the above, in order to reduce the delay of array computation, embodiments of the present application provide an array computing method, apparatus, device, medium, and computer program product. The GPU computing platform may call a target kernel function, where the target kernel function defines a target local variable, the target local variable points to the computation result of a first array, the target kernel function is used to compute a second array, and the computation of the second array depends on the computation result of the first array. The target kernel function then calls a pre-created target device function, and the target device function computes the first array to obtain its computation result. While the target kernel function is running, it reads the computation result of the first array pointed to by the target local variable and computes the second array based on that result. Therefore, only one kernel function and one device function need to be created to compute the first and second arrays, which have a dependency relationship. Unlike a kernel function, a device function starts directly, without being called by the host side of the GPU computing platform, so its start-up time is short, and it does not need to read memory during execution. Compared with the approach that requires two kernel functions, replacing one kernel function with a device function reduces the execution delay of the function and thereby reduces the computation delay of the arrays.
As described above, the array computing method provided in the embodiments of the present application may be applied to a GPU computing platform, for example a ROCm computing platform. ROCm is an open-source, high-performance general-purpose GPU computing platform designed for very-large-scale clusters.
The GPU computing platform comprises a host side and a device side, wherein the host side may be a CPU, for example, and the device side may be a GPU, for example.
Referring to fig. 2, a flowchart of an array computing method provided in an embodiment of the present application is shown, where the array computing method may be applied to a GPU computing platform, and as shown in fig. 2, the array computing method includes the following steps:
step 201, calling a target kernel function.
The target kernel function is used for calculating the second array, and the calculation of the second array depends on the calculation result of the first array.
For example, if the calculation of the A array (i.e., the second array) depends on the calculation result of the B array (i.e., the first array), a target kernel function for calculating the A array may be created, and a target local variable B_local may be defined in the target kernel function, where B_local points to the calculation result of the B array.
In an alternative embodiment of the present application, a plurality of target local variables may be defined in the target kernel function, where each target local variable points to a calculation result of a different element of the first array.
For example, assuming that the A array is calculated as A(i, j, k) = B(i, j, k) + B(i-1, j, k), the calculation of one element of the A array depends on the calculation results of two different elements of the B array. In this case, two target local variables may be defined in the target kernel function, pointing to the calculation results of the two elements of the B array.
As described above, the GPU computing platform may include a host side and a device side, and the GPU computing platform may call the target kernel function as follows: the host side sends a call instruction to the device side to call the target kernel function.
Step 202, the target kernel function calls a pre-created target device function, so that the target device function calculates the first array, and a calculation result of the first array is obtained.
In this embodiment of the present application, a target device function may be created in advance on the device side of the GPU computing platform, where the target device function is used to compute the first array. While the target kernel function is running, it may call the target device function, so that the target device function computes the first array and obtains the computation result of the first array.
Step 203, in the process of running the target kernel function, the target kernel function reads the calculation result of the first array pointed by the target local variable, so that the target kernel function calculates the second array based on the calculation result of the first array.
In the array computing method provided by this embodiment, the GPU computing platform may call a target kernel function, where the target kernel function defines a target local variable, the target local variable points to the calculation result of the first array, the target kernel function is used to calculate the second array, and the calculation of the second array depends on the calculation result of the first array. The target kernel function then calls a pre-created target device function, and the target device function calculates the first array to obtain its calculation result. While the target kernel function is running, it reads the calculation result of the first array pointed to by the target local variable and calculates the second array based on that result. Therefore, only one kernel function and one device function need to be created to calculate the first and second arrays, which have a dependency relationship. Unlike a kernel function, a device function starts directly, without being called by the host side of the GPU computing platform, so its start-up time is short, and it does not need to read memory during execution. Compared with the approach that requires two kernel functions to calculate arrays with a dependency relationship, replacing one kernel function with a device function reduces the execution delay of the function and thereby reduces the calculation delay of the arrays.
In one embodiment, after step 202, the GPU computing platform may also store the calculation result of the first array in the thread register corresponding to the target local variable. Correspondingly, referring to fig. 3, step 203 includes:
step 301, in the process of running the target kernel function, reading the calculation result of the first array from the thread register corresponding to the target local variable.
Registers are small storage areas inside the GPU used to hold data, temporarily storing the operands and results of operations. A register is essentially a sequential logic circuit containing only a storage circuit; although its capacity is limited, its read/write speed is very fast, and it can temporarily store instructions, data, and addresses. A thread register is a register dedicated to storing the data and operation results of a particular thread.
Step 302, the target kernel function calculates the second array based on the calculation result of the first array.
Because registers can be read and written quickly, reading and writing data in a thread register takes less time than reading and writing data in memory as in the prior art, which helps to shorten the computation time.
In addition, in an optional embodiment of the present application, before the second array is calculated, the target kernel function reads the second array from memory; after the second array is calculated, the target kernel function may store the calculation result of the second array in memory.
Referring to fig. 4, a flowchart of an array computing method provided in an embodiment of the present application is shown, where the array computing method is applied in a GPU computing platform, and as shown in fig. 4, the array computing method includes the following steps:
step 401, calling a target kernel function.
The target kernel function defines a target local variable pointing to the calculation result of the first array; the target kernel function is used for calculating the second array, and the calculation of the second array depends on the calculation result of the first array. The target kernel function may define a plurality of target local variables, each pointing to the calculation result of a different element of the first array. The target kernel function is called as follows: the host side sends a call instruction to the device side to call the target kernel function.
Step 402, the target kernel function calls a pre-created target device function, so that the target device function calculates the first array, and a calculation result of the first array is obtained.
And step 403, storing the calculation result of the first array in a thread register corresponding to the target local variable.
Step 404, the target kernel function reads the second array from the memory.
Step 405, in the process of running the target kernel function, reading the calculation result of the first array from the thread register corresponding to the target local variable.
Step 406, the target kernel function calculates the second array based on the calculation result of the first array.
Step 407, the target kernel function stores the calculation result of the second array in the memory.
For the convenience of the reader to understand the technical solution provided in the embodiment of the present application, please refer to a schematic diagram corresponding to the data reading and writing and the calculation process shown in fig. 5.
It should be understood that, although the steps in the flowcharts of the above embodiments are displayed in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated otherwise, there is no strict ordering restriction, and the steps may be performed in other orders. Moreover, at least some of the steps in these flowcharts may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times; their execution order is not necessarily sequential, and they may be performed in turn or alternately with other steps or with sub-steps or stages of other steps.
Based on the same inventive concept, the embodiment of the present application further provides an array calculating apparatus for implementing the array calculating method mentioned above. The implementation scheme for solving the problem provided by the apparatus is similar to the implementation scheme described in the above method, so specific limitations in one or more embodiments of the array calculation apparatus provided below can be referred to the limitations of the array calculation method in the foregoing, and details are not described here.
In one embodiment, as shown in FIG. 6, there is provided an array computing apparatus 600 comprising: a first calling module 601, a second calling module 602, and a calculating module 603, wherein:
the first calling module 601 is configured to call a target kernel function, where the target kernel function defines a target local variable, the target local variable points to a calculation result of a first array, and the target kernel function is configured to perform calculation on a second array, where the calculation of the second array depends on the calculation result of the first array.
A second calling module 602, configured to call a pre-created target device function through the target kernel function, so that the target device function performs calculation on the first array, and obtains a calculation result of the first array.
The calculating module 603 is configured to, during the process of running the target kernel function, read a calculation result of the first array pointed to by the target local variable, so that the target kernel function calculates the second array based on the calculation result of the first array.
In an optional embodiment of the application, the target kernel function defines a plurality of the target local variables, each of the target local variables pointing to a result of a computation of a different element of the first array.
In an optional embodiment of the present application, the GPU computing platform includes a host side and a device side, and the first calling module 601 is specifically configured to: send, by the host side, a call instruction to the device side to call the target kernel function.
In an optional embodiment of the present application, the GPU computing platform is a ROCm computing platform.
Referring to fig. 7, another array computing apparatus 700 provided in the embodiment of the present application is shown, where the array computing apparatus 700 includes, in addition to the modules included in the array computing apparatus 600, optionally, the following modules: a first storage module 604, a read module 605, and a second storage module 606.
The first storing module 604 is configured to store the calculation result of the first array in the thread register corresponding to the target local variable.
Correspondingly, the calculating module 603 is specifically configured to: read the calculation result of the first array from the thread register corresponding to the target local variable.
The read module 605 is configured to read the second array from the memory by the target kernel function before the second array is calculated.
The second storing module 606 is configured to, after the calculation of the second array, store the calculation result of the second array in the memory by the target kernel function.
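The flow handled by the read module 605 and the second storing module 606 — read the second array from memory, compute it using the register-resident result of the first array, and store the result back to memory — can be sketched in plain C++ as follows. This is a CPU simulation only: `scale3` and the addition are hypothetical computations, and the loop index stands in for the GPU thread index.

```cpp
#include <vector>
#include <cstddef>

// Hypothetical element-wise computation for the first array.
int scale3(int x) { return 3 * x; }

// Sketch of the fused flow: (1) read the second array from memory,
// (2) obtain the first array's result via a device-function call into a
// local variable, and (3) store the updated second array back to memory.
// The first array's results never touch global memory in this pattern.
void fused_kernel(const std::vector<int>& first_in,
                  std::vector<int>& second_mem) {
    for (std::size_t tid = 0; tid < second_mem.size(); ++tid) {
        int second_elem = second_mem[tid];            // read from "memory"
        int first_result = scale3(first_in[tid]);     // register-resident intermediate
        second_mem[tid] = second_elem + first_result; // store result back
    }
}
```

The design point is memory traffic: only the second array crosses the memory boundary, once on read and once on write, while the first array's intermediate results live entirely in thread-local storage.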
The modules in the above array computing apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor in the computer device in hardware form, or may be stored in a memory in the computer device in software form, so that the processor can call them and execute the operations corresponding to the modules.
In one embodiment, a computer device is provided, which may be a server or a terminal device, and its internal structure diagram may be as shown in fig. 8. The computer device includes a processor, a memory, and a network interface connected by a system bus. The processor of the computer device may include a CPU and a GPU, where the CPU may serve as the host side of a GPU computing platform and the GPU may serve as the device side of the GPU computing platform. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of the operating system and the computer program in the non-volatile storage medium. The database of the computer device is used for storing data. The network interface of the computer device is used for communicating with an external computer device through a network connection. The computer program is executed by the processor to implement an array calculation method.
Those skilled in the art will appreciate that the architecture shown in fig. 8 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computing devices to which the disclosed aspects apply; a particular computing device may include more or fewer components than those shown, combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided, comprising a memory and a processor, the memory having a computer program stored therein, the processor implementing the following steps when executing the computer program:
calling a target kernel function, wherein a target local variable is defined in the target kernel function, the target local variable points to a calculation result of a first array, the target kernel function is used for calculating a second array, and the calculation of the second array depends on the calculation result of the first array; calling a pre-created target device function through the target kernel function, so that the target device function calculates the first array to obtain the calculation result of the first array; and in the process of running the target kernel function, reading the calculation result of the first array pointed to by the target local variable, so that the target kernel function calculates the second array based on the calculation result of the first array.
In one embodiment, the processor, when executing the computer program, further performs the steps of: storing the calculation result of the first array in a thread register corresponding to the target local variable; and reading the calculation result of the first array from the thread register corresponding to the target local variable.
In one embodiment, the processor, when executing the computer program, further performs the steps of: before calculating the second array, the target kernel function reads the second array from the memory; after computing the second array, the target kernel function stores the computed result of the second array in memory.
In one embodiment, the target kernel function defines a plurality of the target local variables, each of which points to a computation result of a different element of the first array.
In one embodiment, the GPU computing platform includes a host side and a device side, and the processor, when executing the computer program, further implements the following step: the host side sends a call instruction to the device side to call the target kernel function.
In one embodiment, the GPU computing platform is a ROCm computing platform.
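A minimal CPU stand-in for the host-to-device call described above — the host side issues a call instruction, and the device side executes one kernel instance per thread — might look as follows. On an actual ROCm platform the launch would go through HIP's kernel-launch machinery rather than a plain function call; everything here (`launch_on_device`, the `Kernel` type) is a hypothetical sketch of the control flow only.

```cpp
#include <functional>
#include <vector>

// Hypothetical kernel signature: one invocation per simulated thread.
using Kernel = std::function<void(int /*thread_id*/, std::vector<int>&)>;

// CPU stand-in for a host-to-device kernel launch. The "host" issues the
// call instruction; the "device" runs one kernel instance per thread.
void launch_on_device(const Kernel& kernel, int num_threads,
                      std::vector<int>& device_data) {
    for (int tid = 0; tid < num_threads; ++tid) {
        kernel(tid, device_data);  // device side executes the target kernel
    }
}
```

For example, launching a kernel that writes `tid * tid` into `device_data[tid]` across four simulated threads fills the buffer with 0, 1, 4, 9.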
In one embodiment, a computer-readable storage medium is provided, having a computer program stored thereon, which when executed by a processor, performs the steps of:
calling a target kernel function, wherein a target local variable is defined in the target kernel function, the target local variable points to a calculation result of a first array, the target kernel function is used for calculating a second array, and the calculation of the second array depends on the calculation result of the first array; calling a pre-created target device function through the target kernel function, so that the target device function calculates the first array to obtain the calculation result of the first array; and in the process of running the target kernel function, reading the calculation result of the first array pointed to by the target local variable, so that the target kernel function calculates the second array based on the calculation result of the first array.
In one embodiment, the computer program when executed by the processor further performs the steps of: storing the calculation result of the first array in a thread register corresponding to the target local variable; and reading the calculation result of the first array from the thread register corresponding to the target local variable.
In one embodiment, the computer program when executed by the processor further performs the steps of: before calculating the second array, the target kernel function reads the second array from the memory; after computing the second array, the target kernel function stores the computed result of the second array in memory.
In one embodiment, the target kernel function defines a plurality of the target local variables, each of which points to a computation result of a different element of the first array.
In one embodiment, the GPU computing platform includes a host side and a device side, and the computer program, when executed by the processor, further implements the following step: the host side sends a call instruction to the device side to call the target kernel function.
In one embodiment, the GPU computing platform is a ROCm computing platform.
In one embodiment, a computer program product is provided, comprising a computer program which, when executed by a processor, performs the steps of:
calling a target kernel function, wherein a target local variable is defined in the target kernel function, the target local variable points to a calculation result of a first array, the target kernel function is used for calculating a second array, and the calculation of the second array depends on the calculation result of the first array; calling a pre-created target device function through the target kernel function, so that the target device function calculates the first array to obtain the calculation result of the first array; and in the process of running the target kernel function, reading the calculation result of the first array pointed to by the target local variable, so that the target kernel function calculates the second array based on the calculation result of the first array.
In one embodiment, the computer program when executed by the processor further performs the steps of: storing the calculation result of the first array in a thread register corresponding to the target local variable; and reading the calculation result of the first array from the thread register corresponding to the target local variable.
In one embodiment, the computer program when executed by the processor further performs the steps of: before calculating the second array, the target kernel function reads the second array from the memory; after computing the second array, the target kernel function stores the computed result of the second array in memory.
In one embodiment, the target kernel function defines a plurality of the target local variables, each of which points to a computation result of a different element of the first array.
In one embodiment, the GPU computing platform includes a host side and a device side, and the computer program, when executed by the processor, further implements the following step: the host side sends a call instruction to the device side to call the target kernel function.
In one embodiment, the GPU computing platform is a ROCm computing platform.
It should be noted that, the user information (including but not limited to user device information, user personal information, etc.) and data (including but not limited to data for analysis, stored data, presented data, etc.) referred to in the present application are information and data authorized by the user or sufficiently authorized by each party.
It will be understood by those skilled in the art that all or part of the processes of the methods of the above embodiments can be implemented by a computer program instructing the relevant hardware; the computer program can be stored in a non-volatile computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, database, or other medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. Non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetoresistive random access memory (MRAM), ferroelectric random access memory (FRAM), phase change memory (PCM), graphene memory, and the like. Volatile memory can include random access memory (RAM), external cache memory, and the like. By way of illustration and not limitation, RAM can take many forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM). The databases referred to in the various embodiments provided herein may include at least one of relational and non-relational databases. Non-relational databases may include, but are not limited to, blockchain-based distributed databases and the like. The processors referred to in the embodiments provided herein may be general-purpose processors, central processing units, graphics processors, digital signal processors, programmable logic devices, data processing logic devices based on quantum computing, and the like, without limitation.
The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as there is no contradiction in a combination of these technical features, the combination should be considered to fall within the scope of this specification.
The above embodiments express only several implementations of the present application, and their description is relatively specific and detailed, but they should not be construed as limiting the scope of the present application. It should be noted that several variations and modifications can be made by a person skilled in the art without departing from the concept of the present application, all of which fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the appended claims.

Claims (10)

1. An array computing method, for use in a GPU computing platform, the method comprising:
calling a target kernel function, wherein a target local variable is defined in the target kernel function, the target local variable points to a calculation result of a first array, the target kernel function is used for calculating a second array, and the calculation of the second array depends on the calculation result of the first array;
calling a pre-created target device function through the target kernel function, so that the target device function calculates the first array to obtain a calculation result of the first array;
in the process of running the target kernel function, reading the calculation result of the first array pointed by the target local variable, and calculating the second array by the target kernel function based on the calculation result of the first array.
2. The method according to claim 1, wherein after calling the pre-created target device function through the target kernel function so that the target device function calculates the first array to obtain the calculation result of the first array, the method further comprises:
storing the calculation result of the first array in a thread register corresponding to the target local variable;
correspondingly, the reading the calculation result of the first array pointed to by the target local variable comprises:
reading the calculation result of the first array from the thread register corresponding to the target local variable.
3. The method of claim 1, further comprising:
before calculating the second array, the target kernel function reads the second array from the memory;
after computing the second array, the target kernel function stores the computed result of the second array in memory.
4. The method of claim 1, wherein the target kernel function defines a plurality of the target local variables, each of the target local variables pointing to a computation of a different element of the first array.
5. The method of claim 1, wherein the GPU computing platform comprises a host side and a device side, and wherein the calling the target kernel function comprises:
the host side sending a call instruction to the device side to call the target kernel function.
6. The method according to any one of claims 1 to 5, wherein the GPU computing platform is a ROCm computing platform.
7. An array computing apparatus, for use in a GPU computing platform, the apparatus comprising:
a first calling module, configured to call a target kernel function, wherein a target local variable is defined in the target kernel function, the target local variable points to a calculation result of a first array, the target kernel function is used for calculating a second array, and the calculation of the second array depends on the calculation result of the first array;
a second calling module, configured to call a pre-created target device function through the target kernel function, so that the target device function calculates the first array to obtain a calculation result of the first array; and
a calculation module, configured to read, in the process of running the target kernel function, the calculation result of the first array pointed to by the target local variable, so that the target kernel function calculates the second array based on the calculation result of the first array.
8. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method of any of claims 1 to 6.
9. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 6.
10. A computer program product comprising a computer program, characterized in that the computer program realizes the steps of the method of any one of claims 1 to 6 when executed by a processor.
CN202111676480.XA 2021-12-31 2021-12-31 Array calculation method, device, equipment, medium and computer program product Pending CN114490041A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111676480.XA CN114490041A (en) 2021-12-31 2021-12-31 Array calculation method, device, equipment, medium and computer program product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111676480.XA CN114490041A (en) 2021-12-31 2021-12-31 Array calculation method, device, equipment, medium and computer program product

Publications (1)

Publication Number Publication Date
CN114490041A true CN114490041A (en) 2022-05-13

Family

ID=81510596

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111676480.XA Pending CN114490041A (en) 2021-12-31 2021-12-31 Array calculation method, device, equipment, medium and computer program product

Country Status (1)

Country Link
CN (1) CN114490041A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination