CN110109861A - A kind of task executing method and device - Google Patents
- Authority: CN (China)
- Legal status: Pending (an assumption, not a legal conclusion)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/76—Architectures of general purpose stored program computers
- G06F15/80—Architectures of general purpose stored program computers comprising an array of processing units with common control, e.g. single instruction multiple data processors
- G06F15/8007—Architectures of general purpose stored program computers comprising an array of processing units with common control, e.g. single instruction multiple data processors single instruction multiple data [SIMD] multiprocessors
Abstract
The invention discloses a task execution method and a related apparatus. A computer device obtains a parallel argument table and thereby implements the function of a software scheduler, so that software programs running on intelligent hardware with different hardware architectures can be compatible.
Description
Technical field
The present invention relates to the field of computers, and in particular to a task execution method and device.
Background technique
With the development of technology, intelligent hardware has brought great convenience to people's work and life. But as technology continues to evolve, the hardware architecture of intelligent hardware keeps changing; for example, a new generation of intelligent hardware may add at least one functional module on top of the previous generation. Because the hardware architecture of the new generation differs from that of the old generation, software programs cannot run compatibly across the two generations. How to enable old and new generations of intelligent hardware to run the same software program has therefore become a research hotspot.
Summary of the invention
Embodiments of the present invention provide a task execution method and device, so that software programs running on intelligent hardware with different hardware architectures can be compatible.

The function of a hardware scheduler is implemented in software: a block of private memory space is allocated for each kernel in the artificial intelligence processing device to store that kernel's parallel argument table, so that multiple kernels can execute the same instruction simultaneously. This realizes SIMT (single instruction, multiple threads) task processing, i.e. SIMT programming and compilation.
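The SIMT idea above can be illustrated with a small sketch: every kernel runs the same "instruction", but reads its own private parallel argument table to decide which slice of the data is its own. All names here (`ParallelArgTable`, `run_kernel`) are illustrative, not APIs from the patent.

```python
from dataclasses import dataclass

@dataclass
class ParallelArgTable:
    taskid: int       # which task this kernel belongs to
    coreid: int       # this kernel's index within its cluster
    clusterid: int    # which arithmetic core cluster it sits in
    coreDim: int      # kernels per cluster
    clusterDim: int   # number of clusters occupied by the task

def run_kernel(table: ParallelArgTable, data: list) -> list:
    """The same 'instruction' runs on every kernel; the private table
    tells each kernel which elements it should process."""
    stride = table.clusterDim * table.coreDim
    start = table.clusterid * table.coreDim + table.coreid
    return [x * 2 for x in data[start::stride]]  # toy per-element operation

# Simulate 2 clusters x 2 kernels all executing the same instruction:
data = list(range(8))
results = []
for cid in range(2):
    for k in range(2):
        t = ParallelArgTable(taskid=0, coreid=k, clusterid=cid,
                             coreDim=2, clusterDim=2)
        results.append(run_kernel(t, data))
```

Because each table differs only in `clusterid`/`coreid`, the four kernels together cover the whole input without overlap, which is the point of giving each kernel a private table.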
In a first aspect, an embodiment of the present invention provides a task execution method applied to a task execution system that includes a general-purpose computer device and an artificial intelligence processing device. The method comprises:

determining the task type, task scale and processing instruction of a target task according to a processing request of a user, the processing request including data to be processed;

determining parallel variable information of the target task according to the task scale and the task type, the parallel variable information of the target task being used to instruct the kernels in M arithmetic core clusters to execute the processing instruction;

if kernels in M arithmetic core clusters in the artificial intelligence processing device are in an idle state, transmitting the parallel variable information, the processing instruction and the data to be processed into the memory of the artificial intelligence processing device.
In a feasible embodiment, if the kernels in M arithmetic core clusters in the artificial intelligence processing device are all in an idle state, then before the parallel variable information is transmitted into the private memory space of each kernel in the M idle arithmetic core clusters, the method further includes:

querying whether there are kernels in M arithmetic core clusters in the artificial intelligence processing device that are in an idle state.
In a feasible embodiment, the task scale includes the scale of sub-tasks in at least one dimension, the parallel variable information includes k parallel argument tables, and k is determined according to the scale of the sub-tasks in the at least one dimension.
In a feasible embodiment, transmitting the parallel variable information into the private memory space of each kernel in the M idle arithmetic core clusters comprises:

transmitting the parallel variable information, over a PCIE bus, into the private memory space of each kernel in the M idle arithmetic core clusters;

and transmitting the processing instruction and the data to be processed into the memory of the artificial intelligence processing device comprises:

transmitting the processing instruction and the data to be processed, over the PCIE bus, into the memory of the artificial intelligence processing device.
In a feasible embodiment, the operation domain of the processing instruction includes a data write address, and the method further includes:

receiving the processing result of the artificial intelligence processing device, and writing the processing result into the memory space corresponding to the data write address.
In a feasible embodiment, the parallel variable information includes a parallel argument table, which includes at least one of: a task identifier, the task scale, an identifier of an arithmetic core cluster, the number of arithmetic core clusters, the number of kernels in an arithmetic core cluster, and an identifier of a kernel within an arithmetic core cluster.
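The fields just listed can be gathered into one illustrative table; the field names are transliterations of the variables used later in this description, not a real API.

```python
def make_parallel_arg_table(taskid, taskDim, clusterid, clusterDim, coreDim, coreid):
    """Assemble one parallel argument table as a plain dict (illustrative layout)."""
    return {
        "taskid": taskid,          # task identifier
        "taskDim": taskDim,        # task scale, e.g. (X, Y, Z)
        "clusterid": clusterid,    # id of this arithmetic core cluster
        "clusterDim": clusterDim,  # number of arithmetic core clusters occupied
        "coreDim": coreDim,        # number of kernels per cluster
        "coreid": coreid,          # id of the kernel within its cluster
    }

table = make_parallel_arg_table(taskid=7, taskDim=(4, 1, 1), clusterid=0,
                                clusterDim=2, coreDim=4, coreid=3)
```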
In a second aspect, an embodiment of the present invention provides a task execution method applied to a task execution system that includes a general-purpose computer device and an artificial intelligence processing device, the artificial intelligence processing device including at least one arithmetic core cluster, each arithmetic core cluster including at least one kernel. The method comprises:

receiving task information sent by the general-purpose computer device, the task information including parallel variable information, data to be processed and a processing instruction; the parallel variable information being used to instruct the kernels in M arithmetic core clusters to execute the processing instruction, the M arithmetic core clusters being the arithmetic core clusters that need to be occupied when the processing instruction is executed on the data to be processed;

saving the data to be processed and the parallel argument tables into the private memory spaces corresponding to the kernels in the M arithmetic core clusters;

loading the parallel variable information in the private memory space corresponding to each kernel in the M arithmetic core clusters into the on-chip memory corresponding to that kernel, and processing, according to the processing instruction, the data determined based on the parallel variable information and the data write address, to obtain a processing result.
In a feasible embodiment, the task information further includes a data length, and processing the data determined based on the parallel variable information and the data write address according to the processing instruction to obtain a processing result comprises:

processing, according to the processing instruction, the data determined based on the parallel variable information, the data write address and the data length, to obtain the processing result.
In a feasible embodiment, reading the parallel variable information in the private memory space corresponding to each kernel into the on-chip memory of the artificial intelligence processing device comprises:

reading, by executing a preload instruction, the parallel argument table in the private memory space corresponding to each kernel into the on-chip memory of the artificial intelligence processing device.
In a feasible embodiment, the method further includes:

during power-on or reset of the artificial intelligence processing device, allocating, from a reserved space of the memory of the artificial intelligence processing device, a private memory space for each kernel in the M arithmetic core clusters, wherein the kernels and the private memory spaces are arranged in one-to-one correspondence.
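A minimal sketch of that allocation step: at power-on or reset, carve one private slot per kernel out of a reserved region of device memory, one-to-one. The flat address layout and slot size are assumptions for illustration.

```python
def allocate_private_spaces(reserved_base, slot_size, clusters, cores_per_cluster):
    """Return {(clusterid, coreid): base_address} so that each kernel gets
    exactly one private space inside the reserved region."""
    table = {}
    addr = reserved_base
    for c in range(clusters):
        for k in range(cores_per_cluster):
            table[(c, k)] = addr   # one slot per kernel, in address order
            addr += slot_size
    return table

# 2 clusters x 4 kernels, 256-byte slots starting at a reserved base address:
spaces = allocate_private_spaces(0x8000, 0x100, clusters=2, cores_per_cluster=4)
```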
In a feasible embodiment, the task information further includes a data output address, and the method further includes:

after each kernel in the M arithmetic core clusters has finished executing the processing instruction, transmitting the obtained processing result into the memory space corresponding to the data output address.
In a third aspect, an embodiment of the present invention provides a general-purpose computer device applied to a task execution system that further includes an artificial intelligence processing device. The general-purpose computer device includes:

a determination unit, configured to determine the task type, task scale and processing instruction of a target task according to a processing request of a user, the processing request including data to be processed; and to determine, according to the task scale and the task type, parallel variable information of the target task, the parallel variable information being used to instruct the kernels in M arithmetic core clusters to execute the processing instruction;

a transmission unit, configured to, if kernels in M arithmetic core clusters in the artificial intelligence processing device are in an idle state, transmit the parallel variable information, the processing instruction and the data to be processed into the memory of the artificial intelligence processing device.
In a feasible embodiment, the general-purpose computer device further includes:

a query unit, configured to, if kernels in M arithmetic core clusters in the artificial intelligence processing device are in an idle state, query whether there are kernels in M arithmetic core clusters in the artificial intelligence processing device that are in an idle state before the parallel variable information, the data to be processed and the processing instruction are transmitted into the memory of the artificial intelligence processing device.
In a feasible embodiment, the task scale includes the scale of sub-tasks in at least one dimension, the parallel variable information includes k parallel argument tables, and k is determined according to the scale of the sub-tasks in the at least one dimension.
In a feasible embodiment, in respect of transmitting the parallel variable information into the private memory space of each kernel in the M idle arithmetic core clusters, the determination unit is specifically configured to:

transmit the parallel variable information, over a PCIE bus, into the private memory space of each kernel in the M idle arithmetic core clusters;

and transmitting the processing instruction and the data to be processed into the memory of the artificial intelligence processing device comprises:

transmitting the processing instruction and the data to be processed, over the PCIE bus, into the memory of the artificial intelligence processing device.
In a feasible embodiment, the operation domain of the processing instruction includes a data write address, and the general-purpose computer device further includes:

a receiving unit, configured to receive the processing result of the artificial intelligence processing device, and write the processing result into the memory space corresponding to the data write address.
In a feasible embodiment, the parallel variable information includes a parallel argument table, which includes at least one of: a task identifier, the task scale, an identifier of an arithmetic core cluster, the number of arithmetic core clusters, the number of kernels in an arithmetic core cluster, and an identifier of a kernel within an arithmetic core cluster.
In a fourth aspect, an embodiment of the present invention provides an artificial intelligence processing device applied to a task execution system that further includes a general-purpose computer device, the artificial intelligence processing device including at least one arithmetic core cluster, each arithmetic core cluster including at least one kernel. The artificial intelligence processing device includes:

a receiving unit, configured to receive task information sent by the general-purpose computer device, the task information including a parallel argument table, data to be processed, a data input address and a processing instruction; the parallel argument table being used to instruct the kernels in M arithmetic core clusters to execute the processing instruction, the M arithmetic core clusters being the arithmetic core clusters that need to be occupied when the processing instruction is executed on the data to be processed;

a storage unit, configured to save the parallel argument table and the data to be processed into the private memory spaces corresponding to the kernels in the M arithmetic core clusters;

a processing unit, configured to read the parallel argument table in the private memory space corresponding to each kernel in the M arithmetic core clusters into the on-chip memory corresponding to that kernel, and to process, according to the processing instruction, the data determined based on the parallel argument table and the data write address, to obtain a processing result.
In a feasible embodiment, the task information further includes a data length, and in respect of processing the data determined based on the parallel argument table and the data write address according to the processing instruction to obtain a processing result, the processing unit is specifically configured to:

process, according to the processing instruction, the data determined based on the parallel argument table, the data write address and the data length, to obtain the processing result.
In a feasible embodiment, in respect of reading the parallel argument table in the private memory space corresponding to each kernel into the on-chip memory of the artificial intelligence processing device, the processing unit is specifically configured to:

read, by executing a preload instruction, the parallel argument table in the private memory space corresponding to each kernel into the on-chip memory of the artificial intelligence processing device.
In a feasible embodiment, the artificial intelligence processing device further includes:

an allocation unit, configured to, during power-on or reset of the artificial intelligence processing device, allocate, from a reserved space of the memory of the artificial intelligence processing device, a private memory space for each kernel in the M arithmetic core clusters, wherein the kernels and the private memory spaces are arranged in one-to-one correspondence.
In a feasible embodiment, the task information further includes a data output address, and the storage unit is further configured to:

after each kernel in the M arithmetic core clusters has finished executing the processing instruction, save the obtained processing result into the memory space corresponding to the data output address.
In a fifth aspect, a computer-readable medium is provided, which stores program code for execution by a device, the program code including instructions for performing the method of the first aspect or the second aspect.
It can be seen that, in the solutions of the embodiments of the present invention, after the general-purpose computer device receives a processing request, it determines the task type, task scale and processing instruction of the target task according to the processing request, and determines the parallel variable information (i.e. the parallel argument tables) of the target task according to the task scale and the task type, so that the artificial intelligence processing device is able to execute the target task based on the parallel variable information. Because the parallel variable information can be applied to artificial intelligence processing devices with different hardware architectures, old intelligent hardware can execute/run the same tasks/programs that execute/run on new intelligent hardware.

Further, by allocating a block of private memory space for each kernel in the artificial intelligence processing device to store its parallel argument table, the kernels in an arithmetic core cluster can each correctly read their own data to be processed while executing the same instruction at the same time, realizing SIMT task processing. Before a kernel processes an instruction, the parallel argument table in its private memory space is read into the on-chip NRAM, which improves subsequent processing efficiency. By receiving the parallel argument tables sent by the general-purpose computer device, the artificial intelligence processing device in effect has the function of a hardware scheduler implemented for it by the general-purpose computer device; the artificial intelligence processing device can therefore execute the target task based on the parallel argument tables even without a hardware scheduler, and the functional modules of a new generation of intelligent hardware are realized through software programming, so that the software programs of old and new generations of intelligent hardware can be compatible.
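The host-side flow summarized above can be sketched end to end: the general-purpose computer device plays the role of the hardware scheduler by choosing a cluster count, building one argument table per sub-task, checking for idle kernels, and handing everything to the device. `FakeDevice`, `query_idle` and `transfer` are stand-ins for the device and its PCIE operations, not a real driver API.

```python
class FakeDevice:
    """Stand-in for the artificial intelligence processing device; transfer()
    and run() abstract the PCIE copy and the kernels' execution."""
    def query_idle(self, clusters):
        return True                        # pretend enough clusters are idle
    def transfer(self, tables, data):
        self.tables, self.data = tables, data
    def run(self):
        return [x + 1 for x in self.data]  # toy "processing instruction"

def dispatch_task(task_type, task_scale, data, device):
    # Software-scheduler steps: pick cluster count from the task type,
    # build per-sub-task argument tables, check idle state, then transfer.
    clusters = {"BLOCK": 1, "UNION1": 1, "UNION2": 2,
                "UNION4": 4, "UNION8": 8}[task_type]
    k = task_scale[0] * task_scale[1] * task_scale[2]
    tables = [{"taskid": i, "taskDim": task_scale} for i in range(k)]
    if not device.query_idle(clusters):
        return None                        # no idle clusters: do not dispatch
    device.transfer(tables, data)          # stands in for the PCIE transfers
    return device.run()

device = FakeDevice()
result = dispatch_task("UNION2", (2, 1, 1), [1, 2, 3], device)
```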
These and other aspects of the invention will be more readily apparent from the following description.
Detailed description of the invention
In order to explain the technical solutions in the embodiments of the present invention or the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings based on these drawings without creative effort.
Fig. 1a is a schematic structural diagram of a task execution system provided by an embodiment of the present invention;

Fig. 1b is a schematic structural diagram of a task execution system provided by an embodiment of the present invention;

Fig. 1c is a schematic flowchart of a task execution method provided by an embodiment of the present invention;

Fig. 2 is a schematic flowchart of a task execution method provided by an embodiment of the present invention;

Fig. 3 is a schematic flowchart of another task execution method provided by an embodiment of the present invention;

Fig. 4 is a schematic structural diagram of a general-purpose computer device provided by an embodiment of the present invention;

Fig. 5 is a schematic structural diagram of an artificial intelligence processing device provided by an embodiment of the present invention.
Specific embodiment
The embodiments of the present application are described in detail below with reference to the drawings.
Referring to Fig. 1a, Fig. 1a is a schematic structural diagram of a task execution system provided by an embodiment of the present invention. As shown in Fig. 1a, the task execution system includes a general-purpose computer device and an artificial intelligence processing device. The general-purpose computer device 100 includes a general-purpose processor 101, a memory 102 and a PCIE interface 103. The general-purpose processor 101 is connected to the memory 102 and the PCIE interface respectively, and the memory 102 is connected to the PCIE interface. Optionally, the general-purpose computer device may be a server or a local computer device, and the general-purpose processor 101 may be a CPU or the like.

The artificial intelligence processing device 200 includes S arithmetic core clusters 201, a network-on-chip 202, a PCIE interface 203 and a memory 204, where S is an integer greater than 0. Each of the S arithmetic core clusters includes N kernels, N being an integer greater than 0. Optionally, each kernel may be an artificial intelligence processor. In the embodiments of the present application, more than one kernel may form an arithmetic core cluster (Cluster). For example, each arithmetic core cluster may include 4 kernels, or each arithmetic core cluster may include 8 kernels.

The S arithmetic core clusters are connected, through the network-on-chip 202, to the PCIE interface 203, the memory 204 and the on-chip memory 205 respectively. The network-on-chip 202 enables data exchange among the S arithmetic core clusters, and between the S arithmetic core clusters and the PCIE interface 203, the memory 204 and the on-chip memory 205. The PCIE interface 203 is connected to the memory 204, and the PCIE interface 203 enables data communication between the artificial intelligence processing device and the general-purpose computer device 100.

Optionally, the memory 102 of the general-purpose computer device and the memory 204 of the artificial intelligence processing device may be Double Data Rate (DDR) memories. Optionally, the on-chip memory 205 may be an on-chip cache. Further, the on-chip cache is a neural random access memory (NRAM).
Referring to Fig. 1b, Fig. 1b is a schematic structural diagram of a task execution system provided by an embodiment of the present invention. As shown in Fig. 1b, the task execution system includes a general-purpose computer device and an artificial intelligence processing device. The general-purpose computer device 100 includes a general-purpose processor 101, a memory 102 and a PCIE interface 103. The general-purpose processor 101 is connected to the memory 102 and the PCIE interface respectively, and the memory 102 is connected to the PCIE interface. Optionally, the general-purpose computer device may be a server or a local computer device, and the general-purpose processor 101 may be a CPU or the like.

The artificial intelligence processing device 300 includes Q arithmetic core clusters 301, a network-on-chip 302, a PCIE interface 303, a memory 304 and a task scheduler 305, where Q is an integer greater than 0. Each of the Q arithmetic core clusters includes P kernels, P being an integer greater than 0. Optionally, each kernel may be an artificial intelligence processor. In the embodiments of the present application, more than one kernel may form an arithmetic core cluster (Cluster). For example, each arithmetic core cluster may include 4 kernels, or each arithmetic core cluster may include 8 kernels.

The Q arithmetic core clusters are connected, through the network-on-chip 302, to the task scheduler 305, the PCIE interface 303 and the memory 304 respectively. The network-on-chip 302 enables data exchange among the Q arithmetic core clusters, and between the Q arithmetic core clusters and the PCIE interface 303, the memory 304 and the task scheduler 305. The PCIE interface 303 is connected to the memory 304, and the PCIE interface 303 enables data communication between the artificial intelligence processing device and the general-purpose computer device 100. The task scheduler 305 can obtain the target task and the task processing request sent by the general-purpose computer device, and split the target task according to the task processing request to obtain sub-tasks that can run on at least one kernel. Specifically, the task scheduler 305 can split the target task in at least one dimension, according to the task type and the task scale input by the user, to obtain sub-tasks that can run on at least one kernel. Still further optionally, the task scheduler 305 can also schedule and manage operations such as the delivery and execution of each sub-task.

Optionally, the memory 102 of the general-purpose computer device and the memory 304 of the artificial intelligence processing device may be Double Data Rate (DDR) memories. Optionally, each kernel may include an on-chip memory and a special register, where the on-chip memory may be an on-chip cache. Further, the on-chip cache is a neural random access memory (NRAM).

It should be pointed out that the task scheduler 305 is an independent piece of hardware in the artificial intelligence processing device, so in the present invention the task scheduler 305 may also be called a hardware scheduler.
Based on the above task execution system, as shown in Fig. 1c, the present application provides a task execution method that is applicable to the task execution system of any of the above embodiments. Specifically, the method includes:

S101: the general-purpose computer device determines the task type, task scale and processing instruction of a target task according to a task processing request, the task processing request including data to be processed.

Specifically, the general-purpose computer device can determine task information such as the task type, task scale and processing instruction of the target task according to the task processing request. The task processing request can be input by a user. Optionally, the task processing request may include task information such as the task type and task scale of the target task, and this task information may also be input by the user.

S102: the general-purpose computer device determines the parallel variable information of the target task according to the task scale and the task type, the parallel variable information being applicable to artificial intelligence processing devices of different versions.

Here, the artificial intelligence processing devices may include artificial intelligence processing devices of different versions, and the hardware architecture of each version may be different.

Optionally, the parallel argument table includes at least one of: a task identifier, the task scale, an identifier of an arithmetic core cluster, the number of arithmetic core clusters, the number of kernels in an arithmetic core cluster, and an identifier of a kernel within an arithmetic core cluster. The specific way of determining the parallel variable information is discussed below.

Still further optionally, the task scale includes the scale of sub-tasks in at least one dimension, the parallel variable information includes k parallel argument tables, and k is determined according to the scale of the sub-tasks in the at least one dimension. For example, if the task scale is [taskDimX, taskDimY, taskDimZ], the parallel variable information includes taskDimX × taskDimY × taskDimZ parallel argument tables, where taskDimX, taskDimY and taskDimZ are integers greater than 0.
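The example above (one parallel argument table per sub-task, k = taskDimX × taskDimY × taskDimZ) can be sketched as follows; the dict layout and the `coord` field are illustrative additions, not part of the patent's table format.

```python
from itertools import product

def build_parallel_arg_tables(taskDimX, taskDimY, taskDimZ):
    """Build k = taskDimX * taskDimY * taskDimZ tables, one per sub-task."""
    tables = []
    for i, (z, y, x) in enumerate(product(range(taskDimZ),
                                          range(taskDimY),
                                          range(taskDimX))):
        tables.append({"taskid": i,
                       "taskDim": (taskDimX, taskDimY, taskDimZ),
                       "coord": (x, y, z)})  # which sub-task this table drives
    return tables

tables = build_parallel_arg_tables(2, 3, 1)  # k = 2 * 3 * 1 = 6
```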
S103: the task execution system stores the parallel variable information into the private memory spaces corresponding to the kernels.

Optionally, the task scheduler of the task execution system can store the parallel variable information into the private memory spaces corresponding to the kernels. Optionally, the task scheduler may be implemented in hardware, or the task scheduler may be implemented in software. Optionally, when the task scheduler is implemented in hardware, it can be integrated into the artificial intelligence processing device, as shown in Fig. 1b. Optionally, the task scheduler may also be implemented in software, as shown in Fig. 1a.

Optionally, the private memory space corresponding to a kernel may be a reserved space of the memory 204 in the artificial intelligence processing device; the memory 204 is also called the memory of the artificial intelligence processing device. The general-purpose computer device can determine a parallel variable storage section within this reserved space of the memory, and the parallel variable storage section may include the private memory space corresponding to each kernel. Optionally, the memory space corresponding to a kernel may also be the storage section of a special register provided in each kernel of the artificial intelligence processing device, as shown in Fig. 1b.

In the embodiments of the present application, the task execution system can determine task information such as the task type and task scale of the target task according to the task processing request, and determine the parallel variable information of the target task according to that task information. The parallel variable information is applicable to artificial intelligence processing devices of different hardware architectures, and can be stored in the artificial intelligence processing device, so that the artificial intelligence processing device can process the target task according to the parallel variable information. In the embodiments of the present application, because the parallel variable information is applicable to different versions, artificial intelligence processing devices of different versions can be compatible with the same software programming model, which improves the generality of software programs and in turn the efficiency of the task execution system.
Referring to Fig. 2, Fig. 2 is a schematic flowchart of a task execution method provided by an embodiment of the present invention. The method is applied to the task execution system shown in Fig. 1a or Fig. 1b, which includes a general-purpose computer device and an artificial intelligence processing device. As shown in Fig. 2, the method includes:
S201: the general-purpose computer device determines the task type, task scale and processing instruction of a target task according to a task processing request.

The task processing request includes data to be processed; after receiving the task processing request, the general-purpose computer device saves the data to be processed into its memory. Optionally, the task processing request can be input by a user. Further optionally, the task processing request may also include task information such as the task type and task scale.

The task type indicates the number M1 of kernels that the artificial intelligence processing device needs to occupy when executing the target task, where M1 is an integer greater than 0 and M1 is less than or equal to S*N. The task scale characterizes the amount of computation of the target task. In the embodiments of the present application, task types may include multi-core task types and single-core task types; optionally, a multi-core task type may be denoted UNION and a single-core task type may be denoted BLOCK. For a multi-core task, the task type indicates the number M of arithmetic core clusters that the artificial intelligence processing device needs to occupy when executing the target task, in which case the number of kernels needed to execute the target task is M1 = M*N.
For example, a task type of UNION indicates a multi-core task. If the task type of the target task is UNION1, the artificial intelligence processing device needs to occupy 1 arithmetic core cluster when executing the target task; if the task type is UNION2, it needs to occupy 2 arithmetic core clusters; if the task type is UNION4, it needs to occupy 4 arithmetic core clusters; and if the task type is UNION8, it needs to occupy 8 arithmetic core clusters.
For another example, when task type is BLOCK, then illustrate that the task type is monokaryon task, that is, execute the goal task
Only need a kernel.
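As a minimal sketch, the task-type-to-kernel-count mapping described above can be expressed as follows (the helper name and string parsing are illustrative assumptions, not from the patent):

```python
def kernels_needed(task_type: str, n_kernels_per_cluster: int) -> int:
    """Return M1, the number of kernels a task occupies.

    UNION<M> tasks occupy M arithmetic core clusters, i.e. M1 = M * N;
    BLOCK tasks occupy a single kernel.
    """
    if task_type == "BLOCK":
        return 1
    if task_type.startswith("UNION"):
        m = int(task_type[len("UNION"):])  # e.g. "UNION4" -> 4 clusters
        return m * n_kernels_per_cluster
    raise ValueError(f"unknown task type: {task_type}")
```

With N = 4 kernels per cluster, a UNION8 task would occupy 32 kernels and a BLOCK task exactly one.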
S202: the general purpose computing device determines the parallel variable information of the target task according to the task scale and the task type.
The parallel variable information may include at least one parallel argument table. Each parallel argument table includes at least one of: the task identifier taskid, the task scale taskDim, the identifier clusterid of the arithmetic core cluster, the number clusterDim of arithmetic core clusters, the number coreDim of kernels in an arithmetic core cluster, and the identifier coreid of a kernel within its arithmetic core cluster.
Optionally, the general purpose computing device may obtain, from the task processing request, the sub-task scale of the target task in at least one dimension. Specifically, the user may specify that the target task is split along the three dimensions X, Y and Z, yielding the sub-task scale taskDimX of the target task in the X dimension, the sub-task scale taskDimY in the Y dimension, and the sub-task scale taskDimZ in the Z dimension. Optionally, when the task type is BLOCK, taskDimX, taskDimY and taskDimZ all default to 1. Optionally, the general purpose computing device may also determine the total parallel scale from the sub-task scales in the dimensions; specifically, the total parallel scale is taskDimX × taskDimY × taskDimZ.
Still further optionally, the general purpose computing device may determine the minimum parallel granularity from the sub-task scale in one dimension. For example, the general purpose computing device may dispatch work with taskDimX as the unit of parallelism, i.e. the minimum parallel granularity is taskDimX = clusterDim × coreDim. Further, after the user specifies a UNION type, if the specified taskDimX is not a positive integer multiple of clusterDim × coreDim, an error can be detected and reported at run time. Of course, in other embodiments the general purpose computing device may instead determine the minimum parallel granularity from taskDimY or taskDimZ; the determination method is the same as that for taskDimX.
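The run-time granularity check described above can be sketched as follows (the function name and error message are illustrative, not the patent's implementation):

```python
def check_union_granularity(task_dim_x: int, cluster_dim: int, core_dim: int) -> None:
    """Raise if taskDimX is not a positive multiple of clusterDim * coreDim.

    For UNION tasks, taskDimX is the unit of parallelism, so it must cover
    whole arithmetic core clusters (clusterDim * coreDim kernels each).
    """
    unit = cluster_dim * core_dim
    if task_dim_x <= 0 or task_dim_x % unit != 0:
        raise ValueError(
            f"taskDimX={task_dim_x} is not a positive multiple of "
            f"clusterDim*coreDim={unit}"
        )
```

For instance, with 1 cluster of 4 kernels, taskDimX = 4 or 8 passes, while taskDimX = 5 is reported as an error.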
Still further optionally, the general purpose computing device may also determine the task identifiers taskidX, taskidY and taskidZ of each sub-task; taskidX, taskidY and taskidZ denote the ID, in the X, Y and Z directions respectively, of the task the current kernel is executing. Specifically, the general purpose computing device may determine taskidX, taskidY and taskidZ in each dimension from the sub-task scales taskDimX, taskDimY and taskDimZ. For example, when the task scale of the target task is [4, 1, 1], taskidX may range over 0–3, taskidY may be 0, and taskidZ may be 0. For another example, when the task scale of the target task is [4, 2, 2], taskidX may range over 0–3, taskidY may be 0 or 1, and taskidZ may be 0 or 1. Further optionally, the general purpose computing device may, keeping the unit of parallelism (taskDimX) unchanged, determine a repetition count equal to taskDimY × taskDimZ, thereby determining the at least one parallel argument table: the dispatch is repeated that many times, with the taskidY and/or taskidZ values changed on each repetition.
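Consistent with the example above, the per-sub-task identifiers can be enumerated as in this sketch; the linearization order (X fastest, then Y, then Z) is inferred from the tables that follow, and the helper is illustrative:

```python
def enumerate_task_ids(task_dim_x: int, task_dim_y: int, task_dim_z: int):
    """Yield (taskidX, taskidY, taskidZ, taskid) for every sub-task.

    The linear taskid advances fastest along X, then Y, then Z,
    matching Tables 1 and 2.1-2.4 below.
    """
    for z in range(task_dim_z):
        for y in range(task_dim_y):
            for x in range(task_dim_x):
                taskid = x + y * task_dim_x + z * task_dim_x * task_dim_y
                yield (x, y, z, taskid)
```

For a [4, 2, 2] task scale this yields 16 sub-tasks: the (y=1, z=0) batch carries taskid 4–7 and the (y=0, z=1) batch carries taskid 8–11, as in the tables below.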
A parallel argument table is illustrated below.
For example, assume the processor needs to handle a task whose task type is UNION1 and whose task scale is [4, 1, 1]; the parallel argument table is then as in Table 1 below.
Variable name | Kernel 0 | Kernel 1 | Kernel 2 | Kernel 3 |
taskidX | 0 | 1 | 2 | 3 |
taskidY | 0 | 0 | 0 | 0 |
taskidZ | 0 | 0 | 0 | 0 |
taskid | 0 | 1 | 2 | 3 |
taskDimX | 4 | 4 | 4 | 4 |
taskDimY | 1 | 1 | 1 | 1 |
taskDimZ | 1 | 1 | 1 | 1 |
taskDim | 4 | 4 | 4 | 4 |
coreid | 0 | 1 | 2 | 3 |
coreDim | 4 | 4 | 4 | 4 |
clusterid | 0 | 0 | 0 | 0 |
clusterDim | 1 | 1 | 1 | 1 |
Table 1
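Under the stated assumptions (UNION1: one cluster of four kernels), a table like Table 1 can be generated mechanically. The following sketch is illustrative of the structure, not the patent's implementation:

```python
def build_parallel_argument_table(task_dim, cluster_dim, core_dim, y=0, z=0):
    """Build one parallel argument table: a dict of variable -> per-kernel values.

    One table covers cluster_dim * core_dim kernels executing the sub-tasks
    with the given (y, z) indices and x equal to the kernel's position.
    """
    dim_x, dim_y, dim_z = task_dim
    n_kernels = cluster_dim * core_dim
    return {
        "taskidX": list(range(n_kernels)),
        "taskidY": [y] * n_kernels,
        "taskidZ": [z] * n_kernels,
        "taskid": [x + y * dim_x + z * dim_x * dim_y for x in range(n_kernels)],
        "taskDimX": [dim_x] * n_kernels,
        "taskDimY": [dim_y] * n_kernels,
        "taskDimZ": [dim_z] * n_kernels,
        "taskDim": [dim_x * dim_y * dim_z] * n_kernels,
        "coreid": [k % core_dim for k in range(n_kernels)],
        "coreDim": [core_dim] * n_kernels,
        "clusterid": [k // core_dim for k in range(n_kernels)],
        "clusterDim": [cluster_dim] * n_kernels,
    }
```

Calling it with task_dim=[4, 1, 1], cluster_dim=1, core_dim=4 reproduces the columns of Table 1; with task_dim=[4, 2, 2] and (y=1, z=0) it reproduces Table 2.2.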
For another example, assume the processor needs to handle a task whose task type is UNION1 and whose task scale is [4, 2, 2]. When the overall task indices being processed are [x, y=0, z=0], the parallel argument table is as in Table 2.1 below.
Variable name | Kernel 0 | Kernel 1 | Kernel 2 | Kernel 3 |
taskidX | 0 | 1 | 2 | 3 |
taskidY | 0 | 0 | 0 | 0 |
taskidZ | 0 | 0 | 0 | 0 |
taskid | 0 | 1 | 2 | 3 |
taskDimX | 4 | 4 | 4 | 4 |
taskDimY | 2 | 2 | 2 | 2 |
taskDimZ | 2 | 2 | 2 | 2 |
taskDim | 16 | 16 | 16 | 16 |
coreid | 0 | 1 | 2 | 3 |
coreDim | 4 | 4 | 4 | 4 |
clusterid | 0 | 0 | 0 | 0 |
clusterDim | 1 | 1 | 1 | 1 |
Table 2.1
When the overall task indices being processed are [x, y=1, z=0], the parallel argument table is as in Table 2.2 below.
Variable name | Kernel 0 | Kernel 1 | Kernel 2 | Kernel 3 |
taskidX | 0 | 1 | 2 | 3 |
taskidY | 1 | 1 | 1 | 1 |
taskidZ | 0 | 0 | 0 | 0 |
taskid | 4 | 5 | 6 | 7 |
taskDimX | 4 | 4 | 4 | 4 |
taskDimY | 2 | 2 | 2 | 2 |
taskDimZ | 2 | 2 | 2 | 2 |
taskDim | 16 | 16 | 16 | 16 |
coreid | 0 | 1 | 2 | 3 |
coreDim | 4 | 4 | 4 | 4 |
clusterid | 0 | 0 | 0 | 0 |
clusterDim | 1 | 1 | 1 | 1 |
Table 2.2
When the overall task indices being processed are [x, y=0, z=1], the parallel argument table is as in Table 2.3 below.
Variable name | Kernel 0 | Kernel 1 | Kernel 2 | Kernel 3 |
taskidX | 0 | 1 | 2 | 3 |
taskidY | 0 | 0 | 0 | 0 |
taskidZ | 1 | 1 | 1 | 1 |
taskid | 8 | 9 | 10 | 11 |
taskDimX | 4 | 4 | 4 | 4 |
taskDimY | 2 | 2 | 2 | 2 |
taskDimZ | 2 | 2 | 2 | 2 |
taskDim | 16 | 16 | 16 | 16 |
coreid | 0 | 1 | 2 | 3 |
coreDim | 4 | 4 | 4 | 4 |
clusterid | 0 | 0 | 0 | 0 |
clusterDim | 1 | 1 | 1 | 1 |
Table 2.3
When the overall task indices being processed are [x, y=1, z=1], the parallel argument table is as in Table 2.4 below.
Variable name | Kernel 0 | Kernel 1 | Kernel 2 | Kernel 3 |
taskidX | 0 | 1 | 2 | 3 |
taskidY | 1 | 1 | 1 | 1 |
taskidZ | 1 | 1 | 1 | 1 |
taskid | 12 | 13 | 14 | 15 |
taskDimX | 4 | 4 | 4 | 4 |
taskDimY | 2 | 2 | 2 | 2 |
taskDimZ | 2 | 2 | 2 | 2 |
taskDim | 16 | 16 | 16 | 16 |
coreid | 0 | 1 | 2 | 3 |
coreDim | 4 | 4 | 4 | 4 |
clusterid | 0 | 0 | 0 | 0 |
clusterDim | 1 | 1 | 1 | 1 |
Table 2.4
S203: the general purpose computing device transmits the parallel variable information, the process instruction and the pending data into the memory 204 of the artificial intelligence process equipment.
Optionally, during power-up or reset of the artificial intelligence process equipment, the driver may allocate, within a reserved region of the address space of the memory 204 of the artificial intelligence process equipment, one block of memory space for each kernel in the artificial intelligence process equipment; this memory space is called the kernel's private memory space.
The general purpose computing device transmits the process instruction and the pending data over the PCIE bus into the memory of the artificial intelligence process equipment, so that the kernels in the M arithmetic core clusters process the pending data based on the process instruction.
In a feasible embodiment, the operation domain of the process instruction includes a data writing address, and the method further includes: the general purpose computing device receives the processing result of the artificial intelligence process equipment over the PCIE bus, and writes the processing result into the memory space corresponding to the data writing address. The data writing address may correspond to part of the memory space in the memory of the general purpose computing device.
It can be seen that, in the scheme of this embodiment of the present invention, after the general purpose computing device receives a task processing request, it determines the task type, task scale and process instruction of the target task according to the request, and determines the parallel variable information (i.e. the parallel argument table) of the target task according to the task scale and task type, so that the artificial intelligence process equipment can execute the target task. The target task can thus be executed, based on the parallel argument table, even when the artificial intelligence process equipment has no hardware scheduler, enabling old intelligent hardware to execute/run tasks/programs that execute/run on new intelligent hardware.
Still further optionally, if the general purpose computing device finds that the kernels of M arithmetic core clusters in the artificial intelligence process equipment are in the idle state, it performs the step of transmitting the parallel argument table, the process instruction and the pending data into the private memory space of each kernel in the M idle arithmetic core clusters.
Specifically, after determining the number M of arithmetic core clusters that the artificial intelligence process equipment needs to occupy to execute the target task, the general purpose computing device queries whether the artificial intelligence process equipment has M arithmetic core clusters whose kernels are in the idle state. If so, the general purpose computing device saves the parallel argument table over the PCIE bus into the private memory space of each kernel in the M arithmetic core clusters. The purpose of saving the parallel argument table into the private memory space of each kernel in the M arithmetic core clusters is to guarantee that the kernels of those M arithmetic core clusters are not occupied by other tasks.
Where the task scale is [taskDimX, taskDimY, taskDimZ] and the parallel variable information includes taskDimX × taskDimY × taskDimZ parallel argument tables, the general purpose computing device saves the taskDimX × taskDimY × taskDimZ parallel argument tables over the PCIE bus into the private memory space of each kernel in the M arithmetic core clusters.
It should be noted here that a kernel being in the idle state means that the kernel is not executing instructions.
Referring to Fig. 3, Fig. 3 is a flow diagram of another task processing method provided by an embodiment of the present invention. The method is applied to a task execution system that includes a general purpose computing device and an artificial intelligence process equipment; the artificial intelligence process equipment includes at least one arithmetic core cluster, and each arithmetic core cluster includes at least one kernel. As shown in Fig. 3, the method includes:
S301: the artificial intelligence process equipment receives the task information sent by the general purpose computing device; the task information includes the parallel argument table, the pending data, a data input address and the process instruction. The parallel argument table is used to instruct the kernels in M arithmetic core clusters to execute the process instruction; the M arithmetic core clusters are the arithmetic core clusters that need to be occupied when executing the process instruction on the pending data.
It should be noted that, for the parallel argument table, reference can be made to the related description of step S202, which is not repeated here.
S302: the artificial intelligence process equipment saves the parallel argument table and the pending data into the private memory space corresponding to each kernel in the M arithmetic core clusters.
It should be noted that, during power-up or reset of the artificial intelligence process equipment, one block of memory space is allocated by the driver, within the reserved memory region of the artificial intelligence process equipment, for each kernel in the artificial intelligence process equipment; this memory space is called the kernel's private memory space. The reserved memory region of the artificial intelligence process equipment is part of its memory 204.
S303: the artificial intelligence process equipment reads the parallel argument table in the private memory space corresponding to each kernel in the M arithmetic core clusters into the on-chip storage of the artificial intelligence process equipment, and processes, according to the process instruction, the data determined based on the parallel argument table and the data input address, to obtain a processing result.
Specifically, after each kernel in the M arithmetic core clusters reads the parallel argument table in its corresponding private memory space into the on-chip storage of the artificial intelligence process equipment, the kernel determines its data to be processed based on the parallel variables in the parallel argument table. The operation domain of the process instruction includes the data input address, i.e. the first address at which the pending data is stored in the artificial intelligence process equipment; the parallel variables can be regarded as an offset address. Each kernel determines its data to be processed based on the data input address and the offset address, and then processes that data based on the process instruction to obtain the processing result.
It should be noted that the kernels in the M arithmetic core clusters execute the process instruction on the pending data simultaneously.
In one example, the task information further includes a data length, and each kernel determines its data to be processed based on the data input address, the offset address (i.e. the parallel variables) and the data length.
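As an illustrative sketch of this addressing scheme — the taskid-times-length layout is an assumption consistent with, but not stated verbatim in, the text:

```python
def kernel_data_range(data_input_address: int, taskid: int, data_length: int):
    """Return the (start, end) byte range one kernel processes.

    data_input_address is the first address of the pending data; the
    parallel variable taskid acts as the offset selector (assumed layout:
    consecutive chunks), and data_length is the per-task chunk size
    carried in the task information.
    """
    start = data_input_address + taskid * data_length
    return start, start + data_length
```

For example, with a base address of 0x1000 and 256-byte chunks, the kernel executing taskid 2 would read bytes 0x1200–0x1300.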
In one example, each kernel reads the parallel argument table in its corresponding private memory space into the on-chip storage of the artificial intelligence process equipment by executing a preload instruction. That is, operation S303 may specifically include: each kernel, by executing the preload instruction, reads the parallel argument table in its corresponding private memory space into the on-chip storage of the artificial intelligence process equipment.
The preload instruction is load.nram.dram address1, [%SP], 64, where [address1] is the first address of the on-chip storage, the address range [address1-address2] is agreed to be the reserved memory region of the artificial intelligence process equipment, and %SP is the name of a specified register in the artificial intelligence process equipment, used here to point to the first address of each kernel's private memory space. By reading the parallel argument table of the kernel's private memory space into the on-chip storage through the preload instruction, when the kernel executes the process instruction and needs a variable in the parallel argument table, it can read the variable directly from the on-chip storage, thereby improving the execution efficiency of the process instruction.
Optionally, the on-chip storage may be an on-chip cache. Further, the on-chip cache may be an on-chip NRAM.
In a feasible embodiment, the task information further includes a data output address, and the method further includes: after each kernel in the M arithmetic core clusters has finished executing the process instruction, transmitting the obtained processing result into the memory space corresponding to the data output address.
Further, the artificial intelligence process equipment transmits the processing result to the general purpose computing device over the PCIE bus.
In one example, the artificial intelligence process equipment is the artificial intelligence process equipment shown in Fig. 1a, which may include 8 clusters, each cluster including 4 or 8 kernels, each kernel being an artificial intelligence processor core. That is, an arithmetic core cluster in the artificial intelligence process equipment can be regarded as a cluster in the artificial intelligence process equipment shown in Fig. 1a, and a kernel in the artificial intelligence process equipment can be regarded as a kernel in the artificial intelligence process equipment shown in Fig. 1a; the artificial intelligence process equipment shown in Fig. 1a executes the content corresponding to steps S301-S303.
It can be seen that, in the scheme of this embodiment of the present invention, by allocating one block of private memory space for each kernel in the artificial intelligence process equipment to store the parallel argument table, the kernels in an arithmetic core cluster can correctly read their respective data to be processed while executing the same instruction at the same time. Before a kernel executes the process instruction, the parallel argument table in its private memory space is read into the on-chip NRAM; when the kernel executes the process instruction and needs a parallel variable, it reads the parallel variable directly from the on-chip storage, thereby improving subsequent processing efficiency. By receiving the parallel argument table sent by the general purpose computing device, the artificial intelligence process equipment in effect has the function of a hardware scheduler realized by the general purpose computing device; the target task can be executed, based on the parallel argument table, even when the artificial intelligence process equipment has no hardware scheduler, enabling old intelligent hardware to execute/run tasks/programs that execute/run on new intelligent hardware.
Referring to the artificial intelligence process equipment shown in Fig. 1a and Fig. 1b: relative to the artificial intelligence process equipment shown in Fig. 1a, Fig. 1b adds a task scheduler and a specified register, which provides a private memory space for the kernel. When the task executing method is run on the task execution system shown in Fig. 1b, after the artificial intelligence process equipment shown in Fig. 1b receives the processing task, the task scheduler can determine the parallel argument table according to the task type and the task scale and store the parallel argument table into the specified register; the kernel may then continue to execute the process instruction.
Referring to Fig. 4, Fig. 4 is a structural schematic diagram of a general purpose computing device provided by an embodiment of the present invention. The general purpose computing device is applied to a task execution system that further includes an artificial intelligence process equipment. The general purpose computing device 400 includes:
a determination unit 401, configured to determine the task type, task scale and process instruction of the target task according to the task processing request, the task processing request including the pending data, and to determine the parallel variable information of the target task according to the task scale and the task type, the parallel variable information of the target task being used to instruct the kernels in M arithmetic core clusters to execute the process instruction;
a transmission unit 402, configured to, if the kernels of M arithmetic core clusters in the artificial intelligence process equipment are in the idle state, transmit the parallel variable information, the process instruction and the pending data into the memory of the artificial intelligence process equipment.
In a feasible embodiment, the general purpose computing device 400 further includes:
a query unit 404, configured to query whether the artificial intelligence process equipment has M arithmetic core clusters whose kernels are in the idle state, before the parallel variable information, the process instruction and the pending data are transmitted into the memory of the artificial intelligence process equipment.
In a feasible embodiment, the task scale includes the sub-task scale of at least one dimension, the parallel variable information includes k parallel argument tables, and k is determined according to the sub-task scale of the at least one dimension.
In a feasible embodiment, in terms of transmitting the parallel variable information into the private memory space of each kernel in the M idle arithmetic core clusters, the transmission unit 402 is specifically configured to: transmit the parallel variable information over the PCIE bus into the private memory space of each kernel in the M idle arithmetic core clusters;
and in terms of transmitting the process instruction and the pending data into the memory of the artificial intelligence process equipment, the transmission unit 402 is specifically configured to: transmit the process instruction and the pending data over the PCIE bus into the memory of the artificial intelligence process equipment.
In a feasible embodiment, the operation domain of the process instruction includes the data writing address, and the general purpose computing device 400 further includes:
a receiving unit 403, configured to receive the processing result of the artificial intelligence process equipment and write the processing result into the memory space corresponding to the data writing address.
In a feasible embodiment, the parallel variable information includes a parallel argument table, which includes at least one of: the task identifier, the task scale, the identifier of the arithmetic core cluster, the number of arithmetic core clusters, the number of kernels in an arithmetic core cluster, and the identifier of a kernel in an arithmetic core cluster.
It should be noted that the above units (the determination unit 401, the transmission unit 402, the receiving unit 403 and the query unit 404) are configured to execute the related steps of the above method. The determination unit 401 is specifically configured to execute the related content of steps S201 and S202, and the transmission unit 402, the receiving unit 403 and the query unit 404 are specifically configured to execute the related content of step S203.
In this embodiment, the general purpose computing device 400 is presented in the form of units. A "unit" here may refer to an application-specific integrated circuit (ASIC), a processor and memory executing one or more software or firmware programs, an integrated logic circuit, and/or another device that can provide the above functions.
Referring to Fig. 5, Fig. 5 is a structural schematic diagram of an artificial intelligence process equipment provided by an embodiment of the present invention. The artificial intelligence process equipment is applied to a task execution system that further includes a general purpose computing device; the artificial intelligence process equipment includes at least one arithmetic core cluster, and each arithmetic core cluster includes at least one kernel. The artificial intelligence process equipment 500 includes:
a receiving unit 501, configured to receive the task information sent by the general purpose computing device, the task information including the parallel argument table, the pending data, the data input address and the process instruction, the parallel argument table being used to instruct the kernels in M arithmetic core clusters to execute the process instruction, and the M arithmetic core clusters being the arithmetic core clusters that need to be occupied when executing the process instruction on the pending data;
a storage unit 502, configured to save the parallel argument table and the pending data into the private memory space corresponding to each kernel in the M arithmetic core clusters;
a processing unit 503, configured to read the parallel argument table in the private memory space corresponding to each kernel in the M arithmetic core clusters into the on-chip storage of the artificial intelligence process equipment, and to process, according to the process instruction, the data determined based on the parallel argument table and the data input address, to obtain the processing result.
In a feasible embodiment, the task information further includes the data length, and in terms of processing, according to the process instruction, the data determined based on the parallel argument table and the data input address, to obtain the processing result, the processing unit 503 is specifically configured to:
process, according to the process instruction, the data determined based on the parallel argument table, the data input address and the data length, to obtain the processing result.
In a feasible embodiment, in terms of reading the parallel argument table in the private memory space corresponding to each kernel into the on-chip storage of the artificial intelligence process equipment, the processing unit 503 is specifically configured to:
read, by executing the preload instruction, the parallel argument table in the private memory space corresponding to each kernel into the on-chip storage of the artificial intelligence process equipment.
In a feasible embodiment, the artificial intelligence process equipment 500 further includes:
an allocation unit 504, configured to allocate, during power-up or reset of the artificial intelligence process equipment, one private memory space for each kernel in the multiple arithmetic core clusters from the reserved memory region of the artificial intelligence process equipment.
In a feasible embodiment, the task information further includes the data output address, and the storage unit 502 is further configured to:
after each kernel in the M arithmetic core clusters has finished executing the process instruction, save the obtained processing result into the memory space corresponding to the data output address.
It should be noted that the above units (the receiving unit 501, the storage unit 502, the processing unit 503 and the allocation unit 504) are configured to execute the related steps of the above method. The receiving unit 501 is specifically configured to execute the related content of step S301, the storage unit 502 and the allocation unit 504 are specifically configured to execute the related content of step S302, and the processing unit 503 is configured to execute the related content of step S303.
In this embodiment, the artificial intelligence process equipment 500 is presented in the form of units. A "unit" here may refer to an application-specific integrated circuit (ASIC), a processor and memory executing one or more software or firmware programs, an integrated logic circuit, and/or another device that can provide the above functions.
It should be noted that, for each of the foregoing method embodiments, for simplicity of description, the embodiment is expressed as a series of action combinations; however, those skilled in the art should understand that the present invention is not limited by the described sequence of actions, because according to the present invention some steps may be performed in other sequences or simultaneously. Secondly, those skilled in the art should also know that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily required by the present invention.
In the above embodiments, the description of each embodiment has its own emphasis; for a part that is not described in detail in one embodiment, reference can be made to the related descriptions of other embodiments.
In the several embodiments provided in this application, it should be understood that the disclosed device may be realized in other ways. For example, the device embodiments described above are merely exemplary; the division of the units is only a logical function division, and there may be other division manners in actual implementation. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Further, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or units, and may be electrical or in other forms.
The units illustrated as separate components may or may not be physically separated, and the components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the scheme of this embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The above integrated unit may take the form of a hardware realization.
The embodiments of the present invention have been described in detail above; specific cases are used herein to expound the principle and embodiments of the present invention, and the above description of the embodiments is only intended to help understand the method of the present invention and its core ideas. At the same time, those skilled in the art may make changes in the specific embodiments and application scope according to the ideas of the present invention. In summary, the contents of this specification should not be construed as limiting the present invention.
Claims (10)
1. A task executing method, characterized in that the method is applied to a task execution system, the task execution system including a general purpose computing device and an artificial intelligence process equipment, and the method comprises:
determining a task type, a task scale and a process instruction of a target task according to a task processing request, the task processing request including pending data;
determining parallel variable information of the target task according to the task scale and the task type, the parallel variable information of the target task being used to instruct kernels in M arithmetic core clusters to execute the process instruction;
if the kernels of M arithmetic core clusters in the artificial intelligence process equipment are in an idle state, transmitting the parallel variable information, the pending data and the process instruction into a memory of the artificial intelligence process equipment.
2. The method according to claim 1, characterized in that, if the kernels of M arithmetic core clusters in the artificial intelligence process equipment are in the idle state, before transmitting the parallel variable information, the pending data and the process instruction into the memory of the artificial intelligence process equipment, the method further comprises:
querying whether the artificial intelligence process equipment has M arithmetic core clusters whose kernels are in the idle state.
3. The method according to claim 1 or 2, characterized in that the parallel variable information includes a parallel argument table, and the parallel argument table includes at least one of: a task identifier, a task scale, an identifier of an arithmetic core cluster, a number of arithmetic core clusters, a number of kernels in an arithmetic core cluster, and an identifier of a kernel in an arithmetic core cluster.
4. The method according to claim 1 or 2, characterized in that the operation domain of the process instruction includes a data writing address, and the method further comprises:
receiving a processing result of the artificial intelligence process equipment, and writing the processing result into a memory space corresponding to the data writing address.
5. A task execution method, wherein the method is applied to a task execution system, the task execution system comprising a general-purpose computing device and an artificial intelligence processing device, the artificial intelligence processing device comprising at least one arithmetic core cluster, and each arithmetic core cluster comprising at least one kernel, the method comprising:
receiving task information sent by the general-purpose computing device, the task information comprising parallel variable information, data to be processed, a data input address, and a processing instruction, wherein the parallel variable information indicates that the kernels in M arithmetic core clusters are to execute the processing instruction;
saving the parallel variable information and the data to be processed into the private storage spaces corresponding to the kernels of the M arithmetic core clusters;
storing, for each kernel in the M arithmetic core clusters, the parallel variable information in the private storage space corresponding to that kernel into the kernel's on-chip storage, and processing, according to the processing instruction, the data determined on the basis of the parallel variable information and the data input address, to obtain a processing result.
6. The method according to claim 5, wherein storing the parallel variable information in the private storage space corresponding to each kernel into the on-chip storage of the artificial intelligence processing device comprises:
reading, by executing a preload instruction, the parallel argument table in the private storage space corresponding to each kernel into the on-chip storage of the artificial intelligence processing device.
7. The method according to claim 5 or 6, further comprising:
during power-up or reset of the artificial intelligence processing device, allocating the private storage spaces for the kernels in the M arithmetic core clusters from a reserved space of the memory of the artificial intelligence processing device, wherein the kernels and the private storage spaces are arranged in one-to-one correspondence.
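The allocation step of claim 7 amounts to carving a reserved region of device memory into one private slot per kernel at power-up or reset. The following sketch assumes equal-sized slots and illustrative parameter names; the patent does not specify the slot size or layout.

```python
# Sketch of claim 7: carve per-kernel private storage out of a reserved
# region of device memory at power-up/reset. Equal slot sizes and the
# parameter names are illustrative assumptions.

def allocate_private_spaces(reserved_base, reserved_size, kernel_ids, slot_size):
    """Return a one-to-one map from kernel id to a (base, size) private region."""
    assert slot_size * len(kernel_ids) <= reserved_size, "reserved region too small"
    return {kid: (reserved_base + i * slot_size, slot_size)
            for i, kid in enumerate(kernel_ids)}

spaces = allocate_private_spaces(0x1000, 0x4000,
                                 kernel_ids=[0, 1, 2, 3],
                                 slot_size=0x1000)
```

Each kernel ends up with exactly one region and each region belongs to exactly one kernel, matching the one-to-one correspondence the claim requires.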
8. The method according to claim 5 or 6, wherein the task information further comprises a data output address, and the method further comprises:
after the kernels in the M arithmetic core clusters have finished executing the processing instruction, transmitting the obtained processing result into the storage space corresponding to the data output address.
9. A general-purpose computing device, wherein the general-purpose computing device comprises a memory and a general-purpose processor, a computer program is stored in the memory, and when the general-purpose processor executes the computer program, the steps of the method according to any one of claims 1 to 4 are implemented.
10. An artificial intelligence processing device, wherein the artificial intelligence processing device comprises at least one kernel and a memory, a computer program is stored in the memory, and when the kernel executes the computer program, the steps of the method according to any one of claims 5 to 8 are implemented.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910327737.7A CN110109861A (en) | 2019-04-22 | 2019-04-22 | A kind of task executing method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110109861A true CN110109861A (en) | 2019-08-09 |
Family
ID=67486346
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910327737.7A Pending CN110109861A (en) | 2019-04-22 | 2019-04-22 | A kind of task executing method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110109861A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2000006084A2 (en) * | 1998-07-31 | 2000-02-10 | Integrated Systems Design Center, Inc. | Integrated hardware and software task control executive |
CN102622208A (en) * | 2011-01-27 | 2012-08-01 | 中兴通讯股份有限公司 | Multi-core reconfigurable processor cluster and reconfiguration method thereof |
CN104102546A (en) * | 2014-07-23 | 2014-10-15 | 浪潮(北京)电子信息产业有限公司 | Method and system for realizing CPU (central processing unit) and GPU (graphics processing unit) load balance |
CN107341053A (en) * | 2017-06-01 | 2017-11-10 | 深圳大学 | The programmed method of heterogeneous polynuclear programmable system and its memory configurations and computing unit |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114168203A (en) * | 2020-09-10 | 2022-03-11 | 成都鼎桥通信技术有限公司 | Dual-system running state control method and device and electronic equipment |
CN114168203B (en) * | 2020-09-10 | 2024-02-13 | 成都鼎桥通信技术有限公司 | Dual-system running state control method and device and electronic equipment |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US9092266B2 (en) | Scalable scheduling for distributed data processing | |
US9015708B2 (en) | System for improving the performance of high performance computing applications on cloud using integrated load balancing | |
CN103853618B (en) | Resource allocation method with minimized cloud system cost based on expiration date drive | |
US20120215920A1 (en) | Optimized resource management for map/reduce computing | |
CN103617087A (en) | MapReduce optimizing method suitable for iterative computations | |
CN105471985A (en) | Load balance method, cloud platform computing method and cloud platform | |
JP2018073414A (en) | Method of controlling work flow in distributed computation system comprising processor and memory units | |
Li et al. | An effective scheduling strategy based on hypergraph partition in geographically distributed datacenters | |
CN113946431B (en) | Resource scheduling method, system, medium and computing device | |
CN116070682B (en) | SNN model dynamic mapping method and device of neuron computer operating system | |
Wu et al. | Hierarchical task mapping for parallel applications on supercomputers | |
Shojafar et al. | An efficient scheduling method for grid systems based on a hierarchical stochastic Petri net | |
Yun et al. | An integrated approach to workflow mapping and task scheduling for delay minimization in distributed environments | |
CN114327811A (en) | Task scheduling method, device and equipment and readable storage medium | |
Perwej | The ambient scrutinize of scheduling algorithms in big data territory | |
CN105740249B (en) | Processing method and system in parallel scheduling process of big data job | |
CN110109861A (en) | A kind of task executing method and device | |
Guo et al. | Multi-objective optimization for data placement strategy in cloud computing | |
CN103677996B (en) | Collaboration method and system for balancing workload distribution | |
CN108304253A (en) | Map method for scheduling task based on cache perception and data locality | |
Zhang et al. | A distributed computing framework for All-to-All comparison problems | |
Mohanapriya et al. | An optimal time-based resource allocation for biomedical workflow applications in cloud | |
Cernuda et al. | Hflow: A dynamic and elastic multi-layered i/o forwarder | |
CN108228323A (en) | Hadoop method for scheduling task and device based on data locality | |
Ghazali et al. | CLQLMRS: improving cache locality in MapReduce job scheduling using Q-learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
CB02 | Change of applicant information |
Address after: Room 644, No. 6, South Road of Academy of Sciences, Beijing 100000
Applicant after: Zhongke Cambrian Technology Co., Ltd
Address before: Room 644, No. 6, South Road of Academy of Sciences, Beijing 100000
Applicant before: Beijing Zhongke Cambrian Technology Co., Ltd.