CN109388428B - Layer traversal method, control device and data processing system - Google Patents



Publication number
CN109388428B
Authority
CN
China
Prior art keywords
traversal
processing device
layer
processing
traversed
Prior art date
Legal status
Active
Application number
CN201710687341.4A
Other languages
Chinese (zh)
Other versions
CN109388428A (en
Inventor
林恒
项定义
程捷
翟季冬
朱冠宇
Current Assignee
Tsinghua University
Huawei Technologies Co Ltd
Original Assignee
Tsinghua University
Huawei Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Tsinghua University and Huawei Technologies Co Ltd
Priority to CN201710687341.4A
Publication of CN109388428A
Application granted
Publication of CN109388428B
Legal status: Active


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline, look ahead
    • G06F9/3877Concurrent instruction execution, e.g. pipeline, look ahead using a slave processor, e.g. coprocessor

Abstract

The application provides a layer traversal method, a control device, and a data processing system. The method comprises: obtaining layer feature parameters of a layer to be traversed; determining a traversal algorithm strategy and a traversal step strategy according to the layer feature parameters, such that the performance of collaborative traversal by a first processing device and a second processing device each adopting the traversal algorithm strategy is maximized and the total overhead time of collaborative traversal by the two devices each adopting the traversal step strategy is minimized; and notifying the first processing device and the second processing device to collaboratively traverse the layer to be traversed according to the traversal algorithm strategy and the traversal step strategy. Through this scheme, the overall processing performance of the data processing system can be improved.

Description

Layer traversal method, control device and data processing system
Technical Field
The present application relates to the field of information technologies, and in particular, to a layer traversal method, a control device, and a data processing system.
Background
With the advent of the big-data era, data in modern society is growing exponentially, and the storage, transmission, and computation of big data have become pressing problems for academia and industry. Graph computation is an important type of big-data computation; it traverses points that have connection relationships. In practical applications, for example, an account in a social network may be abstracted into a data point. A social network contains many data points with direct or indirect connections between them (such as friends, or friends of friends). Accessing these points proceeds layer by layer: first the starting point is traversed, then the points directly connected to it, then the points connected to those already traversed, and so on. The points of each level, together with their connections (which may be called edges) to the points of the next level, may be referred to as a layer; multiple layers form a graph, and the graph represents the associations among the data.
The Breadth-First Search (BFS) algorithm is one of the core algorithms of big-data graph computation. BFS is a basic building block of the Single-Source Shortest Path (SSSP), graph Diameter calculation (DE), and Betweenness Centrality (BC) algorithms, all of which can be derived from it. BFS is characterized by intensive memory access, random memory access, data dependence, and unstructured computation. The BFS algorithm has two traversal variants: the Top-Down algorithm and the Bottom-Up algorithm. The Bottom-Up algorithm is an optimization of the Top-Down algorithm; when the number of boundary points is large, it reduces the number of edge accesses.
In the prior art, a single computing device capable of performing computing operations, such as a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a Field-Programmable Gate Array (FPGA), or an Application-Specific Integrated Circuit (ASIC), is selected to perform graph computation with a fixed traversal algorithm and a fixed traversal step size. Here the traversal algorithm is the algorithm used to compute the unvisited points of each layer during graph traversal, and the traversal step size is the amount of data processed in each operation. For example, for the traversal of graph data with a small number of boundary points, a CPU may be used to perform the computation; to exploit the CPU's performance advantage, the Top-Down algorithm is usually adopted with a fixed step size. For the traversal of graph data with a large number of boundary points, using a GPU can reduce the cost; to exploit the GPU's performance advantage, the Bottom-Up algorithm is usually adopted with a fixed step size.
Processing devices such as CPUs, GPUs, FPGAs, and ASICs differ in computing power and in the traversal algorithms and traversal step sizes they typically use when performing computing operations. A combination in which different devices, such as a CPU and a GPU, cooperate to complete graph computation is called a heterogeneous device combination.
With the advent of big data, the processing overhead of the data points handled in graph computation has also grown. To reduce this overhead, multiple computing devices are integrated to perform the computation; the integrated system is called a data processing system. For example, two heterogeneous devices, a GPU and a CPU, may be integrated into one data processing system. When performing graph traversal, the GPU and the CPU operate alternately to reduce the operating overhead: for example, the CPU performs the computation with a fixed algorithm and a fixed step size up to a set layer number, and the GPU performs it with a fixed algorithm and a fixed step size after that layer number. However, in this case the GPU and the CPU each work alone, resulting in low overall resource utilization of the data processing system.
Disclosure of Invention
In order to solve the problems in the prior art, embodiments of the present application provide a layer traversal method, a control device, and a data processing system, so as to improve the overall processing performance of the data processing system.
In a first aspect, the present application provides a layer traversal method for big-data graph computation, comprising: obtaining layer feature parameters of a layer to be traversed; determining a traversal algorithm strategy and a traversal step strategy according to the layer feature parameters, wherein the traversal algorithm strategy comprises the traversal algorithms respectively used by a first processing device and a second processing device when they cooperatively traverse the layer, the traversal step strategy comprises the traversal step sizes they respectively use, the performance of cooperative traversal with the devices each adopting the traversal algorithm strategy is maximized, and the total overhead time of cooperative traversal with the devices each adopting the traversal step strategy is minimized; and notifying the first processing device and the second processing device to cooperatively traverse the layer according to the traversal algorithm strategy and the traversal step strategy.
According to the method and device, multiple processing devices in the data processing system can cooperatively traverse the same layer of the graph to be traversed, and before the cooperative traversal the algorithm and step size of each processing device can be dynamically adjusted according to the layer feature parameters that characterize that layer. The multiple processing devices can therefore execute the graph traversal operation efficiently and simultaneously, and the minimum total overhead time of cooperative traversal can be guaranteed while keeping the overall processing performance of the devices optimal.
With reference to the first aspect, in a first possible implementation manner of the first aspect, before obtaining the layer feature parameters of the layer to be traversed, the method further includes: performing cooperative traversal training on the first processing device and the second processing device with a training graph, and acquiring a first correspondence between the layer feature parameters and a first processing performance, a second correspondence between the layer feature parameters and a second processing performance, a third correspondence between the layer feature parameters and a third processing performance, and a fourth correspondence between the layer feature parameters and a fourth processing performance, which are then used to determine the traversal algorithm strategy from the layer feature parameters. The first processing performance is the processing performance of the first processing device running a first traversal algorithm during cooperative traversal of the training graph; the second processing performance is that of the first processing device running a second traversal algorithm; the third processing performance is that of the second processing device running the first traversal algorithm; and the fourth processing performance is that of the second processing device running the second traversal algorithm.
The relationship between the layer feature parameters and the processing performance is determined by the hardware configuration of the processing devices. Training is carried out on an input training graph to measure the correspondence between the layer feature parameters and each processing performance. When a layer is later to be traversed, its layer feature parameters are obtained at run time, and the processing performance can be derived from the trained correspondences.
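As a hedged illustration of how a trained correspondence might be consulted at run time: the sketch below keeps a table mapping one layer feature (edge count) to measured performance and answers queries by nearest neighbor. The single feature, the table values, and the nearest-neighbor lookup are all assumptions for this sketch, not the patent's specified mechanism.

```python
from bisect import bisect_left

# Hypothetical trained table for one (device, algorithm) pair:
# layer edge count -> measured processing performance (edges/second).
trained = [(1_000, 2.0e8), (100_000, 5.0e8), (10_000_000, 9.0e8)]

def lookup_performance(table, edge_count):
    """Return the trained performance whose feature value is nearest
    to edge_count (table must be sorted by feature value)."""
    keys = [k for k, _ in table]
    i = bisect_left(keys, edge_count)
    candidates = table[max(i - 1, 0):i + 1]
    return min(candidates, key=lambda kv: abs(kv[0] - edge_count))[1]
```

A query such as `lookup_performance(trained, 90_000)` then stands in for "obtain the processing performance according to the layer feature parameters and the trained correspondence".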
With reference to the first possible implementation manner of the first aspect, in a second possible implementation manner of the first aspect, determining the traversal algorithm strategy according to the layer feature parameters and the first, second, third, and fourth correspondences specifically comprises: acquiring the first processing performance from the layer feature parameters and the first correspondence, the second processing performance from the layer feature parameters and the second correspondence, the third processing performance from the layer feature parameters and the third correspondence, and the fourth processing performance from the layer feature parameters and the fourth correspondence; acquiring the cooperative processing performance under different traversal algorithm strategies from the first, second, third, and fourth processing performances; and taking the traversal algorithm strategy that yields the maximum cooperative processing performance as the traversal algorithm strategy.
By selecting the cooperative processing performance with the maximum value, the overall processing performance of the data processing system when cooperatively traversing the layer is guaranteed to be optimal.
With reference to the second possible implementation manner of the first aspect, in a third possible implementation manner of the first aspect, obtaining the cooperative processing performance under different traversal algorithm policies according to the first processing performance, the second processing performance, the third processing performance, and the fourth processing performance, and selecting a traversal algorithm policy adopted by the cooperative processing performance with the maximum value specifically includes: acquiring cooperative processing performances Y1, Y2, Y3 and Y4 under different traversal algorithm strategies according to the following equations respectively:
Y1=(1-a)*P11+(1-b)*P21,
Y2=(1-a)*P11+(1-b)*P22,
Y3=(1-a)*P12+(1-b)*P21,
Y4=(1-a)*P12+(1-b)*P22,
wherein P11 is the first processing performance, P12 is the second processing performance, P21 is the third processing performance, P22 is the fourth processing performance, a is the percentage of performance degradation of the first processing device when cooperating with the second processing device, and b is the percentage of performance degradation of the second processing device when cooperating with the first processing device,
the cooperative processing performance having the maximum value is selected among Y1, Y2, Y3, and Y4, and the traversal algorithm strategy employed for the cooperative processing performance having the maximum value is selected.
With reference to the second or third possible implementation manner of the first aspect, in a fourth possible implementation manner of the first aspect, determining the traversal step strategy according to the layer feature parameters specifically includes: acquiring the total overhead time of executing different traversal step strategies on the layer to be traversed according to the traversal algorithm strategy, and taking the traversal step sizes adopted by the first processing device and the second processing device when the total overhead time is at its minimum as the traversal step strategy.
By selecting the minimum total overhead time, the time for the data processing system to cooperatively traverse the layer to be traversed is guaranteed to be the shortest.
With reference to the fourth possible implementation manner of the first aspect, in a fifth possible implementation manner of the first aspect, the total overhead time for executing different traversal step strategies on the layer to be traversed is obtained according to a traversal algorithm strategy, and the traversal step adopted by the first processing device and the traversal step adopted by the second processing device when the total overhead time is a minimum value are used as a traversal step strategy, which specifically includes obtaining the total overhead time T according to the following equation:
T=Ncpu*(Scpu/Pcpu+Tcpu)+Ngpu*(Sgpu/Pgpu+Tgpu)+Toverlap,
where, Ncpu is the number of traversal steps of the first processing device for the layer to be traversed, Scpu is the traversal step size of the first processing device for the layer to be traversed, Ngpu is the number of traversal steps of the second processing device for the layer to be traversed, Sgpu is the traversal step size of the second processing device for the layer to be traversed, Tcpu is the preparation time of the first processing device, Tgpu is the preparation time of the second processing device, Pcpu is the processing performance of the first processing device under the traversal algorithm policy, Pgpu is the processing performance of the second processing device under the traversal algorithm policy, Toverlap is the overlapping time when the first processing device and the second processing device work together,
and the control device takes the traversal step length Scpu adopted by the first processing device and the traversal step length Sgpu adopted by the second processing device when the T is the minimum value as traversal step length strategies.
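The step-size selection can be sketched as a search over candidate (Scpu, Sgpu) pairs using the cost model above. The proportional split of the layer between the two devices and the candidate list are illustrative assumptions for this sketch, not the patent's prescribed search procedure.

```python
import math

def total_overhead(n_cpu, s_cpu, n_gpu, s_gpu,
                   p_cpu, p_gpu, t_cpu, t_gpu, t_overlap):
    # T = Ncpu*(Scpu/Pcpu + Tcpu) + Ngpu*(Sgpu/Pgpu + Tgpu) + Toverlap
    return (n_cpu * (s_cpu / p_cpu + t_cpu)
            + n_gpu * (s_gpu / p_gpu + t_gpu)
            + t_overlap)

def choose_step_strategy(layer_size, candidates,
                         p_cpu, p_gpu, t_cpu, t_gpu, t_overlap):
    """Return the (Scpu, Sgpu) pair minimizing T, assuming the layer is
    split between the devices in proportion to their performances."""
    cpu_part = layer_size * p_cpu / (p_cpu + p_gpu)
    gpu_part = layer_size - cpu_part
    best_pair, best_t = None, math.inf
    for s_cpu, s_gpu in candidates:
        n_cpu = math.ceil(cpu_part / s_cpu)   # traversal steps, device 1
        n_gpu = math.ceil(gpu_part / s_gpu)   # traversal steps, device 2
        t = total_overhead(n_cpu, s_cpu, n_gpu, s_gpu,
                           p_cpu, p_gpu, t_cpu, t_gpu, t_overlap)
        if t < best_t:
            best_pair, best_t = (s_cpu, s_gpu), t
    return best_pair, best_t
```

The model makes the trade-off visible: small step sizes raise the number of steps and thus the per-step preparation-time term, while step sizes are bounded above by each device's hardware capability.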
In a possible implementation manner, the layer feature parameter includes any one of the following parameters or a combination thereof: the number of edges of the layer to be traversed, the number of points of the layer to be traversed, the number of large points of the layer to be traversed, the number of edges of the traversed layer adjacent to the layer to be traversed, the number of points of the traversed layer adjacent to the layer to be traversed, and the number of large points of the traversed layer adjacent to the layer to be traversed.
In one possible implementation, the first traversal algorithm is one of a Top-Down algorithm and a Bottom-Up algorithm, and the second traversal algorithm is the other of the Top-Down algorithm and the Bottom-Up algorithm.
In a second aspect, the present application provides a control apparatus comprising: a layer feature parameter obtaining module, configured to obtain the layer feature parameters of the layer to be traversed; a strategy determining module, configured to determine a traversal algorithm strategy and a traversal step strategy according to the layer feature parameters, wherein the traversal algorithm strategy comprises the traversal algorithms respectively used by a first processing device and a second processing device when cooperatively traversing the layer, the traversal step strategy comprises the traversal step sizes they respectively use, the performance of cooperative traversal with the devices each adopting the traversal algorithm strategy is the highest, and the total overhead time with the devices each adopting the traversal step strategy is the lowest; and a notification module, configured to notify the first processing device and the second processing device to cooperatively traverse the layer according to the traversal algorithm strategy and the traversal step strategy.
Any implementation manner of the second aspect or the second aspect is an apparatus implementation manner corresponding to any implementation manner of the first aspect or the first aspect, and the description in any implementation manner of the first aspect or the first aspect is applicable to any implementation manner of the second aspect or the second aspect, and is not described herein again.
In a third aspect, the present application provides a control device for big data map calculation of a data processing system, where the control device is configured to control a first processing device and a second processing device to perform map calculation in cooperation, and the control device includes a processor and a memory; the memory is adapted to store instructions and the processor is adapted to implement the method according to the first aspect described above or any possible implementation manner of the first aspect, according to executing program instructions stored by the memory.
In a fourth aspect, the present application provides a data processing system, including a first processing device, a second processing device, and a control device, where the control device is configured to control the first processing device and the second processing device to perform graph computation in cooperation, where the first processing device and the second processing device perform graph computation according to an instruction of the control device, respectively, and the control device is configured to execute the method described in the first aspect or any possible implementation manner of the first aspect.
In a fifth aspect, the present application provides a computer-readable storage medium having stored therein instructions, which, when run on a computer, cause the computer to perform the method of the first aspect or any possible implementation manner of the first aspect.
In a sixth aspect, the present application provides a computer program product comprising instructions which, when run on a computer, cause the computer to perform the method of the first aspect or any possible implementation form of the first aspect.
Drawings
FIG. 1 is a schematic diagram of a structure of a graph to be processed according to an embodiment of the present application;
FIG. 2 is another schematic diagram of a pending graph according to an embodiment of the present application;
FIG. 3a is a block diagram of an apparatus of a data processing system according to an embodiment of the present application;
FIG. 3b is a block diagram of another embodiment of a data processing system;
FIG. 3c is a block diagram of another embodiment of a data processing system;
FIG. 4 is a flowchart of a layer traversal method according to an embodiment of the present application;
FIG. 5 is a sub-flowchart of a layer traversal method according to an embodiment of the present application;
FIG. 6 is another schematic diagram of a pending graph according to an embodiment of the present application;
FIG. 7 is another schematic diagram of a pending graph according to an embodiment of the present application;
FIG. 8 is another schematic diagram of a pending graph according to an embodiment of the present application;
FIG. 9 is a schematic diagram of an apparatus configuration of a control apparatus according to an embodiment of the present application;
FIG. 10 is a block diagram of another embodiment of a data processing system;
FIG. 11 is a schematic device structure diagram of a data processing chip according to an embodiment of the present application.
Detailed Description
Hereinafter, some terms in the embodiments of the present application are explained to facilitate understanding by those skilled in the art.
Heterogeneous device: different types of processing devices, such as CPUs, GPUs, FPGAs, and ASICs, may be referred to as heterogeneous devices. In other embodiments of the present application, processing devices of the same type may also be referred to as heterogeneous devices when having different specifications, for example, CPUs having different main frequencies may also be referred to as heterogeneous devices.
A data processing system: the data processing system may be, for example, a chip integrating at least two heterogeneous devices, or a heterogeneous integration platform integrating at least two heterogeneous devices.
Graph: referring to fig. 1, fig. 1 is a schematic structural diagram of a graph to be processed according to an embodiment of the present disclosure. A graph is a mathematical model abstracted from, for example, a road network or a social network representing a number of individuals having direct or indirect relationships, and is used to describe specific relationships among certain things. A graph consists of points and edges; for example, in a social network a point may represent an account and its related information, and an edge represents a connection between one account and another, such as a mutual friend relationship.
Layer: in the process of BFS traversal of a graph, all points are divided into layers according to their connection relationships; for example, the starting point is layer 1 of the graph, the points directly connected to the starting point form layer 2, and the points directly connected to the points of layer 2 form layer 3.
Point: as shown in fig. 1, the layer 1 includes points {0}, the layer 2 includes points {1}, the layer 3 includes points {2} {3} {4} {5} {6}, the layer 4 includes points {7} {8} {9}, the layer 5 includes points {10} {11}, and the layer 6 includes points {12}, each point has the same data structure, and the amount of data stored in each point may be the same or different.
Number of points: the total number of points included in a layer; as shown in fig. 1, the number of points in layer 3 is 5.
It should be noted that, in the embodiment of the present application, for convenience of illustration, fig. 1 is a simpler diagram, and in practical applications, the data amount of the diagram is far higher than that of fig. 1, for example, a diagram in practical applications may have hundreds of millions of points, and is divided into millions of layers.
Big points: points in a layer with a larger number of edges. If the edge-number threshold is set to 4, i.e., a big point is a point whose edge number is greater than or equal to 4, then, as shown in fig. 2, the big points of the third layer are the points {3} {4} {5}, because the edge numbers of points {3}, {4}, and {5} each reach the threshold of 4.
Number of big points: the total number of big points included in a layer; as shown in fig. 2, the number of big points in layer 3 is 3.
Edge: the connecting line between two points.
Edge length: the distance of the connecting line between two points.
Total number of edges: the total number of edges included in a layer. As shown in fig. 1, layer 3 includes 6 edges: the edge connecting point {1} to point {2}, the edge connecting point {1} to point {3}, the edge connecting point {1} to point {4}, the edge connecting point {1} to point {5}, the edge connecting point {1} to point {6}, and the edge connecting point {4} to point {5}.
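The layer feature parameters built from these definitions (point count, edge count, big-point count) can be computed from an adjacency list. In the sketch below, the example graph, the degree-based per-layer edge count, and the threshold of 4 are illustrative assumptions:

```python
def layer_features(layers, adj, big_point_threshold=4):
    """Per-layer feature parameters: number of points, number of edge
    endpoints incident to the layer's points (one plausible reading of
    the per-layer edge count), and number of big points."""
    feats = []
    for layer in layers:
        degrees = [len(adj[v]) for v in layer]
        feats.append({
            "points": len(layer),
            "edges": sum(degrees),
            "big_points": sum(d >= big_point_threshold for d in degrees),
        })
    return feats

# A small illustrative graph (not the patent's fig. 1):
adj = {0: [1], 1: [0, 2, 3, 4, 5], 2: [1], 3: [1], 4: [1, 5], 5: [1, 4]}
layers = [[0], [1], [2, 3, 4, 5]]
```

Here `layer_features(layers, adj)[1]` reports 1 point, 5 incident edge endpoints, and 1 big point for layer 2, since point {1} has degree 5.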
Top-Down algorithm: a top-down traversal algorithm. The default storage format of big data is the common point-neighbor data structure. The Top-Down algorithm starts from one point and adds neighboring points to the visited set through the edges adjacent to the already visited points; it needs to access all edges to traverse every point once.
Referring to the diagram shown in fig. 1, starting from the point {0} of the first layer, the point {1} of the second layer is accessed through the edge connecting the point {0} and the point {1} to complete the traversal of the second layer; then, the third layer traversal is completed through an edge access point {2} connected with a point {2} by a point {1}, an edge access point {3} connected with a point {3} by a point {1}, an edge access point {4} connected with a point {4} by a point {1}, an edge access point {5} connected with a point {5} by a point {1} and an edge access point {6} connected with a point {6} by a point {1 }; then, the traversal of the fourth layer is completed through an edge access point {7} connected with a point {7} through a point {2}, an edge access point {8} connected with a point {8} through a point {2}, an edge access point {7} connected with a point {7} through a point {3}, an edge access point {8} connected with a point {8} through a point {4}, an edge access point {9} connected with a point {9} through a point {4}, an edge access point {5} connected with a point {5} through a point {4}, an edge access point {7} connected with a point {7} through a point {6} and an edge access point {8} connected with a point {8} through a point {6 }; then, the traversal of the fifth layer is completed through an edge access point {10} connected with a point {10} through a point {7}, an edge access point {10} connected with a point {10} through a point {8}, an edge access point {11} connected with a point {11} through a point {8}, and an edge access point {11} connected with a point {11} through a point {9 }; and then the traversing of the sixth layer is completed through the edge access point {12} connected with the point {12} through the point {10} and the edge access point {12} connected with the point {12} through the point {11 }.
In summary, the Top-Down algorithm needs to access 26 edges to complete the traversal of the points of the whole graph.
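The Top-Down traversal described above can be sketched as follows; the small example graph is an assumption for illustration, not the graph of fig. 1:

```python
def bfs_top_down(adj, source):
    """Top-Down BFS: expand every frontier point along all of its
    adjacent edges; returns each reachable point's layer index
    (source = layer 1) and the number of edge accesses performed."""
    level = {source: 1}
    frontier = [source]
    edge_accesses = 0
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in adj[u]:          # every adjacent edge is accessed
                edge_accesses += 1
                if v not in level:    # first visit: assign next layer
                    level[v] = level[u] + 1
                    next_frontier.append(v)
        frontier = next_frontier
    return level, edge_accesses

# Tiny undirected example graph as adjacency lists:
adj = {0: [1], 1: [0, 2, 3], 2: [1], 3: [1]}
```

On this 4-point graph, `bfs_top_down(adj, 0)` places points {2} and {3} in layer 3 and performs 6 edge accesses, one per directed edge endpoint of every visited point.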
The Bottom-Up algorithm: a bottom-up traversal algorithm. It marks a point once the point has been visited so that the point is not visited again, which reduces the number of edge accesses; all points can be traversed once while visiting only part of the adjacent edges.
Referring to fig. 2 in detail, fig. 2 is another schematic structural diagram of a graph to be processed according to an embodiment of the present application. As shown in fig. 2, the Bottom-Up algorithm starts from the point {0} of the first layer and first accesses the point {1} of the second layer through the edge connecting the point {0} and the point {1}, completing the traversal of the second layer; then the traversal of the third layer is completed by accessing the point {2} through the edge connecting the point {1} and the point {2}, the point {3} through the edge connecting the point {1} and the point {3}, the point {4} through the edge connecting the point {1} and the point {4}, the point {5} through the edge connecting the point {1} and the point {5}, and the point {6} through the edge connecting the point {1} and the point {6}; then the traversal of the fourth layer is completed by accessing the point {7} through the edge connecting the point {2} and the point {7}, the point {8} through the edge connecting the point {2} and the point {8}, and the point {9} through the edge connecting the point {3} and the point {9}; then the traversal of the fifth layer is completed by accessing the point {10} through the edge connecting the point {7} and the point {10} and the point {11} through the edge connecting the point {8} and the point {11}; and finally the point {12} is accessed through the edge connecting the point {10} and the point {12}, completing the traversal of the sixth layer.
In summary, the Bottom-Up algorithm accesses 13 edges to complete the traversal of the points of the entire graph.
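For illustration, the two traversal directions above can be sketched as follows. This is a minimal Python sketch on a small hypothetical graph (not the graph of fig. 1 or fig. 2); the function and variable names are invented for the example, and the edge counters simply tally every edge access, as the text does.

```python
def top_down_step(adj, frontier, visited):
    """Expand every edge out of the current frontier; count each edge access."""
    next_frontier, edges = set(), 0
    for u in frontier:
        for v in adj[u]:
            edges += 1                     # every adjacent edge is accessed
            if v not in visited:
                visited.add(v)
                next_frontier.add(v)
    return next_frontier, edges

def bottom_up_step(adj, frontier, visited):
    """Each unvisited point scans its edges and stops at the first parent
    found in the frontier, so only a part of the adjacent edges is accessed."""
    next_frontier, edges = set(), 0
    for v in set(adj) - visited:
        for u in adj[v]:
            edges += 1
            if u in frontier:              # parent found: mark v, stop scanning
                visited.add(v)
                next_frontier.add(v)
                break
    return next_frontier, edges

def traverse(adj, root, step):
    """Layer-by-layer traversal; returns visited points and total edge accesses."""
    visited, frontier, total = {root}, {root}, 0
    while frontier:
        frontier, e = step(adj, frontier, visited)
        total += e
    return visited, total

# Hypothetical undirected graph: {0} - {1..5} - {6}
adj = {0: [1, 2, 3, 4, 5],
       1: [0, 6], 2: [0, 6], 3: [0, 6], 4: [0, 6], 5: [0, 6],
       6: [1, 2, 3, 4, 5]}
```

On this graph the top-down traversal accesses 20 edges while the bottom-up traversal accesses 11, mirroring the 26-versus-13 contrast between fig. 1 and fig. 2.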
Traversal step size: the traversal step size refers to the amount of data that a heterogeneous device, such as a CPU, can process in one data processing operation. Depending on the data amount of each point, the device may process the data of one point, or of a plurality of points, in one traversal step. Depending on the data processing capability of the heterogeneous device itself, the traversal step size may be selected as needed, provided it does not exceed the data processing capability supported by the hardware configuration of the heterogeneous device; optionally, the traversal step size may be expressed in bytes (byte).
Traversal step number: the traversal step number refers to the number of data processing operations performed by a heterogeneous device, such as a CPU, when traversing the data points of a certain layer in the graph.
Processing performance: the number of edges that the heterogeneous device can traverse per second. For example, if an edge is represented by 8 bytes and n edges are traversed per second, the processing performance is 8 bytes × n per second.
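As a quick numeric illustration of this definition (the 8-byte edge size is the example from the text; the throughput figure and function name are invented):

```python
def processing_performance(edges_per_second, bytes_per_edge=8):
    """Processing performance in bytes per second: bytes_per_edge * n,
    where n is the number of edges traversed per second."""
    return bytes_per_edge * edges_per_second

# A device traversing 1000 edges per second, 8 bytes per edge:
perf = processing_performance(1000)   # 8000 bytes per second
```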
In order to improve the overall processing performance of a plurality of integrated heterogeneous devices, embodiments of the present application provide a layer traversal method and apparatus, which can implement a plurality of data processing systems of a heterogeneous integrated platform to perform a graph traversal operation simultaneously. When a plurality of data processing systems of the heterogeneous integrated platform collaboratively execute traversal operation on a certain layer of a graph to be processed, a traversal algorithm and a traversal step length can be adaptively selected for each data processing system, so that the overall processing performance of the plurality of data processing systems is optimal, and the total overhead time of collaborative traversal is minimum.
Referring to fig. 3a, fig. 3a is a schematic structural diagram of a data processing system according to an embodiment of the present disclosure, and as shown in fig. 3a, the data processing system 10 includes a first processing device 101, a second processing device 102, a control device 103, and a shared memory unit 104.
The data processing system 10 has a multiprocessor structure and may include any combination of one or more processing devices such as a CPU, GPU, or FPGA. Here, the first processing device 101 and the second processing device 102 are selected to cooperatively process the graph computation; the first processing device 101 and the second processing device 102 are heterogeneous devices, i.e., different types of processing devices. The control device may be implemented by any one of the processing devices in the data processing system, specifically, by an internal unit of the first processing device 101 or the second processing device 102, or by a third processing device (i.e., the control device 103 shown in fig. 3a) independent of the first processing device 101 and the second processing device 102.
For example, referring to fig. 3b, fig. 3b is another schematic structural diagram of a data processing system according to an embodiment of the present application, and in the data processing system 10 shown in fig. 3b, the control device may be implemented by the control unit 103' inside the first processing device 101.
Referring to fig. 3c, fig. 3c is another schematic structural diagram of a data processing system according to an embodiment of the present application, and in the data processing system 10 shown in fig. 3c, the control device is implemented by the control unit 103'' inside the second processing device 102.
In fig. 3a, the first processing device 101, the second processing device 102, and the control device 103 are respectively connected to the shared storage unit 104. A training graph and a graph to be processed (described below) are stored in the shared storage unit 104 and are accessible to the first processing device 101, the second processing device 102, and the control device 103. The control device 103 is connected to the first processing device 101 and the second processing device 102, respectively, and can communicate with each of them.
Specifically, the first processing device 101 may be one of a CPU, a GPU, an FPGA, and an ASIC, the second processing device 102 may be another one of the CPU, the GPU, the FPGA, and the ASIC, or the first processing device 101 and the second processing device 102 may be the same type of CPU, GPU, FPGA, or ASIC but with different specifications, for example, the first processing device 101 and the second processing device 102 may be two CPUs with different main frequencies.
It should be noted that, for convenience of illustration, only two heterogeneous devices are shown in the data processing system in the present embodiment, but in other embodiments of the present application, the number of heterogeneous devices in the data processing system may be arbitrarily set according to actual needs.
Referring to fig. 4, fig. 4 is a flowchart of an image layer traversing method according to an embodiment of the present application, and as shown in fig. 4, the image layer traversing method includes the following steps:
step S101: the control device 103 uses the training diagram to perform a collaborative traversal training on the first processing device 101 and the second processing device 102.
In this step, the training graph includes, but is not limited to, a random graph conforming to the power-law characteristic, such as a random graph generated according to the Kronecker algorithm, a random graph generated according to the RMAT algorithm, or a Facebook trace graph.
For example, a random graph generated by the Kronecker algorithm can be used as the training graph. When generating random graphs with the Kronecker algorithm, M sets of input graphs (for example, M = 10) can be generated by adjusting the input parameters of the Kronecker algorithm (A = 0.57, B = 0.19, C = 0.19, D = 0.05) to control the random distribution rule of the points in the generated random graphs. The scale of a generated random graph can be represented by a parameter scale, whose value may be 23, 24, 25, or 26; the total number of points of the generated random graph is determined by this parameter, namely, the total number of points is 2^scale.
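An R-MAT-style recursive edge sampler, the generation scheme shared by the Kronecker/RMAT family, can be sketched as follows. The probabilities A = 0.57, B = 0.19, C = 0.19, D = 0.05 and the scale parameter are the values given above; the generator itself is an illustrative sketch (function names invented), not the patented training procedure.

```python
import random

def rmat_edge(scale, a=0.57, b=0.19, c=0.19, d=0.05, rng=random):
    """Pick one edge (u, v) in a 2^scale x 2^scale adjacency matrix by
    recursively descending into one of four quadrants with probabilities
    a, b, c, d (a + b + c + d = 1)."""
    u = v = 0
    for _ in range(scale):
        r = rng.random()
        u <<= 1
        v <<= 1
        if r < a:            # top-left quadrant
            pass
        elif r < a + b:      # top-right quadrant
            v |= 1
        elif r < a + b + c:  # bottom-left quadrant
            u |= 1
        else:                # bottom-right quadrant
            u |= 1
            v |= 1
    return u, v

def rmat_graph(scale, n_edges, seed=0):
    """Generate a random edge list over 2^scale points."""
    rng = random.Random(seed)
    return [rmat_edge(scale, rng=rng) for _ in range(n_edges)]

# e.g. scale=23 gives a graph whose endpoints all lie in [0, 2**23)
```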
For each training graph, there are many parameters describing the graph, and many of them affect the graph traversal efficiency. In the embodiment of the present application, a matrix-based parameter analysis method may be adopted: the main parameters are ranked according to a correlation analysis, and the important parameters are screened out as representative model parameters, so as to reduce the overhead of the training data set.
In a possible implementation manner, the layer feature parameters include, but are not limited to: the edge number E_r of the layer to be traversed, the point number V_c of the layer to be traversed, the large point number Vh_r of the layer to be traversed, the edge number E_c of the traversed layer adjacent to the layer to be traversed, the point number V_r of the traversed layer adjacent to the layer to be traversed, and the large point number Vh_c of the traversed layer adjacent to the layer to be traversed.
In one possible implementation, after the model parameters are preliminarily determined, the model parameters are corrected according to a training result of a traversal algorithm model and a traversal step model which are combined for training, so that the screening precision of the model parameters is improved.
Specifically, in this step, the control device 103 performs a collaborative traversal training on the first processing device 101 and the second processing device 102 by using the training graph, so as to obtain a first corresponding relationship between the layer feature parameter and the first processing performance, obtain a second corresponding relationship between the layer feature parameter and the second processing performance, obtain a third corresponding relationship between the layer feature parameter and the third processing performance, and obtain a fourth corresponding relationship between the layer feature parameter and the fourth processing performance.
The first processing performance is the processing performance of the first processing device 101 running the first traversal algorithm to perform the collaborative traversal on the training graph, the second processing performance is the processing performance of the first processing device 101 running the second traversal algorithm to perform the collaborative traversal on the training graph, the third processing performance is the processing performance of the second processing device 102 running the first traversal algorithm to perform the collaborative traversal on the training graph, and the fourth processing performance is the processing performance of the second processing device 102 running the second traversal algorithm to perform the collaborative traversal on the training graph.
For convenience of understanding, the Top-Down algorithm is used as the first traversal algorithm, the Bottom-Up algorithm is used as the second traversal algorithm, the CPU is used as the first processing device 101, and the GPU is used as the second processing device 102.
For example:
V_c, E_c, V_r, E_r, Vh_c, and Vh_r are the layer feature parameters, and e1 to e6, g1 to g6, h1 to h6, and k1 to k6 are constants obtained by training, which can be seen in detail in tables 1 to 4 below.
TABLE 1
e1 e2 e3 e4 e5 e6
P11 -2.095 67713.080 -0.021 1.193 -0.133 7.322
As shown in table 1, the first processing performance P11 = f1(V_c, E_c, V_r, E_r, Vh_c, Vh_r) = e1*V_c + e2*E_c + e3*V_r + e4*E_r + e5*Vh_c + e6*Vh_r = -2.095*V_c + 67713.080*E_c + (-0.021)*V_r + 1.193*E_r + (-0.133)*Vh_c + 7.322*Vh_r.
TABLE 2
g1 g2 g3 g4 g5 g6
P12 0.428 -13804.275 -0.052 3.076 -0.030 9.365
As shown in table 2, the second processing performance P12 = f2(V_c, E_c, V_r, E_r, Vh_c, Vh_r) = g1*V_c + g2*E_c + g3*V_r + g4*E_r + g5*Vh_c + g6*Vh_r = 0.428*V_c + (-13804.275)*E_c + (-0.052)*V_r + 3.076*E_r + (-0.030)*Vh_c + 9.365*Vh_r.
TABLE 3
h1 h2 h3 h4 h5 h6
P21 -10.953 353572.666 0.065 -4.127 -0.515 26.679
As shown in table 3, the third processing performance P21 = f3(V_c, E_c, V_r, E_r, Vh_c, Vh_r) = h1*V_c + h2*E_c + h3*V_r + h4*E_r + h5*Vh_c + h6*Vh_r = -10.953*V_c + 353572.666*E_c + 0.065*V_r + (-4.127)*E_r + (-0.515)*Vh_c + 26.679*Vh_r.
TABLE 4
k1 k2 k3 k4 k5 k6
P22 -57.269 1849996.721 -0.245 13.453 -3.297 186.663
As shown in table 4, the fourth processing performance P22 = f4(V_c, E_c, V_r, E_r, Vh_c, Vh_r) = k1*V_c + k2*E_c + k3*V_r + k4*E_r + k5*Vh_c + k6*Vh_r = -57.269*V_c + 1849996.721*E_c + (-0.245)*V_r + 13.453*E_r + (-3.297)*Vh_c + 186.663*Vh_r.
Wherein f1(V_c, E_c, V_r, E_r, Vh_c, Vh_r) represents the first corresponding relation, f2(V_c, E_c, V_r, E_r, Vh_c, Vh_r) represents the second corresponding relation, f3(V_c, E_c, V_r, E_r, Vh_c, Vh_r) represents the third corresponding relation, and f4(V_c, E_c, V_r, E_r, Vh_c, Vh_r) represents the fourth corresponding relation.
And P11 is the processing performance of the CPU running the Top-Down algorithm to perform the collaborative traversal on the training graph, P12 is the processing performance of the CPU running the Bottom-Up algorithm to perform the collaborative traversal on the training graph, P21 is the processing performance of the GPU running the Top-Down algorithm to perform the collaborative traversal on the training graph, and P22 is the processing performance of the GPU running the Bottom-Up algorithm to perform the collaborative traversal on the training graph.
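Each of the four correspondences above is a linear function of the six layer feature parameters; evaluating them with the trained coefficients of tables 1 to 4 can be sketched as follows (the function name and dictionary layout are illustrative):

```python
# Coefficients from tables 1-4: (c1..c6) multiply (V_c, E_c, V_r, E_r, Vh_c, Vh_r).
COEFFS = {
    "P11": (-2.095, 67713.080, -0.021, 1.193, -0.133, 7.322),        # CPU, Top-Down
    "P12": (0.428, -13804.275, -0.052, 3.076, -0.030, 9.365),        # CPU, Bottom-Up
    "P21": (-10.953, 353572.666, 0.065, -4.127, -0.515, 26.679),     # GPU, Top-Down
    "P22": (-57.269, 1849996.721, -0.245, 13.453, -3.297, 186.663),  # GPU, Bottom-Up
}

def predict(name, V_c, E_c, V_r, E_r, Vh_c, Vh_r):
    """P = c1*V_c + c2*E_c + c3*V_r + c4*E_r + c5*Vh_c + c6*Vh_r."""
    feats = (V_c, E_c, V_r, E_r, Vh_c, Vh_r)
    return sum(c * f for c, f in zip(COEFFS[name], feats))
```

Plugging in the layer feature parameters of a layer to be traversed (for example, the values computed for the 4th layer in the example below step S102) yields the four predicted processing performances P11, P12, P21, and P22.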
In the embodiment of the present application, the cooperative traversal refers to that the CPU and the GPU simultaneously traverse the same layer of the graph.
The first correspondence, the second correspondence, the third correspondence, and the fourth correspondence are related to the hardware configurations of the first processing device 101 and the second processing device 102 in the data processing system 10, and are obtained by training the first processing device 101 and the second processing device 102 with the training graph.
It is to be noted that when the number of heterogeneous devices used for graph calculation in the data processing system is N, the number of correspondences is N^2.
Step S102: the control device 103 obtains layer characteristic parameters of a layer to be traversed.
In this step, the layer characteristic parameters include the edge number E_r of the layer to be traversed, the point number V_c of the layer to be traversed, the large point number Vh_r of the layer to be traversed, the edge number E_c of the traversed layer adjacent to the layer to be traversed, the point number V_r of the traversed layer adjacent to the layer to be traversed, and the large point number Vh_c of the traversed layer adjacent to the layer to be traversed.
For example, suppose the graph to be processed is the graph shown in fig. 1, the data processing system 10 is to traverse the 4th layer of that graph, a point with 4 or more adjacent edges is regarded as a large point, the traversal direction is from top to bottom, and the 1st to 3rd layers of the graph have all been traversed. At this time, the edge number E_r of the layer to be traversed (i.e., the 4th layer) is 13, the point number V_c of the layer to be traversed is 3, the large point number Vh_r of the layer to be traversed is 2, the edge number E_c of the traversed layer adjacent to the layer to be traversed (i.e., the 3rd layer) is 6, the point number V_r of the traversed layer adjacent to the layer to be traversed is 5, and the large point number Vh_c of the traversed layer adjacent to the layer to be traversed is 3.
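Extracting these layer feature parameters can be sketched as follows. This sketch assumes that each layer is given as a set of points with a known degree (number of adjacent edges), that the edge counts E_r and E_c are read as the number of edges adjacent to the points of the respective layer, and that "large point" means a degree at or above a threshold (4 in the example above); these readings are illustrative, since the text does not define how the counts are obtained from the graph.

```python
def layer_features(to_traverse, traversed_adjacent, degree, large_threshold=4):
    """Compute the six layer feature parameters.

    to_traverse / traversed_adjacent: sets of points in the layer to be
    traversed and in the adjacent already-traversed layer; degree maps a
    point to its number of adjacent edges."""
    return dict(
        V_c=len(to_traverse),                                   # points to traverse
        E_r=sum(degree[p] for p in to_traverse),                # their adjacent edges
        Vh_r=sum(1 for p in to_traverse
                 if degree[p] >= large_threshold),              # large points
        V_r=len(traversed_adjacent),                            # adjacent traversed points
        E_c=sum(degree[p] for p in traversed_adjacent),         # their adjacent edges
        Vh_c=sum(1 for p in traversed_adjacent
                 if degree[p] >= large_threshold),              # large traversed points
    )
```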
Step S103: the control device 103 determines a traversal algorithm strategy and a traversal step strategy according to the layer characteristic parameters.
In this step, the determined traversal algorithm strategy includes the traversal algorithms respectively used by the first processing device 101 and the second processing device 102 when cooperatively traversing the layer to be traversed, and the determined traversal step size strategy includes the traversal step sizes respectively used by the first processing device 101 and the second processing device 102 when cooperatively traversing the layer to be traversed. Under the traversal algorithm strategy, the performance of the cooperative traversal performed by the first processing device 101 and the second processing device 102 is the maximum; under the traversal step size strategy, the total overhead time of the cooperative traversal performed by the first processing device 101 and the second processing device 102 is the minimum.
It is to be noted that, in this embodiment, the traversal algorithm policy specifies the traversal algorithms used when the CPU and the GPU cooperatively traverse the graph data of the layer to be traversed (the 4th layer in the example above). Taking the Top-Down algorithm and the Bottom-Up algorithm as the candidate traversal algorithms, there are the following 4 optional traversal algorithm strategies:
traversal algorithm strategy 1: both the CPU and GPU use the Top-Down algorithm.
Traversal algorithm strategy 2: the CPU uses a Top-Down algorithm, and the GPU uses a Bottom-Up algorithm.
Traversal algorithm strategy 3: the CPU uses a Bottom-Up algorithm, and the GPU uses a Top-Down algorithm.
Traversal algorithm strategy 4: both the CPU and the GPU use the Bottom-Up algorithm.
The control device 103 selects an optimal traversal algorithm strategy from the above 4 traversal algorithm strategies according to the layer characteristic parameters, the first corresponding relation, the second corresponding relation, the third corresponding relation, and the fourth corresponding relation.
Specifically, please further refer to fig. 5, fig. 5 is a sub-flowchart of an image layer traversal method according to an embodiment of the present application, and as shown in fig. 5, step S103 includes the following sub-processes:
step S1031: the control device 103 obtains the first processing performance P11 according to the layer characteristic parameters and the first correspondence f1(V_c, E_c, V_r, E_r, Vh_c, Vh_r), obtains the second processing performance P12 according to the layer characteristic parameters and the second correspondence f2(V_c, E_c, V_r, E_r, Vh_c, Vh_r), obtains the third processing performance P21 according to the layer characteristic parameters and the third correspondence f3(V_c, E_c, V_r, E_r, Vh_c, Vh_r), and obtains the fourth processing performance P22 according to the layer characteristic parameters and the fourth correspondence f4(V_c, E_c, V_r, E_r, Vh_c, Vh_r).
Step S1032: the control device 103 acquires the cooperative processing performance under different traversal algorithm strategies according to the first processing performance P11, the second processing performance P12, the third processing performance P21 and the fourth processing performance P22, selects the cooperative processing performance with the maximum value from the acquired cooperative processing performances, and takes the traversal algorithm strategy adopted by the cooperative processing performance with the maximum value as the traversal algorithm strategy of the current layer.
In some examples, the control device 103 obtains the co-processing performances Y1, Y2, Y3, and Y4 under different traversal algorithm strategies according to the following equations, respectively:
Y1=(1-a)*P11+(1-b)*P21;
Y2=(1-a)*P11+(1-b)*P22;
Y3=(1-a)*P12+(1-b)*P21;
Y4=(1-a)*P12+(1-b)*P22;
where Y1 corresponds to traversal algorithm policy 1, Y2 corresponds to traversal algorithm policy 2, Y3 corresponds to traversal algorithm policy 3, and Y4 corresponds to traversal algorithm policy 4.
Also, a is the percentage of performance degradation of the first processing device 101 during the cooperative traversal with the second processing device 102, and b is the percentage of performance degradation of the second processing device 102 during the cooperative traversal with the first processing device 101. In some examples, a and b may be estimated values, for example, a = 10% and b = 5%, i.e., the performance of the CPU degrades by 10% when it cooperatively traverses with the GPU, and the performance of the GPU degrades by 5% when it cooperatively traverses with the CPU.
In other examples, a and b may be obtained by the following equations, specifically,
a1=(P11-Pcorun1)/P11;
a2=(P12-Pcorun1)/P12;
b1=(P21-Pcorun2)/P21;
b2=(P22-Pcorun2)/P22;
wherein a is a1 or a2, and b is b1 or b2; Pcorun1 is the number of edges traversed per second by the CPU during the cooperative traversal with the GPU, and Pcorun2 is the number of edges traversed per second by the GPU during the cooperative traversal with the CPU. The control device 103 can measure Pcorun1 and Pcorun2 when training with the training graph. When a and b are obtained through training, the cooperative processing performances are obtained according to the following equations:
Y1=(1-a1)*P11+(1-b1)*P21;
Y2=(1-a1)*P11+(1-b2)*P22;
Y3=(1-a2)*P12+(1-b1)*P21;
Y4=(1-a2)*P12+(1-b2)*P22。
optionally, when operating on the above three sets of equations, the following constraints may be further set to further improve the accuracy of the operation:
P11<PcpuMAX;
P12<PcpuMAX;
P21<PgpuMAX;
P22<PgpuMAX;
P11+P21<PMAX;
P11+P22<PMAX;
P12+P21<PMAX;
P12+P22<PMAX;
the PcpuMAX is the maximum performance of the CPU when running alone, the PgpuMAX is the maximum performance of the GPU when running alone, and the PMAX is the maximum performance of the CPU and the GPU when performing cooperative traversal, which can be obtained by training, and when the control device 103 performs training using a training diagram, the PcpuMAX, the PgpuMAX, and the PMAX can be measured.
Further, the control device 103 selects the one having the maximum value among Y1, Y2, Y3, and Y4, and sets the traversal algorithm policy corresponding to the maximum value as the traversal algorithm policy.
For example, assuming Y2 is the maximum value, the traversal algorithm policy is: the CPU uses a Top-Down algorithm, and the GPU uses a Bottom-Up algorithm.
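The selection in steps S1031 to S1032 can be sketched as follows, using the estimated degradation percentages a = 10% and b = 5% from the text (using the trained a1/a2 and b1/b2 instead only changes which degradation factor multiplies each term); the function name and policy labels are illustrative.

```python
def best_policy(P11, P12, P21, P22, a=0.10, b=0.05):
    """Pick the traversal algorithm policy with the largest cooperative
    processing performance Y = (1 - a) * P_cpu + (1 - b) * P_gpu."""
    policies = {
        "CPU Top-Down / GPU Top-Down":   (1 - a) * P11 + (1 - b) * P21,  # Y1
        "CPU Top-Down / GPU Bottom-Up":  (1 - a) * P11 + (1 - b) * P22,  # Y2
        "CPU Bottom-Up / GPU Top-Down":  (1 - a) * P12 + (1 - b) * P21,  # Y3
        "CPU Bottom-Up / GPU Bottom-Up": (1 - a) * P12 + (1 - b) * P22,  # Y4
    }
    best = max(policies, key=policies.get)
    return best, policies
```

For instance, with hypothetical predicted performances P11 = 100, P12 = 80, P21 = 50, P22 = 120, Y2 = 0.9*100 + 0.95*120 = 204 is the maximum, so the selected policy is "CPU Top-Down / GPU Bottom-Up".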
Step S1033: the control device 103 obtains total overhead time under different traversal step strategies on the layer to be traversed according to the traversal algorithm strategy, selects a traversal step adopted by the first processing device and a traversal step adopted by the second processing device when the total overhead time is the minimum value, and takes the traversal step adopted by the first processing device and the traversal step adopted by the second processing device when the total overhead time is the minimum value as the traversal step strategies.
For example, the control device 103 obtains the total overhead time T according to the following equation:
T=Ncpu*(Scpu/Pcpu+Tcpu)+Ngpu*(Sgpu/Pgpu+Tgpu)+Toverlap;
Here, Ncpu is the number of traversal steps of the first processing device 101 for the layer to be traversed, Scpu is the traversal step size of the first processing device 101 for the layer to be traversed, Ngpu is the number of traversal steps of the second processing device 102 for the layer to be traversed, Sgpu is the traversal step size of the second processing device 102 for the layer to be traversed, Tcpu is the preparation time of the first processing device 101, Tgpu is the preparation time of the second processing device 102, Pcpu is the processing performance of the first processing device 101 under the traversal algorithm policy, Pgpu is the processing performance of the second processing device 102 under the traversal algorithm policy, and Toverlap is the overlapping time when the first processing device 101 and the second processing device 102 cooperatively traverse.
Tcpu and Tgpu can be obtained through measurement. There are various combinations of Ncpu and Scpu, and of Ngpu and Sgpu; in this embodiment of the present application, the control device 103 substitutes the various combinations of Ncpu, Scpu, Ngpu, and Sgpu into the above equation respectively for operation, and the Scpu and Sgpu corresponding to the minimum T are the traversal step sizes; the traversal step size policy includes Scpu and Sgpu.
Specifically, when Y2 is the maximum value, the traversal algorithm policy is that the CPU uses the Top-Down algorithm and the GPU uses the Bottom-Up algorithm; then Pcpu = P11 and Pgpu = P22.
Toverlap is MAX (Scpu/Pcpu, Sgpu/Pgpu), and Toverlap indicates the longest time required for the CPU and the GPU to process data of the last step, respectively. In the process of cooperative traversal, the CPU and the GPU acquire a point of a layer to be traversed from a shared queue, when the point is traversed, the point is marked as traversed, the last step size is possibly executed by the CPU or the GPU, and the longest time required by the CPU and the GPU is used as the overlapping time in the process of estimating the total overhead time, so that the last step size can have enough time to complete in the overlapping time under any condition.
Optionally, when the above equation is operated, the following constraint conditions may be further set to further improve the accuracy of the operation:
(1) Ncpu*Scpu + Ngpu*Sgpu ≥ the data volume corresponding to the layer to be traversed;
(2)Ncpu*(Scpu/Pcpu+Tcpu)-Ngpu*(Sgpu/Pgpu+Tgpu)≈0;
in the constraint condition (1), V _ c is the total number of points of the layer 4 graph data of the graph to be traversed, and the constraint condition (1) means that the CPU and the GPU can cooperatively traverse all the points of the layer 4 graph data.
The constraint condition (2) means that, when the CPU and the GPU cooperatively traverse the 4th-layer graph data, the total processing times of the traversal tasks respectively allocated to them are substantially the same.
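A brute-force version of the step size search in step S1033 can be sketched as follows; the candidate step size grid, the step number bound, and the treatment of constraint (1) as a coverage check are illustrative assumptions, not the patented procedure (minimizing T also tends to balance the two devices' times, which is what constraint (2) asks for).

```python
def total_time(Ncpu, Scpu, Ngpu, Sgpu, Pcpu, Pgpu, Tcpu, Tgpu):
    """T = Ncpu*(Scpu/Pcpu + Tcpu) + Ngpu*(Sgpu/Pgpu + Tgpu) + Toverlap,
    with Toverlap = max(Scpu/Pcpu, Sgpu/Pgpu)."""
    toverlap = max(Scpu / Pcpu, Sgpu / Pgpu)
    return Ncpu * (Scpu / Pcpu + Tcpu) + Ngpu * (Sgpu / Pgpu + Tgpu) + toverlap

def best_steps(volume, Pcpu, Pgpu, Tcpu, Tgpu, candidates, max_steps=32):
    """Enumerate (Ncpu, Scpu, Ngpu, Sgpu) combinations satisfying
    constraint (1): Ncpu*Scpu + Ngpu*Sgpu >= volume; keep the minimum-T one."""
    best = None
    for Scpu in candidates:
        for Sgpu in candidates:
            for Ncpu in range(max_steps + 1):
                for Ngpu in range(max_steps + 1):
                    if Ncpu * Scpu + Ngpu * Sgpu < volume:
                        continue  # constraint (1): cover the whole layer
                    t = total_time(Ncpu, Scpu, Ngpu, Sgpu,
                                   Pcpu, Pgpu, Tcpu, Tgpu)
                    if best is None or t < best[0]:
                        best = (t, Ncpu, Scpu, Ngpu, Sgpu)
    return best
```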
Step S104: the control device 103 notifies the first processing device 101 and the second processing device 102 to cooperatively traverse the layer to be traversed according to the traversal algorithm strategy and the traversal step strategy.
In this step, the control device 103 sends notification information including the traversal algorithm policy and the traversal step size policy to the first processing device 101 and the second processing device 102, respectively, and the first processing device 101 and the second processing device 102 cooperatively traverse the layer to be traversed according to the traversal algorithm policy and the traversal step size policy in the received notification information.
Specifically, the control device 103 sends first notification information to the CPU, indicating that the algorithm adopted by the CPU is the Top-Down algorithm and the traversal step size is Scpu, so that the CPU traverses the 4th-layer graph data using the Top-Down algorithm with traversal step size Scpu; the control device 103 further sends second notification information to the GPU, indicating that the algorithm adopted by the GPU is the Bottom-Up algorithm and the traversal step size is Sgpu.
In other examples, the first notification information and the second notification information may be the same notification information, where the notification information includes that the algorithm used by the CPU is Top-Down algorithm, the traversal step is Scpu, and the algorithm used by the GPU is Bottom-Up algorithm, and the traversal step is Sgpu.
For ease of understanding, referring to FIGS. 6-8, assume that Scpu is the amount of data of one data point per access, that Sgpu is also the amount of data of one data point per access, and that the CPU and the GPU cooperatively access points {7}, {8}, and {9} of the layer to be traversed in the shared queue. FIG. 6 shows the process of the CPU accessing {7} with the Top-Down algorithm, FIG. 7 shows the process of the GPU simultaneously accessing {8} with the Bottom-Up algorithm, and FIG. 8 shows the process of the CPU accessing {9} with the Top-Down algorithm.
It is to be noted that, in this embodiment, the traversal strategy and the traversal step size are selected once for each layer. In other embodiments of the present application, in order to traverse in the fastest direction at a lower cost, the selection of the traversal algorithms and traversal step sizes of the CPU and the GPU may be performed every certain number of layers, for example, every 5 layers; this effectively saves prediction time at the cost of a slight decrease in prediction accuracy.
In summary, compared with the conventional graph traversal algorithm, the embodiment of the present application considers both the performance difference between the first processing device and the second processing device and the difference between traversal algorithms across the different layers of the graph traversal. For each layer of graph data, the first processing device and the second processing device can perform the graph traversal simultaneously, and an optimal traversal step size can be selected for each of them. This solves the problem in the prior art that each data processing system can only use one fixed step size and is therefore not efficient enough, and enables multiple data processing systems to perform graph traversal operations simultaneously and efficiently.
It should be noted that the present application is not limited to be applied to graph traversal based on breadth-first search, and the present application can be generalized to other algorithms based on graphs or sparse matrices, such as PageRank (PR) and sparse matrix vector multiplication (SpMV).
It is noted that the main body of the above method may also be the control unit 103' shown in fig. 3b, or the control unit 103'' shown in fig. 3c.
Referring to fig. 9, fig. 9 is a schematic structural diagram of a control device according to an embodiment of the present application, and as shown in fig. 9, the control device 103 specifically includes:
a layer characteristic parameter obtaining module 1031, configured to obtain layer characteristic parameters of a layer to be traversed;
the strategy confirming module 1032 is configured to determine a traversal algorithm strategy and a traversal step size strategy according to the layer feature parameters, where the traversal algorithm strategy includes traversal algorithms respectively used when the first processing device 101 and the second processing device 102 cooperatively traverse the layer to be traversed, the traversal step size strategy includes traversal step sizes respectively used when the first processing device 101 and the second processing device 102 cooperatively traverse the layer to be traversed, the cooperative processing performance corresponding to the traversal algorithm strategy is maximum, and the total overhead time corresponding to the traversal step size strategy is minimum;
a notifying module 1033, configured to notify the first processing device 101 and the second processing device 102 to cooperatively traverse the layer to be traversed according to the traversal algorithm policy and the traversal step policy.
Optionally, the control device 103 further comprises a training module 1034 for:
performing collaborative traversal training on the first processing device 101 and the second processing device 102 by using the training graph, acquiring the first corresponding relation between the layer characteristic parameters and the first processing performance, the second corresponding relation between the layer characteristic parameters and the second processing performance, the third corresponding relation between the layer characteristic parameters and the third processing performance, and the fourth corresponding relation between the layer characteristic parameters and the fourth processing performance; the policy confirmation module 1032 is specifically configured to determine the traversal algorithm policy according to the layer feature parameters, the first corresponding relation, the second corresponding relation, the third corresponding relation, and the fourth corresponding relation.
The first processing performance is the processing performance of the first processing device 101 running the first traversal algorithm to perform the collaborative traversal on the training graph, the second processing performance is the processing performance of the first processing device 101 running the second traversal algorithm to perform the collaborative traversal on the training graph, the third processing performance is the processing performance of the second processing device 102 running the first traversal algorithm to perform the collaborative traversal on the training graph, and the fourth processing performance is the processing performance of the second processing device 102 running the second traversal algorithm to perform the collaborative traversal on the training graph;
optionally, the policy validation module 1032 is specifically configured to:
acquiring a first processing performance according to the layer characteristic parameter and the first corresponding relation, acquiring a second processing performance according to the layer characteristic parameter and the second corresponding relation, acquiring a third processing performance according to the layer characteristic parameter and the third corresponding relation, and acquiring a fourth processing performance according to the layer characteristic parameter and the fourth corresponding relation; acquiring cooperative processing performances Y1, Y2, Y3 and Y4 under different traversal algorithm strategies according to the following equations respectively:
Y1=(1-a)*P11+(1-b)*P21;
Y2=(1-a)*P11+(1-b)*P22;
Y3=(1-a)*P12+(1-b)*P21;
Y4=(1-a)*P12+(1-b)*P22;
wherein P11 is the first processing performance, P12 is the second processing performance, P21 is the third processing performance, P22 is the fourth processing performance, a is the percentage of performance degradation of the first processing device 101 when working in cooperation with the second processing device 102, and b is the percentage of performance degradation of the second processing device 102 when working in cooperation with the first processing device 101;
the cooperative processing performance having the maximum value is selected from Y1, Y2, Y3 and Y4, and the traversal algorithm policy that yields this maximum value is taken as the traversal algorithm policy.
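The selection above can be sketched as follows; the numeric values of P11, P12, P21, P22, a and b are illustrative assumptions, and "alg1"/"alg2" name the first and second traversal algorithms for the (device 1, device 2) choice:

```python
# Sketch of the algorithm-policy selection defined by the equations above.
# P11..P22, a and b are illustrative values, not measurements.

def select_algorithm_policy(p11, p12, p21, p22, a, b):
    candidates = {
        ("alg1", "alg1"): (1 - a) * p11 + (1 - b) * p21,  # Y1
        ("alg1", "alg2"): (1 - a) * p11 + (1 - b) * p22,  # Y2
        ("alg2", "alg1"): (1 - a) * p12 + (1 - b) * p21,  # Y3
        ("alg2", "alg2"): (1 - a) * p12 + (1 - b) * p22,  # Y4
    }
    # The policy yielding the maximum cooperative processing performance wins.
    return max(candidates, key=candidates.get)

print(select_algorithm_policy(p11=8.0, p12=6.0, p21=20.0, p22=25.0, a=0.1, b=0.2))
# -> ('alg1', 'alg2'): Y2 = 0.9*8.0 + 0.8*25.0 = 27.2 is the maximum
```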
Optionally, the policy validation module 1032 is specifically configured to:
the total overhead time T is obtained according to the following equation:
T=Ncpu*(Scpu/Pcpu+Tcpu)+Ngpu*(Sgpu/Pgpu+Tgpu)+Toverlap;
wherein Ncpu is the traversal step number of the first processing device 101 for the layer to be traversed, Scpu is the traversal step length of the first processing device 101 for the layer to be traversed, Ngpu is the traversal step number of the second processing device 102 for the layer to be traversed, Sgpu is the traversal step length of the second processing device 102 for the layer to be traversed, Tcpu is the preparation time of the first processing device 101, Tgpu is the preparation time of the second processing device 102, Pcpu is the processing performance of the first processing device 101 under the traversal algorithm policy, Pgpu is the processing performance of the second processing device 102 under the traversal algorithm policy, and Toverlap is the overlapping time when the first processing device 101 and the second processing device 102 work cooperatively;
and taking the traversal step length Scpu adopted by the first processing device 101 and the traversal step length Sgpu adopted by the second processing device 102 when T is the minimum value as the traversal step length strategy.
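A minimal sketch of this step-length selection follows. Two assumptions are mine, not stated in the text: the layer is split between the devices in proportion to their step lengths, and the minimisation runs over a fixed candidate grid of (Scpu, Sgpu) pairs; all numeric values are illustrative.

```python
# Sketch of the step-length selection: evaluate the total overhead time T for
# candidate (Scpu, Sgpu) pairs and keep the pair minimising T. The proportional
# split of the layer and the candidate grid are assumptions.
import math

def total_overhead(v_total, s_cpu, s_gpu, p_cpu, p_gpu, t_cpu, t_gpu, t_overlap):
    share_cpu = v_total * s_cpu / (s_cpu + s_gpu)  # vertices assigned to device 1
    share_gpu = v_total - share_cpu                # vertices assigned to device 2
    n_cpu = math.ceil(share_cpu / s_cpu)           # Ncpu: steps taken by device 1
    n_gpu = math.ceil(share_gpu / s_gpu)           # Ngpu: steps taken by device 2
    # T = Ncpu*(Scpu/Pcpu+Tcpu) + Ngpu*(Sgpu/Pgpu+Tgpu) + Toverlap
    return n_cpu * (s_cpu / p_cpu + t_cpu) + n_gpu * (s_gpu / p_gpu + t_gpu) + t_overlap

def select_step_policy(v_total, candidates, **kw):
    # Keep the (Scpu, Sgpu) pair minimising the total overhead time T.
    return min(candidates, key=lambda sl: total_overhead(v_total, *sl, **kw))

kw = dict(p_cpu=100.0, p_gpu=500.0, t_cpu=0.5, t_gpu=1.0, t_overlap=2.0)
print(select_step_policy(10000, [(500, 2000), (1000, 4000), (2000, 8000)], **kw))
# -> (2000, 8000)
```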
Optionally, the layer feature parameters include the number of edges of the layer to be traversed, the number of points of the layer to be traversed, the number of large points of the layer to be traversed, the number of edges of the traversed layer adjacent to the layer to be traversed, the number of points of the traversed layer adjacent to the layer to be traversed, and the number of large points of the traversed layer adjacent to the layer to be traversed.
Optionally, the first traversal algorithm is one of a Top-Down algorithm and a Bottom-Up algorithm, and the second traversal algorithm is the other of the Top-Down algorithm and the Bottom-Up algorithm.
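For reference, Top-Down and Bottom-Up are the two step directions of breadth-first search: a Top-Down step expands the frontier along outgoing edges, while a Bottom-Up step scans unvisited vertices for a neighbour already in the frontier. A minimal sketch of one step of each on an undirected adjacency-list graph (the example graph is illustrative):

```python
# One Top-Down step and one Bottom-Up step of breadth-first search.
# The 4-vertex example graph is illustrative only.

def top_down_step(adj, frontier, visited):
    nxt = set()
    for u in frontier:                 # expand each frontier vertex outward
        for v in adj[u]:
            if v not in visited:
                visited.add(v)
                nxt.add(v)
    return nxt

def bottom_up_step(adj, frontier, visited):
    nxt = set()
    for v in adj:                      # scan every unvisited vertex
        if v not in visited and any(u in frontier for u in adj[v]):
            visited.add(v)
            nxt.add(v)
    return nxt

adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}  # a 4-cycle
visited = {0}
print(top_down_step(adj, {0}, visited))      # -> {1, 2}
print(bottom_up_step(adj, {1, 2}, visited))  # -> {3}
```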
Optionally, the first processing device 101 is one of a CPU, a GPU, an FPGA and an ASIC, and the second processing device 102 is another one of the CPU, the GPU, the FPGA and the ASIC.
Referring to fig. 10, fig. 10 is a schematic structural diagram of another control device according to an embodiment of the present disclosure. As shown in fig. 10, the control device 103 includes a processor 105, a memory 106 and a bus 107, the processor 105 and the memory 106 are respectively connected to the bus 107, the memory 106 stores program instructions, and the processor 105 executes the program instructions to perform the steps described in fig. 2, fig. 3a, fig. 3b and fig. 3c and their corresponding embodiments.
Referring to fig. 11, fig. 11 is a schematic diagram of an apparatus structure of a data processing chip according to an embodiment of the present disclosure. As shown in fig. 11, the data processing chip includes a first processor 108, a second processor 109, a third processor 110 and a bus, and the first processor 108, the second processor 109 and the third processor 110 are respectively connected to the bus, wherein the first processor 108 and the second processor 109 are heterogeneous devices, and the third processor 110 is configured to implement the functions of the control device 103 described above, so as to control the first processor 108 and the second processor 109 to cooperatively traverse the layers.
In summary, the present application enables multiple heterogeneous processing devices in a data processing system to cooperatively traverse the same layer of a graph. Before the cooperative traversal, the traversal algorithm and the traversal step length of each heterogeneous processing device can be dynamically adjusted according to the layer characteristic parameters of the layer to be traversed, so that the multiple processing devices execute the graph traversal operation efficiently at the same time, the overall processing performance of the multiple processing devices is kept optimal, and the total overhead time of the cooperative traversal is minimized.
It should be noted that the processor 105 in the embodiment of the present application may be a Central Processing Unit (CPU), a Network Processor (NP), or a combination of the CPU and the NP. The processor may further include a hardware chip. The hardware chip may be an application-specific integrated circuit (ASIC), a Programmable Logic Device (PLD), or a combination thereof. The PLD may be a Complex Programmable Logic Device (CPLD), a field-programmable gate array (FPGA), a General Array Logic (GAL), or any combination thereof.
It should be noted that, in the embodiment of the present application, the memory 106 may include a volatile memory, such as a random-access memory (RAM); the memory 106 may also include a non-volatile memory, such as a read-only memory (ROM), a flash memory, a hard disk drive (HDD) or a solid-state drive (SSD); the memory 106 may also include a combination of the above types of memory.
Based on the same inventive concept, the present application provides a computer-readable storage medium having stored therein instructions, which, when executed on a computer, cause the computer to perform the method described in any possible implementation of the above embodiments.
Based on the same inventive concept, the present application provides a computer program product comprising instructions which, when run on a computer, cause the computer to perform the method as described in any of the possible implementations of the embodiments described above.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (18)

1. A layer traversal method for big data graph calculation, the method comprising:
obtaining layer characteristic parameters of a layer to be traversed;
determining a traversal algorithm strategy and a traversal step strategy according to the layer characteristic parameters, wherein the traversal algorithm strategy comprises traversal algorithms respectively used when a first processing device and a second processing device cooperatively traverse the layer to be traversed, the traversal step strategy comprises traversal steps respectively used when the first processing device and the second processing device cooperatively traverse the layer to be traversed, the performance of cooperative traversal performed by the first processing device and the second processing device respectively adopting the traversal algorithm strategy is the maximum, and the total overhead time of cooperative traversal performed by the first processing device and the second processing device respectively adopting the traversal step strategy is the minimum;
and informing the first processing device and the second processing device to cooperatively traverse the layer to be traversed according to the traversal algorithm strategy and the traversal step strategy.
2. The method according to claim 1, wherein before obtaining layer feature parameters of a layer to be traversed, the method further comprises:
performing collaborative traversal training on the first processing device and the second processing device by using a training graph, acquiring a first corresponding relation between the layer characteristic parameters and first processing performance, acquiring a second corresponding relation between the layer characteristic parameters and second processing performance, acquiring a third corresponding relation between the layer characteristic parameters and third processing performance, and acquiring a fourth corresponding relation between the layer characteristic parameters and fourth processing performance;
the first processing performance is the processing performance of the first processing device running a first traversal algorithm to perform collaborative traversal on the training graph, the second processing performance is the processing performance of the first processing device running a second traversal algorithm to perform collaborative traversal on the training graph, the third processing performance is the processing performance of the second processing device running the first traversal algorithm to perform collaborative traversal on the training graph, and the fourth processing performance is the processing performance of the second processing device running the second traversal algorithm to perform collaborative traversal on the training graph;
the determining the traversal algorithm strategy according to the layer characteristic parameters specifically includes:
and determining the traversal algorithm strategy according to the layer characteristic parameters, the first corresponding relation, the second corresponding relation, the third corresponding relation and the fourth corresponding relation.
3. The method according to claim 2, wherein the determining the traversal algorithm policy according to the layer feature parameter, the first corresponding relationship, the second corresponding relationship, the third corresponding relationship, and the fourth corresponding relationship specifically includes:
acquiring the first processing performance according to the layer characteristic parameter and the first corresponding relation, acquiring the second processing performance according to the layer characteristic parameter and the second corresponding relation, acquiring the third processing performance according to the layer characteristic parameter and the third corresponding relation, and acquiring the fourth processing performance according to the layer characteristic parameter and the fourth corresponding relation;
and acquiring the cooperative processing performance under different traversal algorithm strategies according to the first processing performance, the second processing performance, the third processing performance and the fourth processing performance, and taking the traversal algorithm strategy adopted by the cooperative processing performance with the maximum value as the traversal algorithm strategy.
4. The method according to claim 3, wherein the obtaining of the cooperative processing performance under different traversal algorithm policies according to the first processing performance, the second processing performance, the third processing performance, and the fourth processing performance, and using a traversal algorithm policy adopted by the cooperative processing performance with the maximum value as the traversal algorithm policy specifically includes:
acquiring cooperative processing performances Y1, Y2, Y3 and Y4 under different traversal algorithm strategies according to the following equations respectively:
Y1=(1-a)*P11+(1-b)*P21;
Y2=(1-a)*P11+(1-b)*P22;
Y3=(1-a)*P12+(1-b)*P21;
Y4=(1-a)*P12+(1-b)*P22;
wherein P11 is the first processing performance, P12 is the second processing performance, P21 is the third processing performance, P22 is the fourth processing performance, a is the percentage of performance degradation of the first processing device when working in cooperation with the second processing device, and b is the percentage of performance degradation of the second processing device when working in cooperation with the first processing device;
selecting the co-processing performance with the maximum value among the Y1, the Y2, the Y3, and the Y4, and taking a traversal algorithm policy adopted to obtain the co-processing performance with the maximum value as the traversal algorithm policy.
5. The method according to claim 3 or 4, wherein the determining the traversal step size policy according to the layer feature parameter specifically includes:
and acquiring total overhead time for executing different traversal step strategies on the layer to be traversed according to the traversal algorithm strategy, and taking the traversal step adopted by the first processing device and the traversal step adopted by the second processing device as the traversal step strategies when the total overhead time is the minimum value.
6. The method according to claim 5, wherein the obtaining, according to the traversal algorithm policy, total overhead time for executing different traversal step strategies on the layer to be traversed, and taking a traversal step adopted by the first processing device and a traversal step adopted by the second processing device when the total overhead time is a minimum value as the traversal step strategies specifically includes:
the total overhead time T is obtained according to the following equation:
T=Ncpu*(Scpu/Pcpu+Tcpu)+Ngpu*(Sgpu/Pgpu+Tgpu)+Toverlap;
wherein Ncpu is the traversal step number of the first processing device for the layer to be traversed, Scpu is the traversal step length of the first processing device for the layer to be traversed, Ngpu is the traversal step number of the second processing device for the layer to be traversed, Sgpu is the traversal step length of the second processing device for the layer to be traversed, Tcpu is the preparation time of the first processing device, Tgpu is the preparation time of the second processing device, Pcpu is the processing performance of the first processing device under the traversal algorithm strategy, Pgpu is the processing performance of the second processing device under the traversal algorithm strategy, and Toverlap is the overlapping time of the cooperative traversal of the first processing device and the second processing device;
and taking the traversal step length Scpu adopted by the first processing device and the traversal step length Sgpu adopted by the second processing device when T is the minimum value as the traversal step length strategy.
7. The method according to any one of claims 1 to 4, wherein the layer characteristic parameters include any one or a combination of the following parameters: the number of the edges of the layer to be traversed, the number of the points of the layer to be traversed, the number of the large points of the layer to be traversed, the number of the edges of the traversed layer adjacent to the layer to be traversed, the number of the points of the traversed layer adjacent to the layer to be traversed, and the number of the large points of the traversed layer adjacent to the layer to be traversed.
8. The method of any one of claims 2 to 4, wherein the first traversal algorithm is one of a Top-Down algorithm and a Bottom-Up algorithm, and the second traversal algorithm is the other of the Top-Down algorithm and the Bottom-Up algorithm.
9. A control apparatus for big data graph calculation, comprising:
the layer characteristic parameter acquisition module is used for acquiring layer characteristic parameters of a layer to be traversed;
the strategy confirming module is used for determining a traversal algorithm strategy and a traversal step strategy according to the layer characteristic parameters, wherein the traversal algorithm strategy comprises traversal algorithms respectively used when a first processing device and a second processing device cooperatively traverse the layer to be traversed, the traversal step strategy comprises traversal steps respectively used when the first processing device and the second processing device cooperatively traverse the layer to be traversed, the performance of cooperative traversal performed by the first processing device and the second processing device respectively adopting the traversal algorithm strategy is the maximum, and the total overhead time of cooperative traversal performed by the first processing device and the second processing device respectively adopting the traversal step strategy is the minimum;
and the notification module is used for notifying the first processing device and the second processing device to cooperatively traverse the layer to be traversed according to the traversal algorithm strategy and the traversal step strategy.
10. The control device of claim 9, further comprising a training module to:
performing collaborative traversal training on the first processing device and the second processing device by using a training graph, acquiring a first corresponding relation between the layer characteristic parameters and first processing performance, acquiring a second corresponding relation between the layer characteristic parameters and second processing performance, acquiring a third corresponding relation between the layer characteristic parameters and third processing performance, and acquiring a fourth corresponding relation between the layer characteristic parameters and fourth processing performance;
the first processing performance is the processing performance of the first processing device running a first traversal algorithm to perform collaborative traversal on the training graph, the second processing performance is the processing performance of the first processing device running a second traversal algorithm to perform collaborative traversal on the training graph, the third processing performance is the processing performance of the second processing device running the first traversal algorithm to perform collaborative traversal on the training graph, and the fourth processing performance is the processing performance of the second processing device running the second traversal algorithm to perform collaborative traversal on the training graph;
the policy confirmation module is specifically configured to determine the traversal algorithm policy according to the layer feature parameter, the first corresponding relationship, the second corresponding relationship, the third corresponding relationship, and the fourth corresponding relationship.
11. The control device according to claim 10, wherein the policy validation module is specifically configured to:
acquiring the first processing performance according to the layer characteristic parameter and the first corresponding relation, acquiring the second processing performance according to the layer characteristic parameter and the second corresponding relation, acquiring the third processing performance according to the layer characteristic parameter and the third corresponding relation, and acquiring the fourth processing performance according to the layer characteristic parameter and the fourth corresponding relation;
and acquiring the cooperative processing performance under different traversal algorithm strategies according to the first processing performance, the second processing performance, the third processing performance and the fourth processing performance, and taking the traversal algorithm strategy adopted by the cooperative processing performance with the maximum value as the traversal algorithm strategy.
12. The control device according to claim 11, wherein the policy validation module is specifically configured to:
acquiring cooperative processing performances Y1, Y2, Y3 and Y4 under different traversal algorithm strategies according to the following equations respectively:
Y1=(1-a)*P11+(1-b)*P21;
Y2=(1-a)*P11+(1-b)*P22;
Y3=(1-a)*P12+(1-b)*P21;
Y4=(1-a)*P12+(1-b)*P22;
wherein P11 is the first processing performance, P12 is the second processing performance, P21 is the third processing performance, P22 is the fourth processing performance, a is the percentage of performance degradation of the first processing device when working in cooperation with the second processing device, and b is the percentage of performance degradation of the second processing device when working in cooperation with the first processing device;
selecting the co-processing performance with the maximum value from the Y1, the Y2, the Y3, and the Y4, and using a traversal algorithm policy adopted for the co-processing performance with the maximum value as the traversal algorithm policy.
13. The control device according to claim 11 or 12, wherein the policy validation module is specifically configured to:
and acquiring total overhead time for executing different traversal step strategies on the layer to be traversed according to the traversal algorithm strategy, and taking the traversal step adopted by the first processing device and the traversal step adopted by the second processing device as the traversal step strategies when the total overhead time is the minimum value.
14. The control device according to claim 13, wherein the policy validation module is specifically configured to:
the total overhead time T is obtained according to the following equation:
T=Ncpu*(Scpu/Pcpu+Tcpu)+Ngpu*(Sgpu/Pgpu+Tgpu)+Toverlap;
wherein Ncpu is the traversal step number of the first processing device for the layer to be traversed, Scpu is the traversal step length of the first processing device for the layer to be traversed, Ngpu is the traversal step number of the second processing device for the layer to be traversed, Sgpu is the traversal step length of the second processing device for the layer to be traversed, Tcpu is the preparation time of the first processing device, Tgpu is the preparation time of the second processing device, Pcpu is the processing performance of the first processing device under the traversal algorithm strategy, Pgpu is the processing performance of the second processing device under the traversal algorithm strategy, and Toverlap is the overlapping time of the cooperative traversal of the first processing device and the second processing device;
and taking the traversal step length Scpu adopted by the first processing device and the traversal step length Sgpu adopted by the second processing device when T is the minimum value as the traversal step length strategy.
15. The control device according to any one of claims 9 to 12, wherein the layer characteristic parameter includes any one of the following parameters or a combination thereof: the number of the edges of the layer to be traversed, the number of the points of the layer to be traversed, the number of the large points of the layer to be traversed, the number of the edges of the traversed layer adjacent to the layer to be traversed, the number of the points of the traversed layer adjacent to the layer to be traversed, and the number of the large points of the traversed layer adjacent to the layer to be traversed.
16. The control apparatus according to any one of claims 10 to 12, wherein the first traversal algorithm is one of a Top-Down algorithm and a Bottom-Up algorithm, and the second traversal algorithm is the other of the Top-Down algorithm and the Bottom-Up algorithm.
17. A control device for big data graph calculation in a data processing system, wherein the control device is configured to control a first processing device to perform graph calculation in cooperation with a second processing device, and the control device comprises a processor and a memory, the memory stores program instructions, and the processor executes the program instructions to perform the following steps:
obtaining layer characteristic parameters of a layer to be traversed;
determining a traversal algorithm strategy and a traversal step strategy according to the layer characteristic parameters, wherein the traversal algorithm strategy comprises traversal algorithms respectively used when the first processing device and the second processing device cooperatively traverse the layer to be traversed, the traversal step strategy comprises traversal steps respectively used when the first processing device and the second processing device cooperatively traverse the layer to be traversed, the performance of cooperative traversal performed by the first processing device and the second processing device respectively adopting the traversal algorithm strategy is the maximum, and the total overhead time of cooperative traversal performed by the first processing device and the second processing device respectively adopting the traversal step strategy is the minimum;
and informing the first processing device and the second processing device to cooperatively traverse the layer to be traversed according to the traversal algorithm strategy and the traversal step strategy.
18. A data processing system is characterized by comprising a first processing device, a second processing device and a control device, wherein the control device is used for controlling the first processing device and the second processing device to cooperatively perform graph calculation, and the first processing device and the second processing device respectively perform graph calculation according to instructions of the control device;
the control device is configured to acquire layer characteristic parameters of a layer to be traversed, determine a traversal algorithm strategy and a traversal step strategy according to the layer characteristic parameters, and send notification information including the traversal algorithm strategy and the traversal step strategy to the first processing device and the second processing device respectively, wherein the traversal algorithm strategy comprises the traversal algorithms respectively used when the first processing device and the second processing device cooperatively traverse the layer to be traversed, the traversal step strategy comprises the traversal steps respectively used when the first processing device and the second processing device cooperatively traverse the layer to be traversed, the performance of the cooperative traversal performed by the first processing device and the second processing device respectively adopting the traversal algorithm strategy is the maximum, and the total overhead time of the cooperative traversal performed by the first processing device and the second processing device respectively adopting the traversal step strategy is the minimum;
and the first processing device and the second processing device are respectively used for cooperatively traversing the layer to be traversed according to the traversal algorithm strategy and the traversal step strategy.
CN201710687341.4A 2017-08-11 2017-08-11 Layer traversal method, control device and data processing system Active CN109388428B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710687341.4A CN109388428B (en) 2017-08-11 2017-08-11 Layer traversal method, control device and data processing system


Publications (2)

Publication Number Publication Date
CN109388428A CN109388428A (en) 2019-02-26
CN109388428B (en) 2021-05-04

Family

ID=65414454


Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113867798A (en) * 2020-06-30 2021-12-31 上海寒武纪信息科技有限公司 Integrated computing device, integrated circuit chip, board card and computing method

Citations (3)

Publication number Priority date Publication date Assignee Title
CN102662639A (en) * 2012-04-10 2012-09-12 Nanjing University of Aeronautics and Astronautics MapReduce-based multi-GPU (Graphics Processing Unit) cooperative computing method
CN105468439A (en) * 2015-11-19 2016-04-06 East China Normal University Adaptive parallel algorithm for traversing fixed-radius neighbors under a CPU-GPU (Central Processing Unit-Graphics Processing Unit) heterogeneous framework
CN106951322A (en) * 2017-02-28 2017-07-14 Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences Image collaborative processing program acquisition method and system for CPU/GPU heterogeneous environments

Family Cites Families (4)

Publication number Priority date Publication date Assignee Title
CN101706741B (en) * 2009-12-11 2012-10-24 National University of Defense Technology Method for partitioning dynamic tasks of CPU and GPU based on load balancing
US8789026B2 (en) * 2011-08-02 2014-07-22 International Business Machines Corporation Technique for compiling and running high-level programs on heterogeneous computers
KR101970041B1 (en) * 2012-09-07 2019-04-18 카네기 멜론 유니버시티 Methods for Hybrid GPU/CPU Data Processing
US9501860B2 (en) * 2014-01-03 2016-11-22 Intel Corporation Sparse rasterization

Patent Citations (3)

Publication number Priority date Publication date Assignee Title
CN102662639A (en) * 2012-04-10 2012-09-12 Nanjing University of Aeronautics and Astronautics MapReduce-based multi-GPU (Graphics Processing Unit) cooperative computing method
CN105468439A (en) * 2015-11-19 2016-04-06 East China Normal University Adaptive parallel algorithm for traversing fixed-radius neighbors under a CPU-GPU (Central Processing Unit-Graphics Processing Unit) heterogeneous framework
CN106951322A (en) * 2017-02-28 2017-07-14 Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences Image collaborative processing program acquisition method and system for CPU/GPU heterogeneous environments

Non-Patent Citations (2)

Title
Understanding Co-Running Behaviors on Integrated CPU/GPU Architectures; Zhai Jidong, et al.; IEEE Transactions on Parallel and Distributed Systems; 2017-03-31; full text *
Load characteristic analysis of typical GPU programs based on the CUPTI interface; Zheng Zhen, Zhai Jidong, et al.; Journal of Computer Research and Development; 2016-06-15; full text *

Also Published As

Publication number Publication date
CN109388428A (en) 2019-02-26

Similar Documents

Publication Publication Date Title
CN108701250B (en) Data fixed-point method and device
WO2020108371A1 (en) Partitioning of deep learning inference with dynamic offloading
KR20200088475A (en) Simultaneous training of functional networks of neural networks
WO2018176385A1 (en) System and method for network slicing for service-oriented networks
EP3139270A1 (en) Data mining method and node
EP3663915A1 (en) Scheduling task graph operations
CN112513886B (en) Information processing method, information processing apparatus, and information processing program
WO2016123808A1 (en) Data processing system, calculation node and data processing method
KR101544457B1 (en) Method of parameter investigation for optimal design
CN109189572B (en) Resource estimation method and system, electronic equipment and storage medium
CN114707114A (en) Blocking method and device, convolution operation method and device, and storage medium
WO2022001014A1 (en) Neural network model compilation method and apparatus, storage medium, and electronic device
CN111122222B (en) Sample point position determining method and system
JP2020107042A (en) Learning model generation device, learning model generation method, and program
CN109388428B (en) Layer traversal method, control device and data processing system
CN114816711A (en) Batch task processing method and device, computer equipment and storage medium
TW202001701A (en) Method for quantizing an image and method for training a neural network
KR102326586B1 (en) Method and apparatus for processing large-scale distributed matrix product
CN111027688A (en) Neural network calculator generation method and device based on FPGA
CN113778518B (en) Data processing method, device, computer equipment and storage medium
CN113485848B (en) Deep neural network deployment method and device, computer equipment and storage medium
WO2021248937A1 (en) Geographically distributed graph computing method and system based on differential privacy
CN110347511B (en) Geographic distributed process mapping method and device containing privacy constraint conditions and terminal
US11321819B2 (en) System and method for performing a convolution operation
JP6163926B2 (en) Virtual machine management apparatus, virtual machine management method, and program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant