CN113238837B - Computing flow chart construction method, computing efficiency optimization method, corresponding devices, and electronic device - Google Patents
- Publication number
- CN113238837B (grant publication); application CN202110433418A (CN202110433418.1)
- Authority
- CN
- China
- Prior art keywords
- computing
- computing node
- node
- data
- flow chart
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/48—Program initiating; Program switching, e.g. by interrupt
- G06F9/4806—Task transfer initiation or dispatching
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/32—Monitoring with visual or acoustical indication of the functioning of the machine
- G06F11/323—Visualisation of programs or trace data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/34—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
- G06F11/3409—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment
Abstract
The embodiments of the present application provide methods and devices for constructing a calculation flow chart and for optimizing calculation efficiency, as well as an electronic device. The method for constructing the calculation flow chart comprises the following steps: acquiring each subtask of a target computing task and allocating a computing node to each subtask; setting data pipelines between the computing nodes according to the dependency relationships among the subtasks to obtain a first calculation flow chart; and optimizing the first calculation flow chart to obtain an optimized second calculation flow chart. The method and device can improve the calculation efficiency of the calculation flow chart.
Description
Technical Field
The present application relates to the field of computer technology, and in particular to methods and devices for constructing a calculation flow chart and optimizing calculation efficiency, and to an electronic device.
Background
As machine learning techniques have matured, they have been widely applied in various fields. Video or image processing with machine learning is typically performed under high concurrency, with large numbers of video streams and multiple service flows; the computation load is heavy and the business logic complex, so performance must be optimized under a fixed hardware compute budget to improve calculation efficiency. However, without a systematic performance analysis tool, performance bottlenecks cannot be located and optimized.
In view of the above problems, no effective technical solution currently exists.
Disclosure of Invention
The embodiments of the present application aim to provide methods and devices for constructing a calculation flow chart and optimizing calculation efficiency, and an electronic device, so that calculation efficiency can be improved.
In a first aspect, an embodiment of the present application provides a method for building a computation flow chart, including:
acquiring each subtask of a target computing task and allocating a computing node to each subtask;
setting data pipelines between the computing nodes according to the dependency relationships among the subtasks to obtain a first calculation flow chart;
and optimizing the connection relationships and/or the computing nodes of the first calculation flow chart to obtain an optimized second calculation flow chart.
Optionally, in the method for constructing a calculation flow chart according to the embodiment of the present application, optimizing the connection relationships of the first calculation flow chart comprises:
correcting erroneous connection relationships existing in the first calculation flow chart.
Optionally, in the method for constructing a calculation flow chart according to the embodiment of the present application, correcting erroneous connection relationships existing in the first calculation flow chart comprises:
detecting whether the input end and the output end of each computing node of the first calculation flow chart are connected to a data pipeline, and optimizing the first calculation flow chart according to the detection result;
and/or detecting whether both ends of each data pipeline of the first calculation flow chart are connected to other data pipelines or computing nodes, and optimizing the first calculation flow chart according to the detection result;
and/or detecting whether the first calculation flow chart contains a data pipeline for which no rule has been set, and optimizing the first calculation flow chart according to the detection result.
Optionally, in the method for constructing a computation flowchart according to the embodiment of the present application, performing optimization processing on a computation node of the first computation flowchart includes:
adjusting serial computing nodes without data dependency in the first computing flow chart into asynchronous parallel computing nodes;
and/or setting parameters of each computing node and data pipeline in the first computing flowchart;
and/or splitting the computing nodes meeting the splitting condition in each computing node in the first computing flowchart.
Optionally, in the method for constructing a computation flowchart according to the embodiment of the present application, the setting parameters of each computation node and data pipeline in the first computation flowchart includes:
and setting parameters for the maximum length of the data pipeline setting cache queue.
Optionally, in the method for constructing a computation flowchart according to the embodiment of the present application, the setting parameters of each computation node and data pipeline in the first computation flowchart includes:
and setting parameters of the maximum batch processing amount of each computing node and a corresponding time-out mechanism.
Optionally, in the method for constructing a computation flowchart according to the embodiment of the present application, the splitting processing performed on a computation node that satisfies a splitting condition in each computation node in the first computation flowchart includes:
screening out target computing nodes from the computing nodes of the first computing flow chart; the subtask corresponding to the target computing node can be split into a plurality of subtasks;
splitting the target compute node into a plurality of new compute nodes;
and taking the input end of the target computing node as the input end of the plurality of new computing nodes, and taking the output end of the target computing node as the output end of the plurality of new computing nodes.
Optionally, splitting a computing node that satisfies a splitting condition in each computing node in the first computing flowchart, further including:
judging whether a target computing node exists in the plurality of new computing nodes or not;
if yes, screening out a target computing node from the plurality of new computing nodes, and returning to execute the step of splitting the target computing node into the plurality of new computing nodes;
if not, the splitting process is ended.
Optionally, in the method for constructing a computing flowchart according to the embodiment of the present application, setting a data pipeline between each computing node according to a dependency relationship between each subtask includes:
setting a data pipeline according to the data flow direction relation among the computing nodes;
setting the type of data pipe by at least one of:
when a first computing node exists, setting a data pipeline connected with the output end of the first computing node as a broadcast pipeline; the first computing node is a node which transmits the data after the subtask is executed to a plurality of next-level computing nodes at the same time;
when a second computing node exists, setting a data pipeline connected with the output end of the second computing node as a multi-branch pipeline; the second computing node is a node of a next-level computing node which needs to receive target data according to a selection condition, and the target data is data generated after the second computing node executes the subtasks;
when a third computing node exists, setting a data input pipeline of the third computing node as a merging pipeline; the third computing node is a node capable of receiving data after the plurality of upper-level computing nodes execute the subtasks;
when a fourth computing node exists, setting a data output pipeline of the fourth computing node as an order-preserving pipeline; and the fourth computing node is a node for keeping the receiving sequence of the data consistent with the output sequence of the data.
In a second aspect, an embodiment of the present application further provides a method for optimizing computational efficiency, including the following steps:
acquiring a calculation flow chart, wherein the calculation flow chart is constructed by adopting any one of the calculation flow chart construction methods;
monitoring the current computing load of each computing node in the computing flow chart in the process of computing and processing the object to be processed of the target computing task by applying the computing flow chart;
determining a bottleneck computing node from each computing node, and optimizing the computing efficiency of the bottleneck computing node, wherein the step of optimizing the computing efficiency of the bottleneck computing node comprises at least one of the following steps:
when the current computing load of a bottleneck computing node does not reach a computing power bottleneck state, judging whether the bottleneck computing node meets a splitting condition or not, and if so, splitting the bottleneck computing node;
when the current computing load of a bottleneck computing node does not reach a computing power bottleneck state, adjusting the maximum batch processing amount of the bottleneck computing node;
and when the current computing load of the bottleneck computing node reaches a computing power bottleneck state, scheduling additional computing resources to the bottleneck computing node.
Optionally, in the method for optimizing computational efficiency according to the embodiment of the present application, the method further includes:
re-acquiring a calculation flow chart according to the adjusted maximum batch processing amount and/or the split bottleneck calculation node;
monitoring the current computing load of each computing node in the computing flow chart in the process of computing and processing the object to be processed of the target computing task by applying the computing flow chart; determining bottleneck computing nodes from each computing node, and optimizing the computing efficiency of the bottleneck computing nodes until no optimizable bottleneck computing nodes exist.
In a third aspect, an embodiment of the present application further provides a device for constructing a computation flowchart, including:
the first acquisition module is used for acquiring each subtask of the target computing task and distributing computing nodes for each subtask;
the setting module is used for setting data pipelines among all the computing nodes according to the dependency relationship among all the subtasks to obtain a first computing flow chart;
and the optimization module is used for optimizing the connection relation and/or the computing nodes of the first computing flow chart to obtain a second computing flow chart after optimization.
In a fourth aspect, an embodiment of the present application further provides a device for optimizing computational efficiency, including:
and the second acquisition module is used for acquiring a calculation flow chart, and the calculation flow chart is constructed by adopting any one of the calculation flow chart construction methods.
The monitoring module is used for monitoring the current computing load of each computing node in the computing flow chart in the process of applying the computing flow chart to compute and process the object to be processed of the target computing task;
an optimization module, configured to determine a bottleneck computing node from each computing node, and optimize the computing efficiency of the bottleneck computing node, where the step of optimizing the computing efficiency of the bottleneck computing node includes at least one of the following steps: when the current computing load of a bottleneck computing node does not reach a computing power bottleneck state, judging whether the bottleneck computing node meets a splitting condition or not, and if so, splitting the bottleneck computing node; when the current computing load of a bottleneck computing node does not reach a computing power bottleneck state, adjusting the maximum batch processing amount of the bottleneck computing node; and when the current computing load of the bottleneck computing node reaches a computing power bottleneck state, scheduling computing resources to the target computing node.
In a fifth aspect, an embodiment of the present application provides an electronic device, including a processor and a memory, where the memory stores computer-readable instructions, and when the computer-readable instructions are executed by the processor, the electronic device executes any one of the methods described above.
As can be seen from the above, the methods, devices, and electronic device for constructing a calculation flow chart and optimizing calculation efficiency provided in the embodiments of the present application acquire each subtask of the target computing task and allocate a computing node to each subtask; set data pipelines between the computing nodes according to the dependency relationships among the subtasks to obtain a first calculation flow chart; and optimize the connection relationships and/or the computing nodes of the first calculation flow chart to obtain an optimized second calculation flow chart, so that calculation efficiency can be improved.
Additional features and advantages of the present application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the embodiments of the present application. The objectives and other advantages of the application may be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
Drawings
To illustrate the technical solutions of the embodiments of the present application more clearly, the drawings required for the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present application and therefore should not be considered as limiting its scope; those skilled in the art can derive other related drawings from them without inventive effort.
Fig. 1 is a flowchart of a method for constructing a computing flowchart according to an embodiment of the present application.
Fig. 2 is a flowchart of a calculation efficiency optimization method according to an embodiment of the present application.
Fig. 3 is a schematic diagram of a calculation flowchart construction apparatus in an embodiment of the present application.
Fig. 4 is a schematic diagram of a calculation efficiency optimization device in the embodiment of the present application.
Fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Fig. 1 is a flowchart of a method for constructing a computing flowchart according to an embodiment of the present application. The method for constructing the calculation flow chart comprises the following steps:
s101, acquiring each subtask of a target computing task and distributing a computing node for each subtask;
s102, setting data pipelines among all computing nodes according to the dependency relationship among all subtasks to obtain a first computing flow chart;
s103, optimizing the connection relation and/or the computing nodes of the first computing flow chart to obtain a second computing flow chart after optimization.
To understand the performance of the computing device, an embodiment of the present invention provides a performance analysis tool that monitors the running state of the SDK, so that when a problem occurs, its specific cause can be identified at minimal cost. An embodiment of the present invention provides a method for constructing a calculation flow chart, where the calculation flow chart is a directed acyclic graph (DAG); based on the calculation flow chart, the time-sequence changes of data backlog states can be analyzed visually.
In step S101, the target computing task is divided into a plurality of subtasks, and a computing node is allocated to each subtask, i.e., each subtask corresponds to one computing node. For example, a target tracking task may be divided into subtasks such as target detection and trajectory generation, with one computing node allocated for target detection and one for trajectory generation.
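The patent text contains no code; the following minimal Python sketch of steps S101 and S102 (all class and function names are assumptions of this rendering, not from the patent) allocates one node per subtask and wires pipelines along the dependency edges, using the target-tracking example above:

```python
from dataclasses import dataclass, field

@dataclass
class ComputeNode:
    name: str                                      # subtask this node executes
    inputs: list = field(default_factory=list)     # incoming data pipelines
    outputs: list = field(default_factory=list)    # outgoing data pipelines

@dataclass
class DataPipeline:
    src: ComputeNode
    dst: ComputeNode

def build_flow_graph(subtasks, dependencies):
    """S101: allocate one compute node per subtask.
    S102: connect nodes with data pipelines along each dependency edge."""
    nodes = {name: ComputeNode(name) for name in subtasks}
    pipes = []
    for upstream, downstream in dependencies:
        pipe = DataPipeline(nodes[upstream], nodes[downstream])
        nodes[upstream].outputs.append(pipe)
        nodes[downstream].inputs.append(pipe)
        pipes.append(pipe)
    return nodes, pipes

# Target-tracking example from the text: detection feeds trajectory generation.
nodes, pipes = build_flow_graph(
    ["target_detection", "trajectory_generation"],
    [("target_detection", "trajectory_generation")],
)
```

The resulting `nodes`/`pipes` pair is the "first calculation flow chart" that the later optimization steps operate on.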
In step S102, the data pipeline is used to transmit the target data output from the output terminal of the upper-stage computing node to the input terminal of the lower-stage computing node. The data pipeline is used for connecting the computing nodes with data transmission relationship, and can be divided into a common pipeline and a special pipeline according to the connection relationship among different computing nodes, wherein the special pipeline comprises a broadcast pipeline, a multi-branch pipeline, a merging pipeline, an order-preserving pipeline and the like. Setting a data pipeline according to the data flow direction relation among the computing nodes, wherein the type of the data pipeline can be set in at least one of the following modes:
Mode one: when a first computing node exists, setting the data pipeline connected to the output end of the first computing node as a broadcast pipeline; the first computing node is a node that transmits the data produced after executing its subtask to a plurality of next-level computing nodes simultaneously.
Mode two: when a second computing node exists, setting the data pipeline connected to the output end of the second computing node as a multi-branch pipeline; the second computing node is a node that selects, according to a condition, which next-level computing node receives the target data, the target data being the data generated after the second computing node executes its subtask. A multi-branch pipeline selects one of several output pipelines according to which condition is satisfied. For example, computing node 1 is connected to next-level computing nodes 2 and 3 through a multi-branch pipeline (data pipeline a and data pipeline b): when the data generated by computing node 1 satisfies condition A, node 1 transmits it to node 2 through data pipeline a; when the data satisfies condition B, node 1 transmits it to node 3 through data pipeline b.
Mode three: when a third computing node exists, setting the data input pipeline of the third computing node as a merging pipeline; the third computing node is a node that receives the data produced by a plurality of upper-level computing nodes after they execute their subtasks. A merging pipeline combines the outputs of several computing nodes that all feed the same next-level node.
Mode four: when a fourth computing node exists, setting the data output pipeline of the fourth computing node as an order-preserving pipeline; the fourth computing node is a node whose data must leave in the same order in which it was received. An order-preserving pipeline reorders output data according to the input order of the computing node. For example, suppose the data pipeline connected to the output end of computing node 4, which executes an image detection task, is an order-preserving pipeline. If the node receives image frames in the order 1 2 3 4 5 but outputs detection results in the order 2 1 4 5 3, the order-preserving pipeline restores the result order to 1 2 3 4 5.
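The order-preserving pipeline of mode four can be sketched in a few lines of Python (an illustrative sketch under our own naming, not the patent's implementation): a min-heap holds early arrivals until the next expected sequence number appears, reproducing the 2 1 4 5 3 → 1 2 3 4 5 example above.

```python
import heapq

class OrderPreservingPipe:
    """Reorders out-of-order results so they leave in input (frame-number)
    order. Names and fields are assumptions, not from the patent."""
    def __init__(self, first_seq=1):
        self.next_seq = first_seq
        self.heap = []                      # min-heap of (seq, item)

    def push(self, seq, item):
        """Accept one result; return the items now releasable in order."""
        heapq.heappush(self.heap, (seq, item))
        released = []
        while self.heap and self.heap[0][0] == self.next_seq:
            released.append(heapq.heappop(self.heap)[1])
            self.next_seq += 1
        return released

pipe = OrderPreservingPipe()
out = []
for seq in [2, 1, 4, 5, 3]:                 # result order 2 1 4 5 3 from the example
    out += pipe.push(seq, f"frame-{seq}")
# out now holds frame-1 .. frame-5 in input order
```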
In step S103, either only the connection relationships or only the computing nodes may be optimized; naturally, optimizing both yields the best result.
Wherein, optimizing the connection relationship may include the following:
s1, optimizing the wrong connection relation existing in the first calculation flow chart.
Step S1 may specifically comprise the following sub-steps: S11, detecting whether the input end and the output end of each computing node of the first calculation flow chart are connected to a data pipeline, and optimizing the first calculation flow chart according to the detection result; S12, detecting whether both ends of each data pipeline of the first calculation flow chart are connected to other data pipelines or computing nodes, and optimizing accordingly; S13, detecting whether the first calculation flow chart contains a data pipeline for which no rule has been set, and optimizing accordingly; S14, detecting whether the data type of a data pipeline connected to the input end of a computing node is inconsistent with that node's input data type, or the data type of a data pipeline connected to the output end is inconsistent with the node's output data type, and if so, adjusting one of the data type of the data pipeline, the input data type of the computing node, or the output data type of the computing node.
Operation S11 mainly detects whether any computing node in the first calculation flow chart has a dangling input or output (a data pipeline that is defined and registered but not used); a dangling input or output indicates an erroneous connection relationship that needs to be optimized. During optimization, it is first judged whether the node with the dangling input or output end is a redundant computing node; if not, corresponding data pipelines are added according to the dependency relationships between this computing node and the other computing nodes, connecting the dangling end to the appropriate nodes. Operation S12 mainly detects dangling data pipelines: if one end of a data pipeline is not connected to a computing node, the pipeline may be redundant or wrongly connected. If the computing node at the connected end is already linked to all computing nodes with which it has dependency relationships, the pipeline is redundant and is deleted directly; otherwise, the dangling end is connected to another computing node that has a dependency relationship with the current node but has not yet been connected. When a registered computing node or data pipeline turns out to be redundant, it can be deleted and its computing resources released. Operation S13 mainly detects whether a rule definition has been set for each special data pipeline, for example, whether a demux data pipeline specifies which data (e.g., odd-numbered frames) are transmitted to one computing node for attribute identification and which data (e.g., even-numbered frames) are transmitted to another. If a special data pipeline without a rule definition exists, a rule definition is set for it.
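The dangling-end and missing-rule checks of S11–S13 can be expressed as a simple validation pass. This is a rough illustration in Python; the dictionary layout, field names, and the `check_graph` function are our assumptions, not structures defined by the patent:

```python
def check_graph(nodes, pipes):
    """Report dangling node ends (S11), dangling pipelines (S12), and
    special pipelines lacking a rule definition (S13)."""
    problems = []
    for node in nodes:
        if not node["inputs"] and not node["is_source"]:
            problems.append(("dangling_input", node["name"]))
        if not node["outputs"] and not node["is_sink"]:
            problems.append(("dangling_output", node["name"]))
    for pipe in pipes:
        if pipe["src"] is None or pipe["dst"] is None:
            problems.append(("dangling_pipe", pipe["name"]))
        if pipe["kind"] == "demux" and pipe.get("rule") is None:
            problems.append(("missing_rule", pipe["name"]))
    return problems

nodes = [
    {"name": "decode", "inputs": [], "outputs": ["p1"],
     "is_source": True, "is_sink": False},
    # "detect" has no outgoing pipeline -> dangling output
    {"name": "detect", "inputs": ["p1"], "outputs": [],
     "is_source": False, "is_sink": False},
]
pipes = [
    # demux pipeline with no routing rule set -> S13 finding
    {"name": "p1", "src": "decode", "dst": "detect", "kind": "demux", "rule": None},
]
issues = check_graph(nodes, pipes)
```

Each reported issue would then be resolved as the text describes: redundant elements deleted, missing pipelines added, missing rules defined.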
Steps S11 to S14 are in effect sanity checks of the calculation flow chart: the chart is checked first, and the problems found are then optimized. In this manner, the probability of problems arising in the development of new target computing tasks can be significantly reduced. The optimization processing of the computing nodes of the first calculation flow chart may include any one or more of the following:
and S2, adjusting the serial computing nodes without data dependency in the first computing flow chart into asynchronous parallel computing nodes. And S3, setting parameters of each calculation node and each data pipeline in the first calculation flow chart. And S4, splitting the calculation nodes meeting the splitting condition in each calculation node in the first calculation flow chart.
For the operation S2, the serial computing nodes in the first computing flowchart without data dependency are adjusted to be asynchronous parallel computing nodes, so that the computing delay can be reduced and the computing efficiency can be improved.
For operation S3, parameters are set for each computing node and data pipeline in the first calculation flow chart. Mainstream computing hardware applies batch-processing optimizations on the NN inference side, so batching can improve calculation efficiency. A caching mechanism for different pieces of data of the same computing task can therefore be set up so that the cached data can be processed in batches. Each data pipeline in the calculation flow chart corresponds to storage resources and naturally has the capacity to cache data of the same task; when constructing the calculation flow chart, parameters such as the maximum cached data size of a data pipeline, the maximum batch processing amount allowed by a computing node, and the data-analysis delay can be set, so that the actual batch size can be determined later. Step S3 may comprise the following sub-steps: S31, setting the maximum length of the cache queue of each data pipeline; S32, setting the maximum batch processing amount of each computing node and a corresponding timeout mechanism. Only one of S31 and S32 is required, though adopting both is better. Setting a maximum cache-queue length for a data pipeline controls storage-resource consumption and provides backpressure to the upper-level computing node. Setting the maximum batch processing amount of each computing node together with a timeout mechanism achieves efficient batching while bounding the data-analysis delay, which can greatly improve the data throughput of the computing nodes.
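The batching-with-timeout behaviour of S31/S32 can be sketched with a bounded standard-library queue (the function name and parameters are illustrative assumptions, not the patent's API): a node drains up to its maximum batch size but never waits longer than the timeout for a batch to fill.

```python
import queue
import time

def batch_consume(q, max_batch, timeout_s):
    """Collect up to max_batch items from q, waiting at most timeout_s
    for the batch to fill (the timeout mechanism of S32)."""
    batch = [q.get()]                          # block for the first item
    deadline = time.monotonic() + timeout_s
    while len(batch) < max_batch:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        try:
            batch.append(q.get(timeout=remaining))
        except queue.Empty:
            break
    return batch

# maxsize plays the role of the pipeline's maximum cache-queue length (S31):
# a full queue blocks the producer, giving backpressure to the upper node.
q = queue.Queue(maxsize=8)
for i in range(5):
    q.put(i)
first = batch_consume(q, max_batch=4, timeout_s=0.05)   # full batch
second = batch_consume(q, max_batch=4, timeout_s=0.05)  # partial batch on timeout
```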
For operation S4, the splitting condition is that the subtask corresponding to a computing node can be split into a plurality of parallel subtasks. Each splittable computing node is split into new computing nodes, and any splittable nodes among the new nodes are split in turn, until no node can be split further.
In some embodiments, this step S4 may comprise the following sub-steps: s41, screening out target computing nodes from the computing nodes of the first computing flow chart; the subtask corresponding to the target computing node can be split into a plurality of parallel subtasks; and S42, splitting the target computing node into a plurality of new computing nodes.
In some embodiments, after step S42, step S4 may further comprise the following sub-steps: S43, judging whether a target computing node exists among the plurality of new computing nodes; S44, if so, screening out the target computing node from the new computing nodes and returning to the step of splitting the target computing node into a plurality of new computing nodes; S45, if not, ending the splitting process.
In step S41, all target computing nodes satisfying the splitting condition are screened out; in step S42, the plurality of computing nodes obtained by splitting may be connected to the previous node using a broadcast pipeline and to the next node using a merging pipeline. In step S43, the criterion is the same as in step S41: as long as the subtask corresponding to a new computing node can be further divided into a plurality of subtasks, that node is a target computing node and can be split further. In step S44, the newly split computing nodes may likewise be connected to the previous node using a broadcast pipeline and to the next node using a merging pipeline.
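The recursive split-until-unsplittable loop of S41–S45 can be sketched as follows. This is a toy illustration under our own assumptions: `splitter` stands in for the splitting condition and returns the parallel subtasks, or `None` when a node can no longer be split.

```python
def split_node(task, splitter):
    """Recursively split a node's subtask while the split condition holds
    (steps S41-S45). Returns the leaf tasks, each mapped to one node."""
    parts = splitter(task)
    if parts is None:
        return [task]                  # S45: not splittable, keep as a leaf
    result = []
    for part in parts:                 # S43/S44: re-examine every new node
        result.extend(split_node(part, splitter))
    return result

# Toy split condition: a task handling more than 2 streams splits in half.
def halve_if_large(task):
    name, streams = task
    if streams <= 2:
        return None
    left = streams // 2
    return [(name, left), (name, streams - left)]

# 8 streams -> 4+4 -> 2+2+2+2: four parallel nodes of two streams each.
leaves = split_node(("detect", 8), halve_if_large)
```

In the flow chart, the resulting leaves would be attached to the upstream node via a broadcast pipeline and to the downstream node via a merging pipeline, as described above.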
Of course, it can be understood that when the new computing nodes formed by splitting are connected, computing nodes without dependency relationships can also be processed in parallel, thereby improving processing efficiency. It can also be understood that in some embodiments, for a stateful computing node, the data pipeline connected to its input end must be a multi-branch pipeline, so as to guarantee the correctness of the stateful node's behavior when processing multi-video-stream time-sequence-dependent data; such a stateful computing node is not split. A stateful node is a node that imposes requirements on its input data: for example, the data input to the computing node has a time-sequence dependency.
It is understood that the order of the optimization steps S1 to S4 applied to the first computing flowchart may be adjusted, and any one or more of these steps may be omitted.
As can be seen from the above, in the method for constructing a computing flowchart provided in the embodiments of the present application, the submodules are fully decoupled in the form of a computing flowchart, and complex business logic of various kinds is described through the connection relationships of the graph; the connection relationships and/or computing nodes are then optimized on the basis of the computing flowchart. When the computing flowchart is applied to perform computation on an object to be processed of the target computing task, the accuracy of the computing flow corresponding to the business logic can thus be guaranteed while the computing efficiency is improved.
Referring to fig. 2, fig. 2 is a flowchart of a calculation efficiency optimization method according to some embodiments of the present disclosure. The method adopts the calculation flow chart construction method in the embodiment to construct the calculation flow chart. Specifically, the method comprises the following steps:
S201, acquiring a computing flowchart, wherein the computing flowchart is constructed by adopting any one of the computing flowchart construction methods described above;
S202, monitoring the current computing load of each computing node in the computing flowchart in the process of applying the computing flowchart to perform computation on the object to be processed of the target computing task;
S203, determining bottleneck computing nodes from among the computing nodes and optimizing the computing efficiency of the bottleneck computing nodes, wherein the step of optimizing the computing efficiency of a bottleneck computing node comprises at least one of the following steps:
when the current computing load of a bottleneck computing node does not reach a computing power bottleneck state, judging whether the bottleneck computing node meets a splitting condition or not, and if so, splitting the bottleneck computing node;
when the current computing load of a bottleneck computing node does not reach a computing power bottleneck state, adjusting the maximum batch processing amount of the bottleneck computing node;
and when the current computing load of the bottleneck computing node reaches a computing power bottleneck state, scheduling computing resources to the target computing node.
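The three branches of step S203 can be read as a small decision rule over the bottleneck node's state, sketched below. This is an illustrative reading only; the dictionary keys and the returned action labels are hypothetical, not terms from the patent.

```python
def optimize_bottleneck(node):
    """Pick one of step S203's three optimization actions for a bottleneck node."""
    if node["at_power_limit"]:
        return "schedule_resources"  # computing power exhausted: add resources
    if node["splittable"]:
        return "split_node"          # power remains and split condition met
    return "raise_max_batch"         # power remains: retune the max batch size

print(optimize_bottleneck({"at_power_limit": True, "splittable": True, "max_batch": 4}))
# -> schedule_resources
print(optimize_bottleneck({"at_power_limit": False, "splittable": False, "max_batch": 4}))
# -> raise_max_batch
```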
In step S202, in order to more accurately obtain the current computing load of each computing node and reasonably schedule the computing resources, this embodiment provides an implementation manner for monitoring the current computing load of each computing node in the computing flowchart, and the implementation manner may be specifically executed with reference to steps S2011 and S2012 as follows:
S2011: acquiring the log information recorded by the threads of each computing node in the computing flowchart; the log information comprises the data information received or sent by each thread when executing the subtask corresponding to the computing node, and the time information of each thread executing the subtask. S2012: determining the current computing load of each computing node according to the log information recorded by each thread.
The log information recorded by the threads of each computing node may be obtained according to a data transmission pipeline connected to each computing node, or may be obtained by recording time when each thread executes a subtask corresponding to the computing node, which is not limited herein. The data transmission pipeline comprises a data input pipeline and a data output pipeline, wherein the data input pipeline is a pipeline for receiving data to be processed by a current computing node; the data to be processed is output data of a previous-level computing node; the data output pipeline is a pipeline used for outputting target data by the current computing node; the target data is data obtained after the current computing node executes the subtasks. The log information records data information in a data input pipeline and a data output pipeline of the computing node when each thread executes a subtask corresponding to the computing node, and the data information may include parameters such as data type or data amount.
Node information of each computing node is determined according to the log information recorded by each thread; the node information comprises any one or more of: the data amount corresponding to the data input pipelines of the computing node, the data amount corresponding to its data output pipelines, and the expected consumption time for the threads of the computing node to execute its subtasks. The current computing load of each computing node is then determined according to its node information. From the data information received or sent by each thread when executing the subtask corresponding to the computing node, the data amount corresponding to the data input pipelines and the data amount corresponding to the data output pipelines can be determined; the data amount corresponding to the data input pipelines of a computing node is the sum of the data amounts in all of its data input pipelines, while the data amount corresponding to the data output pipelines is the data amount in each individual data output pipeline. From the time information in the log, the starting time and the expected consumption time of the threads executing the subtasks of each computing node are determined; when a thread finishes executing a subtask of a computing node, the starting time, ending time, and total time consumed are recorded in the log information.
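The aggregation of per-thread log records into node information can be sketched as below. Each record is a hypothetical dictionary; the field names `in_bytes`, `out_bytes`, and `elapsed_s` are illustrative, not taken from the patent.

```python
def node_info_from_logs(logs):
    """Aggregate per-thread log records (S2011/S2012) into node information."""
    in_total = sum(r["in_bytes"] for r in logs)    # summed over ALL input pipes
    out_per_pipe = [r["out_bytes"] for r in logs]  # kept per output pipe
    # expected time to execute the next subtask: mean of historical run times
    expected_s = sum(r["elapsed_s"] for r in logs) / len(logs)
    return {"in_total": in_total, "out_per_pipe": out_per_pipe, "expected_s": expected_s}

info = node_info_from_logs([
    {"in_bytes": 100, "out_bytes": 40, "elapsed_s": 1.0},
    {"in_bytes": 300, "out_bytes": 60, "elapsed_s": 3.0},
])
print(info)  # {'in_total': 400, 'out_per_pipe': [40, 60], 'expected_s': 2.0}
```

Note the asymmetry the patent describes: input amounts are totalled across all input pipes, while output amounts are tracked per pipe, since each output pipe feeds a different downstream node.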
The current load state of a computing node is determined according to one or more items of its node information, and whether the computing node is a bottleneck computing node is likewise determined from one or more items of its node information. Whether the computing node is in a computing power bottleneck state is determined according to its current load state and the computing resources allocated to it.
It is to be understood that a bottleneck computing node is a computing node that bottlenecks the target computing task; it has not necessarily reached a computing power bottleneck state. For example, a computing node may still have residual computing power, yet unreasonable parameter settings prevent that residual power from being exploited; such a node can also become a bottleneck computing node. Therefore, whether a computing node has reached the computing power bottleneck state can be judged first, and different optimization processing applied depending on the outcome. Whether a computing node has reached the computing power bottleneck state can be judged as follows: judge, according to the node information of each computing node, whether its current computing load satisfies a preset computing power bottleneck condition; if so, the bottleneck computing node has reached the computing power bottleneck state.
Because the node information includes any one or more of: the data amount corresponding to the data input pipelines of each computing node, the data amount corresponding to its data output pipelines, the speed at which each computing node processes data through its threads, and the expected consumption time for the threads of each computing node to execute its subtasks, the way of determining whether a computing node satisfies the preset computing power bottleneck condition differs for different node information. Reference may be made to the following implementations one to four:
the first implementation mode comprises the following steps: and if the node information comprises the data volume corresponding to the data input pipeline of each computing node, judging whether the data volume corresponding to the data input pipeline of each computing node reaches the preset input data volume, and determining the computing node reaching the preset input data volume as the bottleneck computing node reaching the preset computing power bottleneck condition. The data volume corresponding to the input pipeline of each computing node is monitored, namely the total data volume of all the input pipelines corresponding to the computing nodes is monitored, so that the task volume to be processed of the computing node is judged according to the total data volume of the input pipelines. When the total data volume of the data input pipeline of the computing node reaches the preset input data volume, the computing node is determined to be a bottleneck computing node, namely the input pipeline of the bottleneck computing node is blocked, the speed of the bottleneck computing node for processing data through the thread is far less than the receiving speed of the data to be processed, and the bottleneck computing node is in a computing bottleneck state. Computing resources may be subsequently scheduled to the bottleneck computing node to expand the computing power of the target computing node. The preset input data amount may be a value that enables the target calculation task to complete the calculation processing as soon as possible, according to the actual processing condition of the target calculation task.
Implementation two: if the node information comprises the data amount corresponding to the data output pipelines of each computing node, judge whether the data amount in each data output pipeline reaches a preset output data amount, and determine the downstream computing node of a pipeline reaching the preset output data amount as a bottleneck computing node that has reached the computing power bottleneck state. A computing node transmits data to its downstream computing nodes through data output pipelines. By monitoring the data amount in each data output pipeline of a computing node, the pending data volume of the downstream node of each pipeline can be determined; the downstream node of a pipeline reaching the preset output data amount is determined to be a bottleneck computing node in the computing power bottleneck state, and computing resources can then be scheduled to it to expand its computing power. The preset output data amount can be set manually; because it is a threshold on a single data output pipeline, its value may be smaller than the preset input data amount.
Implementation three: if the node information comprises the expected consumption time for the threads of each computing node to execute its subtasks, judge whether that expected consumption time exceeds a preset time, and determine a computing node whose expected consumption time exceeds the preset time as a bottleneck computing node that has reached the preset computing power bottleneck state. From the log information recorded by each thread, a history of the computation time each computing node needs to execute its subtask can be determined, and from this history the expected consumption time can be derived; for example, the data-processing speed of each computing node can be computed from the history, and the expected consumption time obtained from that speed together with the pending data volume of the node. A long expected consumption time indicates a highly complex subtask; when the expected consumption time of a computing node reaches the preset time, the node is determined to be a bottleneck computing node that has reached the preset computing power bottleneck state.
Alternatively, the starting time and current elapsed time of each computing node executing its subtask through a thread may be monitored to gauge the complexity of the subtask: the longer the current elapsed time, the higher the complexity of the subtask corresponding to the node. When the current elapsed time of a computing node executing its subtask reaches the preset time, the node is determined to be a bottleneck computing node that has reached the preset computing power bottleneck state. The preset time can be set manually based on repeated trial executions of the subtasks by the computing nodes.
Implementation four: if the node information comprises both the data amount corresponding to the data input pipelines and the data amount corresponding to the data output pipelines of each computing node, judge whether the ratio of the input data amount to the output data amount reaches a preset ratio, and determine a computing node reaching the preset ratio as a bottleneck computing node that has reached the computing power bottleneck state. This ratio indirectly reflects the data-processing speed of the node: when the ratio is large, the node processes data slowly, much of its data is still pending, and the node is likely to become congested; hence a node reaching the preset ratio is determined to be a bottleneck computing node in the preset computing power bottleneck state.
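The four threshold checks can be collected into one predicate, sketched below. The threshold names (`max_in`, `max_out`, `max_s`, `ratio`) and the node-information keys are illustrative assumptions, not terms defined by the patent.

```python
def is_bottleneck(info, limits):
    """Apply implementations one to four; any satisfied check flags a bottleneck."""
    checks = [
        info["in_total"] >= limits["max_in"],                       # one: input backlog
        any(v >= limits["max_out"] for v in info["out_per_pipe"]),  # two: output backlog
        info["expected_s"] >= limits["max_s"],                      # three: slow subtask
        info["in_total"] >= limits["ratio"] * max(sum(info["out_per_pipe"]), 1),  # four: in/out ratio
    ]
    return any(checks)

info = {"in_total": 900, "out_per_pipe": [10, 20], "expected_s": 0.1}
limits = {"max_in": 1000, "max_out": 500, "max_s": 1.0, "ratio": 10}
print(is_bottleneck(info, limits))  # True: 900 >= 10 * 30, implementation four fires
```

One caveat matching the text of implementation two: an output-backlog hit flags the *downstream* node of the congested pipe as the bottleneck, not the node whose output pipe filled up; a full implementation would return which node to flag rather than a bare boolean.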
In a specific embodiment, a computing node whose input-pipeline data amount reaches the preset input data amount and whose expected consumption time for executing its subtask reaches the preset time may be determined as a bottleneck computing node. Likewise, a downstream computing node connected to a data output pipeline whose data amount reaches the preset output data amount (and whose own expected consumption time for executing its subtask reaches the preset time) may be determined as a bottleneck computing node.
In a specific embodiment, the current computing load of each computing node may be displayed dynamically on the applied computing flowchart as a heat map: for example, a computing node with a high current load whose computing power has reached the bottleneck state is shown in a warm color, while a node with a low current load and abundant computing power is shown in a cool color.
When the current computing load of a bottleneck computing node does not reach a computing power bottleneck state, judging whether the bottleneck computing node meets a splitting condition or not, and if so, splitting the bottleneck computing node; or adjusting the maximum batch processing amount of the bottleneck computing node.
Optionally, after the node is split and/or the maximum batch size is adjusted, the computing flowchart may be reconstructed according to the adjusted maximum batch size and/or the split bottleneck computing node, yielding a reconstructed computing flowchart. The monitoring of the current computing load of each computing node is then re-executed while the reconstructed computing flowchart is applied to process the object of the target computing task, bottleneck computing nodes are again determined, and their computing efficiency is optimized. Optionally, this step is repeated until no optimizable bottleneck computing node remains.
When the current computing load of a bottleneck computing node has reached the computing power bottleneck state and residual computing resources exist, the computing resources can be dispatched to the target computing node. Scheduling computing resources to a target computing node may be executed with reference to the following steps a to b:
step b: and dispatching the computing resources to the target computing nodes which reach the preset computing power bottleneck condition.
In order to improve computing efficiency when the computing resources of the scheduling device are limited, an idle thread is obtained from the process executing the computing flowchart, and the idle thread is scheduled to the target computing node, so that the target computing node processes its corresponding subtask in parallel using both its current thread and the idle thread. The process executing the computing flowchart of the target computing task may contain multiple threads; the scheduling device allocates a subtask-executing thread to each computing node in the flowchart by scheduling the computing resources. When a target computing node reaches the preset computing power bottleneck state, an idle thread is scheduled to it from the process, so that the node processes its subtask in parallel across multiple threads, generating output data as soon as possible and realizing independent management and control of computing nodes and computing resources. An idle thread may be a thread that has completed executing the subtask of some computing node, or a thread assigned to a node downstream of the target computing node that has not yet begun executing its subtask. There may be one or more idle threads; the scheduling device schedules the corresponding number of idle threads to the target computing node according to the number and state of the currently idle threads, so as to minimize thread-switching overhead in the operating system.
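One possible assignment policy for the idle threads described above is a simple round-robin spread over the bottleneck nodes, sketched below. This is a hypothetical policy for illustration only; thread ids, node names, and the choice of round-robin are assumptions, and a real scheduler would also weigh thread-switching overhead and thread state as the patent notes.

```python
def schedule_idle_threads(idle, bottlenecks):
    """Assign idle worker threads to bottleneck nodes round-robin."""
    assignment = {}
    for i, tid in enumerate(idle):
        node = bottlenecks[i % len(bottlenecks)]  # spread threads over hot nodes
        assignment.setdefault(node, []).append(tid)
    return assignment

print(schedule_idle_threads([101, 102, 103], ["detect", "track"]))
# -> {'detect': [101, 103], 'track': [102]}
```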
In practical applications, the scheduling device may be a device with a scheduling function, which invokes a dedicated thread through the CPU to execute the computing efficiency optimization task and schedules idle threads to each target computing node. The scheduling device may also be a device without a scheduling function, such as an NPU or a Cambricon MLU single card, in which case its computing resources may be scheduled through the CPU.
According to the method for optimizing computing efficiency described above, computing nodes and data transmission pipelines are abstracted from the target computing task, which simplifies the data-flow and time-sequence dependency relationships of a highly complex computing task; bottleneck computing nodes are found in time by monitoring the current computing load of each computing node in the computing flowchart; and the computing efficiency of each bottleneck computing node is optimized in a targeted manner according to the cause of its bottleneck state, so that the computing efficiency of the computing device is greatly improved.
Referring to fig. 3, fig. 3 is a schematic structural diagram of a computing flowchart constructing apparatus according to some embodiments of the present disclosure. The calculation flow chart construction device comprises: a first obtaining module 301, a setting module 302 and an optimizing module 303.
The first obtaining module 301 is configured to obtain each subtask of the target computing task and allocate a computing node to each subtask. The target computing task is divided into a plurality of subtasks, and a computing node is allocated to each subtask, wherein each subtask corresponds to one computing node. For example, in the target tracking calculation task, the target tracking calculation task may be divided into a plurality of subtasks, such as target detection and path generation, and one calculation node for implementing target detection and one calculation node for path generation may be correspondingly allocated.
The setting module 302 is configured to set a data pipeline between each computing node according to a dependency relationship between each subtask, so as to obtain a first computing flowchart. The data pipeline is used for transmitting the target data output by the output end of the upper-level computing node to the input end of the lower-level computing node. The data pipeline is used for connecting the computing nodes with data transmission relationship, and can be divided into a broadcast pipeline, a multi-branch pipeline, a merging pipeline, an order-preserving pipeline and the like according to the connection relationship among different computing nodes. Setting a data pipeline according to the data flow direction relation among the computing nodes, wherein the type of the data pipeline can be set in at least one of the following modes:
the first method is as follows: when a first computing node exists, setting a data pipeline connected with the output end of the first computing node as a broadcast pipeline; and the first computing node is a node which transmits the data after the subtasks are executed to a plurality of next-level computing nodes at the same time.
The second mode: when a second computing node exists, setting the data pipeline connected to the output end of the second computing node as a multi-branch pipeline. The second computing node is a node that selects, according to a selection condition, which next-level computing node receives the target data, the target data being the data generated after the second computing node executes its subtask. A multi-branch pipeline selects one of multiple data output pipes according to which condition is satisfied. For example, computing node 1 is connected to next-level computing nodes 2 and 3 through a multi-branch pipeline (data pipe a and data pipe b): when the data generated by computing node 1 executing its subtask satisfies condition A, node 1 transmits the generated data to computing node 2 through data pipe a; when the data satisfies condition B, node 1 transmits it to computing node 3 through data pipe b.
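The condition-based routing of a multi-branch pipeline can be sketched as follows. The predicate functions and the plain-list "pipes" are illustrative assumptions, not structures from the patent.

```python
def multi_branch_route(item, branches):
    """Send an item down the first branch whose condition it satisfies."""
    for pred, pipe in branches:
        if pred(item):
            pipe.append(item)
            return pipe
    raise ValueError("no branch condition matched")

pipe_a, pipe_b = [], []  # pipes toward computing node 2 and computing node 3
branches = [(lambda x: x % 2 == 0, pipe_a),   # condition A: even data -> node 2
            (lambda x: x % 2 == 1, pipe_b)]   # condition B: odd data  -> node 3
for item in [1, 2, 3, 4]:
    multi_branch_route(item, branches)
print(pipe_a, pipe_b)  # [2, 4] [1, 3]
```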
The third mode: when a third computing node exists, setting the data input pipeline of the third computing node as a merging pipeline. The third computing node is a node capable of receiving the data produced after a plurality of upper-level computing nodes execute their subtasks. A merging pipeline joins the multiple data pipes used when data generated by several computing nodes executing their subtasks is output to the same next-level node.
The fourth mode: when a fourth computing node exists, setting the data output pipeline of the fourth computing node as an order-preserving pipeline. The fourth computing node is a node for which the order in which data is received must be kept consistent with the order in which data is output. An order-preserving pipeline is a data pipeline that adjusts the order of the output data according to the order of the input data of the computing node. For example, suppose the data pipeline connected to computing node 4, a node executing an image detection task, is an order-preserving pipeline: if the frame-number order of the images received by the node is 1 2 3 4 5, while the frame-number order of the image detection results it outputs is 2 1 4 5 3, the order-preserving pipeline connected to node 4 corrects the frame-number order of the detection results back to 1 2 3 4 5.
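The reordering behavior of an order-preserving pipeline can be sketched with a small buffer that releases results only once every earlier frame has arrived. The class and field names below are hypothetical, not from the patent.

```python
import heapq

class OrderPreservingPipe:
    """Buffer out-of-order results and release them in input (frame) order."""
    def __init__(self):
        self.heap = []       # pending results keyed by frame number
        self.next_frame = 1  # next frame number the consumer expects

    def push(self, frame_no, result):
        heapq.heappush(self.heap, (frame_no, result))
        released = []
        while self.heap and self.heap[0][0] == self.next_frame:
            released.append(heapq.heappop(self.heap)[1])
            self.next_frame += 1
        return released  # results now safe to emit, in frame order

pipe = OrderPreservingPipe()
out = []
for frame_no in [2, 1, 4, 5, 3]:      # detection results arrive out of order
    out += pipe.push(frame_no, f"det{frame_no}")
print(out)  # ['det1', 'det2', 'det3', 'det4', 'det5']
```

The trade-off is latency: a result for frame 2 is held until frame 1's result arrives, which is exactly what a time-sequence-dependent downstream node requires.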
The optimization module 303 is configured to perform optimization processing on the connection relationships and/or the computing nodes of the first computing flowchart to obtain an optimized second computing flowchart. Only the connection relationships or only the computing nodes may be optimized; of course, the optimization effect is best when both are optimized at the same time.
Optimizing the connection relationships may include the following: correcting erroneous connection relationships existing in the first computing flowchart.
The optimization processing on the computing nodes of the first computing flowchart may include any one or more of the following manners: adjusting serial computing nodes without data dependency relationships in the first computing flowchart into asynchronous parallel computing nodes; setting parameters of each computing node and data pipeline in the first computing flowchart; and splitting the computing nodes that satisfy the splitting condition among the computing nodes of the first computing flowchart.
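The first of these manners, running serial nodes without data dependencies asynchronously in parallel, can be sketched with a thread pool. The callables standing in for subtasks are illustrative only; the patent does not prescribe this mechanism.

```python
from concurrent.futures import ThreadPoolExecutor

def run_independent_nodes(nodes, arg):
    """Run computing nodes with no mutual data dependency in parallel."""
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        futures = [pool.submit(fn, arg) for fn in nodes]  # launch concurrently
        return [f.result() for f in futures]              # gather in node order

results = run_independent_nodes([lambda x: x + 1, lambda x: x * 2], 10)
print(results)  # [11, 20]
```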
Referring to fig. 4, fig. 4 is a schematic structural diagram of a computational efficiency optimization apparatus according to some embodiments of the present disclosure. The calculation efficiency optimization device includes: a second acquisition module 401, a monitoring module 402, and a scheduling module 403.
The second obtaining module 401 is configured to obtain a calculation flowchart, where the calculation flowchart is constructed by using any one of the calculation flowchart construction methods described above.
The monitoring module 402 is configured to monitor a current computation load of each computation node in the computation flowchart during a process of applying the computation flowchart to perform computation processing on an object to be processed of a target computation task.
The optimization module 403 is configured to determine a bottleneck computing node from each computing node, and optimize the computing efficiency of the bottleneck computing node, where the step of optimizing the computing efficiency of the bottleneck computing node includes at least one of the following steps: when the current computing load of a bottleneck computing node does not reach a computing power bottleneck state, judging whether the bottleneck computing node meets a splitting condition or not, and if so, splitting the bottleneck computing node; when the current computing load of the bottleneck computing node does not reach the computing power bottleneck state, adjusting the maximum batch processing amount of the bottleneck computing node; and when the current computing load of the bottleneck computing node reaches a computing power bottleneck state, scheduling computing resources to the target computing node.
The embodiment of the present application provides a storage medium, and when being executed by a processor, the computer program performs the method in any optional implementation manner of the foregoing embodiment. The storage medium may be implemented by any type of volatile or nonvolatile storage device or combination thereof, such as a Static Random Access Memory (SRAM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), an Erasable Programmable Read-Only Memory (EPROM), a Programmable Read-Only Memory (PROM), a Read-Only Memory (ROM), a magnetic Memory, a flash Memory, a magnetic disk, or an optical disk.
As shown in fig. 5, an electronic device according to an embodiment of the present application is further provided, and includes a processor 501 and a memory 502, where the memory 502 stores computer readable instructions, and when the computer readable instructions are executed by the processor 501, the method according to any one of the above embodiments is performed.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.
In addition, units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
Furthermore, the functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
In this document, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions.
The above description is only an example of the present application and is not intended to limit its scope; various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present application shall fall within its protection scope.
Claims (13)
1. A method for constructing a computing flow chart is characterized by comprising the following steps:
acquiring each subtask of a target computing task and allocating a computing node to each subtask;
setting data pipelines among all computing nodes according to the dependency relationship among all subtasks to obtain a first computing flow chart;
optimizing the connection relation and/or the computing nodes of the first computing flow chart to obtain a second computing flow chart after optimization;
the setting of the data pipeline between each computing node according to the dependency relationship between each subtask includes:
setting a data pipeline according to the data flow direction relation among the computing nodes;
setting the type of each data pipeline by at least one of the following:
when a first computing node exists, setting a data pipeline connected with the output end of the first computing node as a broadcast pipeline; the first computing node is a node which simultaneously transmits the data produced after executing its subtask to a plurality of next-level computing nodes;
when a second computing node exists, setting a data pipeline connected with the output end of the second computing node as a multi-branch pipeline; the second computing node is a node that determines, according to a selection condition, which next-level computing node is to receive target data, the target data being the data generated after the second computing node executes its subtask;
when a third computing node exists, setting a data input pipeline of the third computing node as a merging pipeline; the third computing node is a node that receives data from a plurality of upper-level computing nodes after they execute their subtasks;
when a fourth computing node exists, setting a data output pipeline of the fourth computing node as an order-preserving pipeline; the fourth computing node is a node whose data output sequence is kept consistent with its data receiving sequence.
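The four pipeline types defined in claim 1 can be illustrated as a type-assignment pass over a node graph. The following is a minimal sketch under assumed data structures; the `Node` fields and the type names are illustrative, not part of the patent.

```python
from dataclasses import dataclass, field

# Illustrative sketch of the pipeline-type assignment of claim 1.
# The node/graph representation is an assumption, not from the patent.

BROADCAST, MULTI_BRANCH, MERGE, ORDER_PRESERVING = (
    "broadcast", "multi-branch", "merge", "order-preserving")

@dataclass
class Node:
    name: str
    successors: list = field(default_factory=list)    # next-level nodes
    predecessors: list = field(default_factory=list)  # upper-level nodes
    has_selection_condition: bool = False  # routes output by a condition
    order_sensitive: bool = False          # output order must match input order

def assign_pipe_types(nodes):
    """Return {node name: pipeline type} for each node that matches a rule."""
    types = {}
    for n in nodes:
        if n.has_selection_condition and len(n.successors) > 1:
            types[n.name] = MULTI_BRANCH      # "second computing node"
        elif len(n.successors) > 1:
            types[n.name] = BROADCAST         # "first computing node"
        elif len(n.predecessors) > 1:
            types[n.name] = MERGE             # "third computing node"
        elif n.order_sensitive:
            types[n.name] = ORDER_PRESERVING  # "fourth computing node"
    return types
```

For example, a node with two successors and no selection condition gets a broadcast pipeline, while a node with two predecessors gets a merging input pipeline.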
2. The method for constructing a computing flow chart according to claim 1, wherein optimizing the connection relationship of the first computing flow chart includes:
optimizing erroneous connection relationships existing in the first computing flow chart.
3. The method according to claim 2, wherein optimizing the erroneous connection relationships existing in the first computing flow chart comprises:
detecting whether the input end and the output end of each computing node of the first computing flow chart are connected with a data pipeline, and optimizing the first computing flow chart according to the detection result;
and/or detecting whether the input end and the output end of each data pipeline of the first computing flow chart are connected with other data pipelines or computing nodes, and optimizing the first computing flow chart according to the detection result;
and/or detecting whether a data pipeline with no rule set exists in the first computing flow chart, and optimizing the first computing flow chart according to the detection result.
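The three detections in claim 3 amount to checking for unattached nodes, dangling pipeline ends, and untyped pipelines. A hedged sketch, assuming nodes are names and pipelines are `(src, dst, type)` tuples (this representation is illustrative):

```python
# Sketch of claim 3's connection checks; the graph representation
# (node names, pipelines as (src, dst, pipe_type)) is an assumption.

def detect_connection_errors(nodes, pipes):
    """nodes: set of node names; pipes: list of (src, dst, pipe_type).
    Returns a list of human-readable error strings."""
    errors = []
    sources = {src for src, _, _ in pipes}
    dests = {dst for _, dst, _ in pipes}
    for n in sorted(nodes):
        # check 1: each node must have at least one pipeline attached
        if n not in sources and n not in dests:
            errors.append(f"node {n} has no pipeline attached")
    for src, dst, ptype in pipes:
        # check 2: both ends of a pipeline must connect to known nodes
        if src not in nodes:
            errors.append(f"pipeline ({src}->{dst}) has a dangling input end")
        if dst not in nodes:
            errors.append(f"pipeline ({src}->{dst}) has a dangling output end")
        # check 3: every pipeline must have a rule/type set
        if ptype is None:
            errors.append(f"pipeline ({src}->{dst}) has no rule set")
    return errors
```

An empty result means the first computing flow chart passes all three checks; otherwise each error names a repair target for the optimization step.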
4. The method according to any one of claims 1 to 3, wherein optimizing the computing nodes of the first computing flow chart comprises:
adjusting serial computing nodes without data dependency relationships in the first computing flow chart into asynchronous parallel computing nodes;
and/or setting parameters of each computing node and each data pipeline in the first computing flow chart;
and/or splitting the computing nodes that satisfy a splitting condition among the computing nodes in the first computing flow chart.
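The first adjustment in claim 4 — running serial nodes with no data dependency as asynchronous parallel nodes — can be sketched with a thread pool. The two subtask functions below are made-up placeholders for independent nodes:

```python
from concurrent.futures import ThreadPoolExecutor

# Sketch of claim 4's first step: two serial nodes that share no data
# dependency are executed concurrently. Task bodies are illustrative.

def preprocess_a(x):
    # independent subtask 1 (placeholder work)
    return x + 1

def preprocess_b(x):
    # independent subtask 2 (placeholder work)
    return x * 2

def run_parallel(x):
    # Neither node consumes the other's output, so both may run at once
    # instead of one after the other.
    with ThreadPoolExecutor(max_workers=2) as pool:
        fa = pool.submit(preprocess_a, x)
        fb = pool.submit(preprocess_b, x)
        return fa.result(), fb.result()
```

The transformation is only valid when neither node reads data produced by the other, which is exactly the "without data dependency relationship" condition in the claim.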
5. The method according to claim 4, wherein setting parameters of each computing node and data pipeline in the first computing flow chart includes:
setting the maximum length of the cache queue of each data pipeline.
6. The method according to claim 5, wherein setting parameters of each computing node and data pipeline in the first computing flow chart includes:
setting the maximum batch processing amount of each computing node and a corresponding timeout mechanism.
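Claims 5 and 6 together describe a bounded buffer queue feeding a node that batches items, flushing early on a timeout. A minimal sketch, assuming the parameter names `max_batch` and `timeout_s` (both illustrative):

```python
import queue
import time

# Sketch of claims 5-6: a bounded cache queue (maximum length) feeds a
# node that collects up to `max_batch` items per batch, flushing a
# partial batch when `timeout_s` elapses. Parameter names are assumptions.

def collect_batch(q, max_batch, timeout_s):
    """Pull up to max_batch items from q, waiting at most timeout_s total."""
    batch = []
    deadline = time.monotonic() + timeout_s
    while len(batch) < max_batch:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break  # timeout mechanism: process whatever has arrived
        try:
            batch.append(q.get(timeout=remaining))
        except queue.Empty:
            break  # queue drained before the batch filled
    return batch
```

The queue's `maxsize` plays the role of the pipeline's maximum cache-queue length, applying backpressure to upstream nodes, while the timeout bounds the latency of a partially filled batch.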
7. The method according to claim 4, wherein splitting the computing nodes that satisfy the splitting condition among the computing nodes in the first computing flow chart includes:
screening out a target computing node from the computing nodes of the first computing flow chart, the subtask corresponding to the target computing node being splittable into a plurality of subtasks;
splitting the target computing node into a plurality of new computing nodes.
8. The method according to claim 7, wherein splitting the computing nodes that satisfy the splitting condition in the first computing flow chart further comprises:
judging whether a target computing node exists among the plurality of new computing nodes;
if yes, screening out a target computing node from the plurality of new computing nodes, and returning to the step of splitting the target computing node into a plurality of new computing nodes;
if not, ending the splitting process.
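The loop in claims 7-8 repeatedly splits any node whose subtask is still divisible until none remains. A sketch under assumed helper functions (`splittable` and `split` are stand-ins for the patent's splitting condition and splitting step):

```python
# Sketch of the iterative splitting of claims 7-8: keep replacing each
# target computing node with its new nodes until no splittable node
# remains. `splittable` and `split` are illustrative stand-ins.

def split_all(nodes, splittable, split):
    """nodes: list of node objects; splittable(n) -> bool;
    split(n) -> list of new nodes. Returns the fully split node list."""
    result = list(nodes)
    while True:
        targets = [n for n in result if splittable(n)]
        if not targets:
            break  # no target computing node exists: splitting ends
        for t in targets:
            result.remove(t)        # drop the target node...
            result.extend(split(t)) # ...and add its new computing nodes
    return result
```

For instance, with integer "nodes" where any value above 1 is splittable into two halves, a single node of weight 4 ends up as four unit nodes.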
9. A method for optimizing computational efficiency, comprising the steps of:
acquiring a computing flow chart, wherein the computing flow chart is constructed by the computing flow chart construction method of any one of claims 1-8;
monitoring the current computing load of each computing node in the computing flow chart in the process of computing and processing the object to be processed of the target computing task by applying the computing flow chart;
determining a bottleneck computing node from the computing nodes, and optimizing the computing efficiency of the bottleneck computing node, wherein optimizing the computing efficiency of the bottleneck computing node comprises at least one of the following steps:
when the current computing load of the bottleneck computing node does not reach a computing power bottleneck state, judging whether the bottleneck computing node satisfies a splitting condition, and if so, splitting the bottleneck computing node;
when the current computing load of the bottleneck computing node does not reach the computing power bottleneck state, adjusting the maximum batch processing amount of the bottleneck computing node;
and when the current computing load of the bottleneck computing node reaches the computing power bottleneck state, scheduling computing resources to the bottleneck computing node.
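The three branches of claim 9 form a simple decision rule for a detected bottleneck node. A hedged sketch, where the load/threshold representation and action labels are assumptions, not the patent's actual interfaces:

```python
# Sketch of claim 9's three-way decision for a bottleneck computing node.
# Numeric load, threshold, and the returned action labels are assumptions.

def optimize_bottleneck(load, power_limit, splittable, max_batch):
    """Decide how to optimize a bottleneck node.
    load: current computing load; power_limit: computing-power bottleneck
    threshold; splittable: whether the splitting condition holds."""
    if load >= power_limit:
        # computing power bottleneck reached: add resources
        return "schedule more computing resources"
    if splittable:
        # below the bottleneck state and splittable: split the node
        return "split the bottleneck node"
    # below the bottleneck state but not splittable: raise the batch size
    return ("increase max batch", max_batch * 2)
```

Doubling the batch size here is purely illustrative; the patent only says the maximum batch processing amount is adjusted, not by how much.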
10. The computational efficiency optimization method of claim 9, further comprising:
re-acquiring a computing flow chart according to the adjusted maximum batch processing amount and/or the split bottleneck computing node;
monitoring the current computing load of each computing node in the re-acquired computing flow chart while the computing flow chart is applied to compute the object to be processed of the target computing task; and determining a bottleneck computing node from the computing nodes and optimizing its computing efficiency.
11. A computing flow chart constructing apparatus, comprising:
the first acquisition module is used for acquiring each subtask of the target computing task and allocating a computing node to each subtask;
the setting module is used for setting data pipelines among all the computing nodes according to the dependency relationship among all the subtasks to obtain a first computing flow chart;
the optimization module is used for optimizing the connection relation and/or the computing nodes of the first computing flow chart to obtain a second computing flow chart after optimization;
the setting module is specifically configured to:
setting a data pipeline according to the data flow direction relation among the computing nodes;
setting the type of each data pipeline by at least one of the following:
when a first computing node exists, setting a data pipeline connected with the output end of the first computing node as a broadcast pipeline; the first computing node is a node which simultaneously transmits the data produced after executing its subtask to a plurality of next-level computing nodes;
when a second computing node exists, setting a data pipeline connected with the output end of the second computing node as a multi-branch pipeline; the second computing node is a node that determines, according to a selection condition, which next-level computing node is to receive target data, the target data being the data generated after the second computing node executes its subtask;
when a third computing node exists, setting a data input pipeline of the third computing node as a merging pipeline; the third computing node is a node that receives data from a plurality of upper-level computing nodes after they execute their subtasks;
when a fourth computing node exists, setting a data output pipeline of the fourth computing node as an order-preserving pipeline; the fourth computing node is a node whose data output sequence is kept consistent with its data receiving sequence.
12. A computational efficiency optimization apparatus, comprising:
a second acquisition module, configured to acquire a pre-constructed computing flow chart, where the computing flow chart is constructed by the computing flow chart construction method according to any one of claims 1 to 8;
the monitoring module is used for monitoring the current computing load of each computing node in the computing flow chart in the process of applying the computing flow chart to compute and process the object to be processed of the target computing task;
an optimization module, configured to determine a bottleneck computing node from the computing nodes and optimize the computing efficiency of the bottleneck computing node, where optimizing the computing efficiency of the bottleneck computing node includes at least one of the following steps:
when the current computing load of the bottleneck computing node does not reach a computing power bottleneck state, judging whether the bottleneck computing node satisfies a splitting condition, and if so, splitting the bottleneck computing node;
when the current computing load of the bottleneck computing node does not reach the computing power bottleneck state, adjusting the maximum batch processing amount of the bottleneck computing node;
and when the current computing load of the bottleneck computing node reaches a preset computing power bottleneck state, scheduling computing resources to the bottleneck computing node.
13. An electronic device comprising a processor and a memory storing computer readable instructions that, when executed by the processor, perform the method of any one of claims 1-8.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010666946 | 2020-07-10 | ||
CN2020106669467 | 2020-07-10 |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113238837A CN113238837A (en) | 2021-08-10 |
CN113238837B true CN113238837B (en) | 2022-12-27 |
Family
ID=77128886
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110433418.1A Active CN113238837B (en) | 2020-07-10 | 2021-04-21 | Computing flow chart construction method, computing efficiency optimization method, computing efficiency construction device and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113238837B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113379400B (en) * | 2021-08-13 | 2021-12-03 | 南京新一代人工智能研究院有限公司 | Business process carding system and method for grading and pressurizing flow |
CN113806044B (en) * | 2021-08-31 | 2023-11-07 | 天津大学 | Heterogeneous platform task bottleneck eliminating method for computer vision application |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5546377A (en) * | 1995-10-31 | 1996-08-13 | Digital Equipment Corporation | Efficient distributed method for computing max-min fair rates of a limited resource in ATM networks |
JP2006350657A (en) * | 2005-06-15 | 2006-12-28 | Kansai Electric Power Co Inc:The | Simulator and simulation method of distributed processing system |
WO2018121738A1 (en) * | 2016-12-30 | 2018-07-05 | 北京奇虎科技有限公司 | Method and apparatus for processing streaming data task |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108508853A (en) * | 2018-03-13 | 2018-09-07 | 济南大学 | Based on the method for improving extension moving bottleneck algorithm solution product integrated dispatch problem |
CN109815011A (en) * | 2018-12-29 | 2019-05-28 | 东软集团股份有限公司 | A kind of method and apparatus of data processing |
- 2021-04-21: CN202110433418.1A filed in CN; patent CN113238837B granted (Active)
Also Published As
Publication number | Publication date |
---|---|
CN113238837A (en) | 2021-08-10 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111400008B (en) | Computing resource scheduling method and device and electronic equipment | |
CN113238837B (en) | Computing flow chart construction method, computing efficiency optimization method, computing efficiency construction device and electronic equipment | |
WO2021000693A1 (en) | Service fusing method and apparatus and message middleware | |
EP3129880B1 (en) | Method and device for augmenting and releasing capacity of computing resources in real-time stream computing system | |
US9213574B2 (en) | Resources management in distributed computing environment | |
US8645966B2 (en) | Managing resource allocation and configuration of model building components of data analysis applications | |
US9979631B2 (en) | Dynamic rerouting of service requests between service endpoints for web services in a composite service | |
US10367719B2 (en) | Optimized consumption of third-party web services in a composite service | |
CN104915407A (en) | Resource scheduling method under Hadoop-based multi-job environment | |
CN112148455A (en) | Task processing method, device and medium | |
CN104063293A (en) | Data backup method and streaming computing system | |
CN113762906B (en) | Task period delay alarming method, device, equipment and storage medium | |
GB2602213A (en) | Automated operational data management dictated by quality-of-service criteria | |
CN115269108A (en) | Data processing method, device and equipment | |
CN108415765B (en) | Task scheduling method and device and intelligent terminal | |
CN114896121A (en) | Monitoring method and device of distributed processing system | |
WO2018097058A1 (en) | Analysis node, method for managing resources, and program recording medium | |
WO2016084327A1 (en) | Resource prediction device, resource prediction method, resource prediction program and distributed processing system | |
Tsenos et al. | Amesos: a scalable and elastic framework for latency sensitive streaming pipelines | |
CN113254172A (en) | Task scheduling method and device | |
CN113391927A (en) | Method, device and system for processing business event and storage medium | |
US12119108B2 (en) | Medical ETL task dispatching method, system and apparatus based on multiple centers | |
US12135996B2 (en) | Computing resource scheduling method, scheduler, internet of things system, and computer readable medium | |
CN118897724A (en) | Task data management method, device, equipment, medium and product | |
EP4348533A1 (en) | Processing schedule of an electronic transaction |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||