CN113238837A - Computing flow chart construction method, computing efficiency optimization method, computing efficiency construction device and electronic equipment - Google Patents

Computing flow chart construction method, computing efficiency optimization method, computing efficiency construction device and electronic equipment

Info

Publication number
CN113238837A
Authority
CN
China
Prior art keywords
computing
node
computing node
data
flow chart
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110433418.1A
Other languages
Chinese (zh)
Other versions
CN113238837B (en)
Inventor
高鹏远
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Kuangshi Technology Co Ltd
Original Assignee
Beijing Kuangshi Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Kuangshi Technology Co Ltd filed Critical Beijing Kuangshi Technology Co Ltd
Publication of CN113238837A publication Critical patent/CN113238837A/en
Application granted granted Critical
Publication of CN113238837B publication Critical patent/CN113238837B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/32 Monitoring with visual or acoustical indication of the functioning of the machine
    • G06F11/323 Visualisation of programs or trace data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3409 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Quality & Reliability (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Hardware Design (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The embodiments of the application provide a method and a device for constructing a computing flow chart, a method and a device for optimizing computing efficiency, and electronic equipment. The method for constructing the computing flow chart comprises the following steps: acquiring each subtask of a target computing task, and allocating a computing node to each subtask; setting data pipelines among the computing nodes according to the dependency relationships among the subtasks to obtain a first computing flow chart; and optimizing the first computing flow chart to obtain an optimized second computing flow chart. The method and the device can improve the computing efficiency of the computing flow chart.

Description

Computing flow chart construction method, computing efficiency optimization method, computing efficiency construction device and electronic equipment
Technical Field
The application relates to the technical field of computers, and in particular to a method and a device for constructing a computing flow chart, a method and a device for optimizing computing efficiency, and electronic equipment.
Background
As machine learning techniques have matured, they have been widely used in various fields. Video or image processing based on machine learning usually runs under high concurrency, with many video streams and multiple service flows. The amount of computation is large and the service logic is complex, so performance optimization is needed to improve computing efficiency for a given amount of hardware computing power. However, a systematic performance analysis tool is lacking, so performance cannot be analyzed to locate bottlenecks and optimize them.
In view of the above problems, no effective technical solution exists at present.
Disclosure of Invention
The embodiments of the application aim to provide a method and a device for constructing a computing flow chart, a method and a device for optimizing computing efficiency, and electronic equipment, by which computing efficiency can be improved.
In a first aspect, an embodiment of the present application provides a method for building a computation flow chart, including:
acquiring each subtask of a target computing task and distributing computing nodes for each subtask;
setting data pipelines among all computing nodes according to the dependency relationship among all subtasks to obtain a first computing flow chart;
and optimizing the connection relation and/or the computing nodes of the first computing flow chart to obtain a second computing flow chart after optimization.
Optionally, in the method for constructing a computing flowchart according to the embodiment of the present application, performing optimization processing on the connection relationship of the first computing flowchart includes:
and optimizing the wrong connection relation existing in the first calculation flow chart.
Optionally, in the method for constructing a computing flowchart according to the embodiment of the present application, the optimizing an incorrect connection relation existing in the first computing flowchart includes:
detecting whether the input end and the output end of each computing node of the first computing flow chart are connected with a data pipeline or not, and optimizing the first computing flow chart according to the detection result;
and/or detecting whether the input end and the output end of the data pipeline of the first calculation flow chart are both connected with other data pipelines or calculation nodes, and optimizing the first calculation flow chart according to the detection result;
and/or detecting whether a data pipeline without a set rule exists in the first calculation flow chart, and optimizing the first calculation flow chart according to the detection result.
Optionally, in the method for constructing a computation flowchart according to the embodiment of the present application, performing optimization processing on a computation node of the first computation flowchart includes:
adjusting serial computing nodes without data dependency in the first computing flow chart into asynchronous parallel computing nodes;
and/or setting parameters of each computing node and data pipeline in the first computing flowchart;
and/or splitting the computing nodes meeting the splitting condition in each computing node in the first computing flowchart.
Optionally, in the method for constructing a computation flowchart according to the embodiment of the present application, the setting parameters of each computation node and data pipeline in the first computation flowchart includes:
setting a parameter for the maximum length of the buffer queue of the data pipeline.
Optionally, in the method for constructing a computation flowchart according to the embodiment of the present application, the setting parameters of each computation node and data pipeline in the first computation flowchart includes:
setting parameters for the maximum batch processing amount of each computing node and for a corresponding timeout mechanism.
Optionally, in the method for constructing a computation flowchart according to the embodiment of the present application, the splitting processing performed on a computation node that satisfies a splitting condition in each computation node in the first computation flowchart includes:
screening out target computing nodes from the computing nodes of the first computing flow chart; the subtask corresponding to the target computing node can be split into a plurality of subtasks;
splitting the target compute node into a plurality of new compute nodes;
and taking the input end of the target computing node as the input end of the plurality of new computing nodes, and taking the output end of the target computing node as the output end of the plurality of new computing nodes.
Optionally, splitting a computing node that satisfies a splitting condition in each computing node in the first computing flowchart, further including:
judging whether a target computing node exists in the plurality of new computing nodes;
if yes, screening out a target computing node from the plurality of new computing nodes, and returning to execute the step of splitting the target computing node into the plurality of new computing nodes;
if not, the splitting process is ended.
Optionally, in the method for constructing a computing flowchart according to the embodiment of the present application, setting a data pipeline between each computing node according to a dependency relationship between each subtask includes:
setting a data pipeline according to the data flow direction relation among the computing nodes;
setting a type of data pipe by at least one of:
when a first computing node exists, setting a data pipeline connected with the output end of the first computing node as a broadcast pipeline; the first computing node is a node which transmits the data after the subtask is executed to a plurality of next-level computing nodes at the same time;
when a second computing node exists, setting a data pipeline connected with the output end of the second computing node as a multi-branch pipeline; the second computing node is a node whose target data must be routed to one of a plurality of next-level computing nodes according to a selection condition, and the target data is data generated after the second computing node executes its subtask;
when a third computing node exists, setting a data input pipeline of the third computing node as a merging pipeline; the third computing node is a node capable of receiving data after the plurality of upper-level computing nodes execute the subtasks;
when a fourth computing node exists, setting a data output pipeline of the fourth computing node as an order-preserving pipeline; and the fourth computing node is a node for keeping the receiving sequence of the data consistent with the output sequence of the data.
In a second aspect, an embodiment of the present application further provides a calculation efficiency optimization method, including the following steps:
acquiring a calculation flow chart, wherein the calculation flow chart is constructed by adopting any one of the calculation flow chart construction methods;
monitoring the current computing load of each computing node in the computing flow chart in the process of computing and processing the object to be processed of the target computing task by applying the computing flow chart;
determining a bottleneck computing node from each computing node, and optimizing the computing efficiency of the bottleneck computing node, wherein the step of optimizing the computing efficiency of the bottleneck computing node comprises at least one of the following steps:
when the current computing load of a bottleneck computing node does not reach a computing power bottleneck state, judging whether the bottleneck computing node meets a splitting condition or not, and if so, splitting the bottleneck computing node;
when the current computing load of a bottleneck computing node does not reach a computing power bottleneck state, adjusting the maximum batch processing amount of the bottleneck computing node;
and when the current computing load of the bottleneck computing node reaches a computing power bottleneck state, scheduling computing resources to the target computing node.
Optionally, in the method for optimizing computational efficiency according to the embodiment of the present application, the method further includes:
re-acquiring a calculation flow chart according to the adjusted maximum batch processing amount and/or the split bottleneck calculation node;
monitoring the current computing load of each computing node in the computing flow chart in the process of computing the object to be processed of the target computing task by applying the computing flow chart; determining bottleneck computing nodes from each computing node, and optimizing the computing efficiency of the bottleneck computing nodes until no optimizable bottleneck computing nodes exist.
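The per-node decision described in the second aspect can be sketched as follows. This is only an illustrative sketch: the source does not say how a bottleneck node is identified or how the "computing power bottleneck state" is measured, so `find_bottleneck`, the `load >= capacity` test, the order in which the options are tried, and all names are assumptions.

```python
def find_bottleneck(utilization):
    """Assumption: the bottleneck is the node with the highest utilization."""
    return max(utilization, key=utilization.get)


def choose_action(load, capacity, splittable, batch, max_batch_limit):
    """Pick one optimization for a bottleneck node.

    Assumptions (not from the source): load >= capacity stands for the
    "computing power bottleneck state", splitting is tried before
    batch tuning, and max_batch_limit is a hypothetical upper bound.
    """
    if load >= capacity:
        # Hardware is saturated: only more computing resources help.
        return "schedule_more_resources"
    if splittable:
        # Hardware has headroom and the subtask can run as parallel parts.
        return "split"
    if batch < max_batch_limit:
        # Hardware has headroom: larger batches may raise throughput.
        return "raise_max_batch"
    return "none"


utilization = {"detect": 0.55, "track": 0.97, "attributes": 0.40}
bottleneck = find_bottleneck(utilization)
action = choose_action(load=0.97, capacity=0.95, splittable=False,
                       batch=8, max_batch_limit=32)
```

In the iterative variant, this decision would be repeated after each change until `choose_action` returns `"none"` for every bottleneck candidate.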
In a third aspect, an embodiment of the present application further provides a device for constructing a computation flowchart, including:
the first acquisition module is used for acquiring each subtask of the target computing task and distributing computing nodes for each subtask;
the setting module is used for setting data pipelines among all the computing nodes according to the dependency relationship among all the subtasks to obtain a first computing flow chart;
and the optimization module is used for optimizing the connection relation and/or the computing nodes of the first computing flow chart to obtain a second computing flow chart after optimization.
In a fourth aspect, an embodiment of the present application further provides a device for optimizing computational efficiency, including:
and the second acquisition module is used for acquiring a calculation flow chart, and the calculation flow chart is constructed by adopting any one of the calculation flow chart construction methods.
The monitoring module is used for monitoring the current computing load of each computing node in the computing flow chart in the process of applying the computing flow chart to compute and process the object to be processed of the target computing task;
an optimization module, configured to determine a bottleneck computing node from each computing node, and optimize the computing efficiency of the bottleneck computing node, where the step of optimizing the computing efficiency of the bottleneck computing node includes at least one of the following steps: when the current computing load of a bottleneck computing node does not reach a computing power bottleneck state, judging whether the bottleneck computing node meets a splitting condition or not, and if so, splitting the bottleneck computing node; when the current computing load of a bottleneck computing node does not reach a computing power bottleneck state, adjusting the maximum batch processing amount of the bottleneck computing node; and when the current computing load of the bottleneck computing node reaches a computing power bottleneck state, scheduling computing resources to the target computing node.
In a fifth aspect, an embodiment of the present application provides an electronic device, including a processor and a memory, where the memory stores computer-readable instructions, and when the computer-readable instructions are executed by the processor, the electronic device executes any one of the methods described above.
As can be seen from the above, the method, the device and the electronic device for constructing the computing flowchart and optimizing the computing efficiency provided in the embodiment of the present application acquire each subtask of the target computing task and allocate a computing node to each subtask; setting data pipelines among all computing nodes according to the dependency relationship among all subtasks to obtain a first computing flow chart; optimizing the connection relation and/or the computing nodes of the first computing flow chart to obtain a second computing flow chart after optimization; so that the calculation efficiency can be improved.
Additional features and advantages of the present application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the embodiments of the present application. The objectives and other advantages of the application may be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments of the present application will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and that those skilled in the art can also obtain other related drawings based on the drawings without inventive efforts.
Fig. 1 is a flowchart of a method for constructing a computing flowchart according to an embodiment of the present application.
Fig. 2 is a flowchart of a calculation efficiency optimization method according to an embodiment of the present application.
Fig. 3 is a schematic diagram of a calculation flowchart construction apparatus in an embodiment of the present application.
Fig. 4 is a schematic diagram of a calculation efficiency optimization device in the embodiment of the present application.
Fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Fig. 1 is a flowchart of a method for constructing a computing flowchart according to an embodiment of the present application. The method for constructing the calculation flow chart comprises the following steps:
S101, acquiring each subtask of a target computing task and distributing computing nodes for each subtask;
S102, setting data pipelines among all computing nodes according to the dependency relationship among all subtasks to obtain a first computing flow chart;
S103, optimizing the connection relation and/or the computing nodes of the first computing flow chart to obtain a second computing flow chart after optimization.
In order to understand the performance of the computing device, the embodiment of the invention provides a performance analysis tool that monitors the running condition of the SDK, so that when a problem occurs its specific cause can be found at minimum cost. The embodiment of the invention provides a method for constructing a computation flow chart, wherein the computation flow chart is a directed acyclic graph (DAG), and the time-sequence change of the data backlog state can be analyzed visually based on the computation flow chart.
In step S101, the target computing task is divided into a plurality of subtasks, and a computing node is allocated to each subtask, so that each subtask corresponds to one computing node. For example, a target tracking task may be divided into subtasks such as target detection and trajectory generation, and one computing node for target detection and one computing node for trajectory generation are allocated accordingly.
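A minimal sketch of allocating one node per subtask and then deriving one data pipeline per dependency, using the target tracking example. The class names, field names and the `depends_on` mapping are hypothetical and are not taken from the source.

```python
from dataclasses import dataclass, field


@dataclass
class Pipe:
    """A data pipeline from an upstream node to a downstream node."""
    src: str
    dst: str


@dataclass
class Node:
    """One computing node, allocated to exactly one subtask."""
    name: str
    inputs: list = field(default_factory=list)   # incoming pipes
    outputs: list = field(default_factory=list)  # outgoing pipes


def build_flow_chart(subtasks, depends_on):
    """Allocate one node per subtask, then add one pipe per dependency.

    depends_on maps a subtask to the subtasks whose output it consumes.
    """
    nodes = {name: Node(name) for name in subtasks}
    pipes = []
    for consumer, producers in depends_on.items():
        for producer in producers:
            pipe = Pipe(src=producer, dst=consumer)
            nodes[producer].outputs.append(pipe)
            nodes[consumer].inputs.append(pipe)
            pipes.append(pipe)
    return nodes, pipes


nodes, pipes = build_flow_chart(
    subtasks=["target_detection", "trajectory_generation"],
    depends_on={"trajectory_generation": ["target_detection"]},
)
```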
In step S102, the data pipeline is used to transmit the target data output from the output terminal of the upper-stage computing node to the input terminal of the lower-stage computing node. The data pipeline is used for connecting the computing nodes with data transmission relationship, and can be divided into a common pipeline and a special pipeline according to the connection relationship among different computing nodes, wherein the special pipeline comprises a broadcast pipeline, a multi-branch pipeline, a merging pipeline, an order-preserving pipeline and the like. Setting a data pipeline according to the data flow direction relation among the computing nodes, wherein the type of the data pipeline can be set in at least one of the following modes:
Mode 1: when a first computing node exists, the data pipeline connected to the output end of the first computing node is set as a broadcast pipeline. The first computing node is a node that transmits the data produced by executing its subtask to a plurality of next-level computing nodes at the same time.
Mode 2: when a second computing node exists, the data pipeline connected to the output end of the second computing node is set as a multi-branch pipeline. The second computing node is a node whose target data must be routed to one of a plurality of next-level computing nodes according to a selection condition, the target data being the data generated after the second computing node executes its subtask. A multi-branch pipeline selects one of a plurality of data output pipelines according to which condition is satisfied. For example, computing node 1 is connected to next-level computing nodes 2 and 3 through a multi-branch pipeline (data pipeline a and data pipeline b). When the data generated by computing node 1 satisfies condition A, computing node 1 transmits it to computing node 2 through data pipeline a; when the data satisfies condition B, computing node 1 transmits it to computing node 3 through data pipeline b.
Mode 3: when a third computing node exists, the data input pipeline of the third computing node is set as a merging pipeline. The third computing node is a node that receives the data produced by a plurality of upper-level computing nodes after they execute their subtasks. The merging pipeline consists of the data pipelines used when data generated by a plurality of computing nodes is output to the same next-level node.
Mode 4: when a fourth computing node exists, the data output pipeline of the fourth computing node is set as an order-preserving pipeline. The fourth computing node is a node whose output order must be kept consistent with the order in which it received the data. An order-preserving pipeline adjusts the output data sequence according to the input data sequence of the computing node. For example, suppose the data pipeline connected to the output end of computing node 4 is an order-preserving pipeline and computing node 4 performs an image detection task. If the image frames are received in the order 1, 2, 3, 4, 5 but the detection results are output in the frame order 2, 1, 4, 5, 3, the order-preserving pipeline restores the results to the order 1, 2, 3, 4, 5.
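The order-preserving behaviour in the frame-number example can be sketched with a small reorder buffer. This is an illustrative sketch, not the patented implementation; it assumes every result carries a consecutive integer sequence number, and the class and method names are hypothetical.

```python
import heapq


class OrderPreservingPipe:
    """Releases results in the order their inputs arrived at the node.

    Assumption: sequence numbers are consecutive integers starting at
    first_seq, one per input item.
    """

    def __init__(self, first_seq=1):
        self._next_seq = first_seq
        self._pending = []  # min-heap of (seq, result)

    def put(self, seq, result):
        """Accept one possibly out-of-order result; return what can be released."""
        heapq.heappush(self._pending, (seq, result))
        released = []
        # Release the head of the heap while it is the next expected item.
        while self._pending and self._pending[0][0] == self._next_seq:
            released.append(heapq.heappop(self._pending)[1])
            self._next_seq += 1
        return released


pipe = OrderPreservingPipe()
emitted = []
for frame in [2, 1, 4, 5, 3]:  # completion order from the example
    emitted.extend(pipe.put(frame, f"result-{frame}"))
```

A result is held back only until every earlier frame has been released, so the added latency is bounded by the slowest earlier frame.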
In step S103, only the connection relationship may be optimized, or only the computing nodes may be optimized; naturally, the best result is obtained when both the connection relationship and the computing nodes are optimized.
Wherein, optimizing the connection relationship may include the following:
S1, optimizing the wrong connection relation existing in the first calculation flow chart.
Step S1 may specifically include the following sub-steps:
S11, detecting whether the input end and the output end of each computing node of the first calculation flow chart are both connected with a data pipeline, and optimizing the first calculation flow chart according to the detection result;
S12, detecting whether the input end and the output end of each data pipeline of the first calculation flow chart are both connected with other data pipelines or computing nodes, and optimizing the first calculation flow chart according to the detection result;
S13, detecting whether a data pipeline without a set rule exists in the first calculation flow chart, and optimizing the first calculation flow chart according to the detection result;
S14, detecting whether the data type of a data pipeline connected to the input end of a computing node is inconsistent with the type of input data of that computing node, and whether the data type of a data pipeline connected to the output end of a computing node is inconsistent with the type of output data of that computing node; if so, adjusting one of the data type of the data pipeline, the type of input data of the computing node, and the type of output data of the computing node.
Operation S11 mainly detects whether the input end or output end of any computing node in the first calculation flow chart is floating (the data pipe is defined and registered but not used). A floating input end or output end indicates a wrong connection relationship that needs to be optimized. During optimization, it is judged whether the node with the floating input end or output end is a redundant computing node; if not, the corresponding data pipeline is added according to the dependency relationship between that computing node and the other computing nodes, so that the floating end is connected to the corresponding computing node.
Operation S12 mainly detects floating data pipelines. If one end of a data pipeline is not connected to a computing node, the data pipeline may be redundant or wrongly connected. It is then judged whether the computing node connected to the other end of the data pipeline is already connected to all computing nodes with which it has a dependency relationship. If so, the data pipeline is redundant and is deleted directly; if not, the floating end is connected to a computing node that has a dependency relationship with the current computing node but has not yet been connected to it. When a registered computing node or data pipeline is redundant, it may be deleted and the computing resources it occupies may be released.
Operation S13 mainly detects whether a rule definition has been set for each special data pipeline, for example, whether a demux data pipeline has been configured with which data (e.g., odd-numbered data) is transmitted to one computing node for attribute identification and which data (e.g., even-numbered data) is transmitted to another computing node for attribute identification. If a special data pipeline without a rule definition exists, a rule definition is set for it.
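Checks S11 to S13 can be sketched as a single validation pass over the graph. The dictionary layout, the `"multi_branch"` kind label and all node and pipe names below are hypothetical. As a simplification, the sketch treats every input and output port as required, so genuine entry and exit nodes would need an exemption in practice.

```python
def check_connections(node_ports, pipes, special_rules):
    """Return a list of problems found by checks S11 to S13.

    node_ports:    {node: {"in": [pipe names], "out": [pipe names]}}
    pipes:         {pipe: {"src": node or None, "dst": node or None, "kind": str}}
    special_rules: {pipe: rule} for pipes whose kind requires a rule
    """
    problems = []
    # S11: a node whose input or output has no data pipeline attached.
    for node, ports in node_ports.items():
        for side in ("in", "out"):
            if not ports[side]:
                problems.append(f"S11: node {node} has a floating {side}put")
    # S12: a data pipeline with one end attached to nothing.
    for pipe, info in pipes.items():
        for end in ("src", "dst"):
            if info[end] is None:
                problems.append(f"S12: pipe {pipe} has a floating {end} end")
    # S13: a special (here: multi-branch) pipeline with no rule defined.
    for pipe, info in pipes.items():
        if info["kind"] == "multi_branch" and pipe not in special_rules:
            problems.append(f"S13: pipe {pipe} has no rule definition")
    return problems


node_ports = {
    "detect": {"in": ["p_in"], "out": ["p1"]},
    "attr": {"in": ["p1"], "out": []},
}
pipes = {
    "p_in": {"src": "source", "dst": "detect", "kind": "plain"},
    "p1": {"src": "detect", "dst": "attr", "kind": "multi_branch"},
    "p2": {"src": "attr", "dst": None, "kind": "plain"},
}
problems = check_connections(node_ports, pipes, special_rules={})
```

Reporting all problems before repairing any of them matches the check-first, optimize-afterwards order of the text.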
Steps S11-S14 are in effect safety checks on the calculation flow chart: the flow chart is checked first, and the problems found by the check are then optimized. In this manner, the probability of problems arising during the development of new target computing tasks can be significantly reduced.
The optimization processing of the computing nodes of the first computing flowchart may include any one or more of the following manners:
S2, adjusting the serial computing nodes without data dependency in the first computing flowchart into asynchronous parallel computing nodes.
S3, setting parameters of each computing node and data pipeline in the first calculation flow chart.
S4, splitting the computing nodes meeting the splitting condition among the computing nodes in the first calculation flow chart.
In operation S2, the serial computing nodes in the first computing flowchart that do not have data dependency are adjusted to be asynchronous parallel computing nodes, so that the computing delay can be reduced and the computing efficiency can be improved.
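The effect of operation S2 can be illustrated with two nodes that consume the same frame but do not depend on each other. The sketch below uses `asyncio.gather` as one possible way to run them asynchronously in parallel; the node functions are hypothetical stand-ins and the sleeps merely simulate work.

```python
import asyncio


async def detect_faces(frame):
    await asyncio.sleep(0.01)  # stands in for real computation
    return f"faces({frame})"


async def detect_vehicles(frame):
    await asyncio.sleep(0.01)  # stands in for real computation
    return f"vehicles({frame})"


async def run_serial(frame):
    # Before S2: the second node waits for the first although it
    # does not use the first node's output.
    return [await detect_faces(frame), await detect_vehicles(frame)]


async def run_parallel(frame):
    # After S2: both nodes are started together; gather returns the
    # results in argument order.
    return list(await asyncio.gather(detect_faces(frame), detect_vehicles(frame)))


serial_result = asyncio.run(run_serial("frame-1"))
parallel_result = asyncio.run(run_parallel("frame-1"))
```

Both variants produce the same results; the parallel variant's latency approaches that of the slower node rather than the sum of both.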
For operation S3, parameters are set for each computing node and data pipeline in the first calculation flow chart. Current mainstream computing hardware is optimized for batch processing in neural network (NN) inference, so computing efficiency can be improved by batching. A caching mechanism for different data of the same computing task can therefore be provided in the target computing task, so that the cached data can be processed in batches. A data pipeline in the calculation flow chart corresponds to a certain storage resource and so naturally has the capacity to cache data of the same task. When the calculation flow chart is constructed, parameters such as the maximum amount of data a data pipeline can cache, the maximum batch size a computing node allows, and the data analysis delay can be set, so that the maximum batch processing amount can be determined later. Step S3 may include the following sub-steps: S31, setting a parameter for the maximum length of the buffer queue of the data pipeline; and S32, setting parameters for the maximum batch processing amount of each computing node and a corresponding timeout mechanism. Step S3 may include only one of S31 and S32, although setting both is more effective. Setting the maximum length of the buffer queue for a data pipeline limits storage resource consumption and provides back-pressure on the upper-level computing node. Setting the maximum batch processing amount of each computing node together with a corresponding timeout mechanism enables efficient batching while bounding the maximum data analysis delay, which can greatly improve the data throughput of the computing nodes.
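A minimal sketch of S31 and S32, assuming a data pipeline can be modelled by Python's bounded `queue.Queue`. The function name and the parameter values are hypothetical; the point is that a batch is closed either when it reaches the maximum batch size or when the timeout expires, whichever comes first.

```python
import queue
import time


def collect_batch(pipe, max_batch, timeout_s):
    """Take up to max_batch items from pipe, waiting at most timeout_s in total."""
    batch = []
    deadline = time.monotonic() + timeout_s
    while len(batch) < max_batch:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break  # timeout reached: process whatever has been collected
        try:
            batch.append(pipe.get(timeout=remaining))
        except queue.Empty:
            break  # nothing more arrived before the deadline
    return batch


# S31: a bounded buffer queue. put() blocks when the queue is full,
# which pushes back on the upstream node.
pipe = queue.Queue(maxsize=8)
for i in range(5):
    pipe.put(i)

# S32: maximum batch size 4 with a 50 ms timeout.
first = collect_batch(pipe, max_batch=4, timeout_s=0.05)
second = collect_batch(pipe, max_batch=4, timeout_s=0.05)
```

Here the first call returns as soon as four items are available, while the second returns its single item only after the timeout, which is the latency bound the timeout mechanism provides.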
For operation S4, the splitting condition is that the subtask corresponding to the computing node can be split into a plurality of parallel subtasks. Each splittable computing node is split to obtain new computing nodes, and any splittable nodes among the new computing nodes are split in turn until no node can be split further.
In some embodiments, step S4 may include the following sub-steps: S41, screening out target computing nodes from the computing nodes of the first computing flow chart, where the subtask corresponding to a target computing node can be split into a plurality of parallel subtasks; and S42, splitting the target computing node into a plurality of new computing nodes.
In some embodiments, after step S42, step S4 may further include the following sub-steps: S43, judging whether a target computing node exists among the plurality of new computing nodes; S44, if yes, screening out a target computing node from the plurality of new computing nodes, and returning to the step of splitting the target computing node into a plurality of new computing nodes; S45, if not, ending the splitting process.
In step S41, all target computing nodes satisfying the splitting condition are screened out. In step S42, the new computing nodes obtained by splitting are connected to the previous node using a broadcast pipeline and to the next node using a merging pipeline. In step S43, the determination criterion is the same as in step S41: if the subtask corresponding to a new computing node can be further divided into a plurality of parallel subtasks, that computing node is a target computing node and can be split further. In step S44, the plurality of computing nodes obtained by splitting may likewise be connected to the previous node using a broadcast pipeline and to the next node using a merging pipeline.
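The loop formed by steps S41 to S45 can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the applicant's implementation: the function split_until_done and the parallel_parts mapping are hypothetical, the mapping is assumed to contain no cycles, and the broadcast and merging pipelines that connect the new nodes are not modeled.

```python
def split_until_done(nodes, parallel_parts):
    """Repeat S41-S45: split every splittable node until none remains.

    parallel_parts (hypothetical) maps a node name to the names of the
    parallel subtasks it can be split into; nodes absent from the mapping
    do not satisfy the splitting condition.
    """
    result = []
    pending = list(nodes)
    while pending:
        node = pending.pop(0)
        parts = parallel_parts.get(node)
        if parts:
            # S41/S43: the node is a target node; S42/S44: replace it
            # with its new nodes, which are themselves re-examined.
            pending = list(parts) + pending
        else:
            # S45: this node cannot be split any further.
            result.append(node)
    return result


final_nodes = split_until_done(
    ["decode", "analyze"],
    {"analyze": ["face", "body"], "body": ["pose", "attr"]},
)
```

Here "analyze" is split into "face" and "body", and "body" is split again, so the process stops only when no new node satisfies the splitting condition.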
It can be understood that, when the new computing nodes formed by splitting are connected, computing nodes without a dependency relationship can also be processed in parallel, thereby improving processing efficiency. It can also be understood that, in some embodiments, for a stateful computing node, the data pipeline connected to its input end needs to be a multi-branch pipeline, so as to ensure that the stateful node behaves correctly when processing timing-dependent data from multiple video streams; a stateful node does not need to be split. A stateful node is a node that places requirements on its input data, for example a computing node whose input data has a timing dependency.
It is understood that the sequence of any one or more of the steps S1-S4 optimized for the first computing flowchart may be adjusted or deleted.
As can be seen from the above, in the method for constructing a computation flowchart provided in the embodiment of the present application, the submodules are fully decoupled in the form of the computation flowchart, various complex business logics are described through the connection relationships of the graphs, and then the connection relationships and/or the computation nodes are optimized on the basis of the computation flowchart, so that when the computation flowchart is applied to perform computation processing on an object to be processed of a target computation task, accuracy of a computation flow corresponding to the business logics can be ensured, and computation efficiency is improved.
Referring to fig. 2, fig. 2 is a flowchart of a calculation efficiency optimization method according to some embodiments of the present disclosure. The method adopts the calculation flow chart construction method in the embodiment to construct the calculation flow chart. Specifically, the method comprises the following steps:
S201, acquiring a computing flow chart, wherein the computing flow chart is constructed by adopting any one of the above computing flow chart construction methods;
S202, monitoring the current computing load of each computing node in the computing flow chart in the process of applying the computing flow chart to perform computing processing on the object to be processed of the target computing task;
S203, determining bottleneck computing nodes from the computing nodes, and optimizing the computing efficiency of the bottleneck computing nodes, wherein the step of optimizing the computing efficiency of the bottleneck computing nodes comprises at least one of the following:
when the current computing load of a bottleneck computing node does not reach a computing power bottleneck state, judging whether the bottleneck computing node meets a splitting condition or not, and if so, splitting the bottleneck computing node;
when the current computing load of a bottleneck computing node does not reach a computing power bottleneck state, adjusting the maximum batch processing amount of the bottleneck computing node;
and when the current computing load of the bottleneck computing node reaches a computing power bottleneck state, scheduling computing resources to the target computing node.
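The three branches of step S203 can be read as a small decision function. The Python sketch below is only one possible reading: the source says "at least one of" the branches applies, so the priority given here (splitting before batch adjustment) and the names choose_optimization, splittable and can_grow_batch are assumptions made for illustration.

```python
def choose_optimization(at_compute_bottleneck, splittable, can_grow_batch):
    """Pick an optimization for a bottleneck node (illustrative ordering)."""
    if at_compute_bottleneck:
        # Computing power is exhausted: more resources are needed.
        return "schedule_resources"
    if splittable:
        # Spare computing power exists and the subtask can run in parallel.
        return "split_node"
    if can_grow_batch:
        # Spare computing power exists: let the node process larger batches.
        return "adjust_max_batch"
    return "none"
```

For example, a node with spare computing power whose subtask cannot be split would be handled by adjusting its maximum batch processing amount.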
In step S202, in order to more accurately obtain the current computing load of each computing node and reasonably schedule the computing resources, this embodiment provides an implementation manner for monitoring the current computing load of each computing node in the computing flowchart, and the implementation manner may be specifically executed with reference to steps S2011 and S2012 as follows:
S2011: acquiring log information recorded by the threads of each computing node in the computing flow chart, wherein the log information comprises data information received or sent by each thread when executing the subtask corresponding to the computing node and time information of each thread executing the subtask.
S2012: determining the current computing load of each computing node according to the log information recorded by each thread.
The log information recorded by the thread of each computing node may be obtained according to a data transmission pipeline connected to each computing node, or may be obtained by recording time when each thread executes a subtask corresponding to the computing node, which is not limited herein. The data transmission pipeline comprises a data input pipeline and a data output pipeline, wherein the data input pipeline is a pipeline for receiving data to be processed by a current computing node; the data to be processed is output data of a previous-level computing node; the data output pipeline is a pipeline used for outputting target data by the current computing node; the target data is data obtained after the current computing node executes the subtasks. The log information records data information in a data input pipeline and a data output pipeline of the computing node when each thread executes a subtask corresponding to the computing node, and the data information may include parameters such as data type or data amount.
Determining node information of each computing node according to log information recorded by each thread; the node information comprises any one or more of data quantity corresponding to a data input pipeline and data quantity corresponding to a data output pipeline of each computing node and expected consumption time of a thread corresponding to each computing node for executing subtasks in each computing node; and determining the current computing load of the corresponding computing node according to the node information of each computing node. According to the data information received or sent by each thread when executing the subtask corresponding to the computing node in the log information recorded by each thread, the data amount corresponding to the data input pipeline and the data amount corresponding to the data output pipeline of the computing node when executing the subtask corresponding to the computing node by each thread can be determined, wherein the data amount corresponding to the data input pipeline of the computing node is the sum of the data amounts in all the data input pipelines of the computing node, and the data amount corresponding to the data output pipeline of the computing node is the data amount in each data output pipeline of the computing node. And determining the starting time and the predicted consumption time of the thread corresponding to each computing node for executing the subtasks in each computing node according to the time information of each thread for executing the subtasks in the log information recorded by each thread, wherein when the thread finishes the execution of the subtasks of the computing nodes, the starting time, the ending time and the total consumption time for executing the subtasks are recorded in the log information. 
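How node information might be derived from per-thread log records can be sketched in Python. The record layout (LogRecord and its fields) is hypothetical and not specified by the source; the sketch assumes that the latest record carries the current pipeline occupancy and that the expected consumption time is the pending input amount divided by the historical processing speed.

```python
from dataclasses import dataclass


@dataclass
class LogRecord:
    """One finished subtask run as logged by a thread (hypothetical layout)."""
    input_amounts: dict    # input pipeline name -> items waiting in it
    output_amounts: dict   # output pipeline name -> items waiting in it
    items_processed: int   # items handled in this run
    elapsed_s: float       # total consumption time of this run


def node_info(records):
    """Derive node information from the log records of one computing node."""
    latest = records[-1]
    # Input side: sum over all data input pipelines of the node.
    input_amount = sum(latest.input_amounts.values())
    # Historical processing speed in items per second.
    total_items = sum(r.items_processed for r in records)
    total_time = sum(r.elapsed_s for r in records)
    speed = total_items / total_time if total_time > 0 else 0.0
    expected = input_amount / speed if speed > 0 else float("inf")
    return {
        "input_amount": input_amount,
        # Output side: kept per pipeline, as each pipeline is judged separately.
        "output_amounts": dict(latest.output_amounts),
        "expected_time_s": expected,
    }


info = node_info([
    LogRecord({"a": 3, "b": 2}, {"out": 1}, items_processed=10, elapsed_s=2.0),
    LogRecord({"a": 6, "b": 4}, {"out": 2}, items_processed=30, elapsed_s=6.0),
])
```

With these sample records the node has processed 40 items in 8 seconds, so 10 pending items give an expected consumption time of 2 seconds.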
And determining the current load state of the computing node according to one or more of the node information of the computing node, and determining whether the computing node is a bottleneck computing node according to one or more of the node information of the computing node. And determining whether the computing node is in a computing power bottleneck state according to the current load state of the computing node and computing resources corresponding to the computing node.
It is understood that a bottleneck computing node refers to a computing node that is the bottleneck of the target computing task and that does not necessarily reach a computing power bottleneck state. For example, although a certain computing node has residual computing power, the parameter setting is not reasonable, the exertion of the residual computing power is restricted, and the computing node can also become a bottleneck computing node. Therefore, whether the computing node reaches the computing power bottleneck state or not can be judged firstly, and different optimization processing can be carried out on the computing power bottleneck state reaching and not reaching. Judging whether the computing node reaches the computing power bottleneck state can be carried out by the following steps: and judging whether the current computing load of each computing node reaches a preset computing power bottleneck condition or not according to the node information of each computing node. If so, the bottleneck computing node reaches a computing power bottleneck state.
Because the node information may include any one or more of the data amount corresponding to the data input pipeline of each computing node, the data amount corresponding to the data output pipeline of each computing node, the speed at which each computing node processes data through its threads, and the expected consumption time for the thread corresponding to each computing node to execute the subtask in that computing node, the way of determining whether a computing node reaches the preset computing power bottleneck condition differs for different node information. Reference may be made to implementation manners one to four below:
Implementation manner one: if the node information comprises the data amount corresponding to the data input pipeline of each computing node, judge whether the data amount corresponding to the data input pipeline of each computing node reaches the preset input data amount, and determine a computing node reaching the preset input data amount as a bottleneck computing node that reaches the preset computing power bottleneck condition. Monitoring the data amount corresponding to the input pipelines of each computing node means monitoring the total data amount of all input pipelines of the computing node, so that the amount of work waiting at the computing node can be judged from that total. When the total data amount of the data input pipelines of a computing node reaches the preset input data amount, the computing node is determined to be a bottleneck computing node: its input pipelines are blocked, the speed at which it processes data through its threads is far lower than the rate at which data to be processed arrives, and it is in a computing power bottleneck state. Computing resources may subsequently be scheduled to the bottleneck computing node to expand its computing power. The preset input data amount may be a value set manually according to the actual processing conditions of the target computing task so that the target computing task completes as soon as possible.
Implementation manner two: if the node information comprises the data amount corresponding to the data output pipeline of each computing node, judge whether the data amount corresponding to each data output pipeline of each computing node reaches the preset output data amount, and determine the downstream computing node of a data output pipeline reaching the preset output data amount as a bottleneck computing node that reaches the computing power bottleneck state. A computing node transmits data to its downstream computing nodes through its data output pipelines. By monitoring the data amount in each data output pipeline of a computing node, the amount of data waiting to be processed by the downstream computing node of each data output pipeline can be determined. The downstream computing node of a pipeline reaching the preset output data amount is determined as a bottleneck computing node reaching the preset computing power bottleneck state, and computing resources can then be scheduled to that bottleneck computing node to expand its computing power. The preset output data amount can be set manually; because it is a threshold for a single data output pipeline, its value can be smaller than the preset input data amount.
Implementation manner three: if the node information comprises the expected consumption time for the thread corresponding to each computing node to execute the subtask in that computing node, judge whether the expected consumption time of each thread is greater than the preset time, and determine a computing node whose expected consumption time is greater than the preset time as a bottleneck computing node that reaches the preset computing power bottleneck state. From the log information recorded by each thread, the history of the computing time required by each computing node to execute its subtask can be determined, and the expected consumption time of each computing node can be determined from that history. For example, the data processing speed of each computing node can be determined from the history, and the expected consumption time can then be determined from the data processing speed of the node and the amount of data to be processed. The longer the expected consumption time, the higher the complexity of the subtask corresponding to the computing node. When the expected consumption time for executing the subtask of a computing node reaches the preset time, the computing node is determined to be a bottleneck computing node that reaches the preset computing power bottleneck state.
Or, the starting time and the current consumed time of each computing node for executing the subtask through the thread may be monitored, and the complexity of the subtask corresponding to each computing node is determined, where the longer the current consumed time is, the higher the complexity of the subtask corresponding to the computing node is, and when the current consumed time of the computing node for executing the subtask corresponding to the computing node through the thread reaches the preset time, the computing node is determined to be the bottleneck computing node and the bottleneck computing node reaches the preset computing bottleneck state. The preset time can be set manually according to experiments in which the computing nodes execute the subtasks for multiple times.
Implementation manner four: if the node information comprises both the data amount corresponding to the data input pipeline and the data amount corresponding to the data output pipeline of each computing node, judge whether the ratio of the former to the latter reaches a preset ratio, and determine a computing node reaching the preset ratio as a bottleneck computing node that reaches the computing power bottleneck state. This ratio indirectly reflects the data processing speed of the computing node: when the ratio is large, the data processing speed of the computing node is low, a large amount of data is waiting to be processed, and the node is likely to become blocked. Therefore, a computing node reaching the preset ratio is determined as a bottleneck computing node that reaches the preset computing power bottleneck state.
In a specific embodiment, a computing node where the data volume corresponding to the data input pipeline reaches the preset input data volume and the expected consumption time for executing the subtasks reaches the preset time may be determined as the bottleneck computing node. And determining a downstream computing node connected with a pipeline with a data volume reaching a preset output data volume corresponding to the data output pipeline (and the predicted consumption time of the downstream computing node for executing the subtasks reaches a preset time) as a bottleneck computing node.
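The four checks described above, and the way they may be combined, can be sketched in Python. All function names and threshold values are hypothetical. Two further assumptions are labeled in the comments: "reaches" is read as greater than or equal to, and the ratio check uses the total output amount and treats an empty output side with pending input as exceeding any ratio.

```python
def input_blocked(input_amount, preset_input):
    """Manner one: total data in all input pipelines reaches the threshold."""
    return input_amount >= preset_input


def blocked_downstream(output_amounts, downstream_of, preset_output):
    """Manner two: return downstream nodes of output pipelines over threshold."""
    return [downstream_of[pipe]
            for pipe, amount in output_amounts.items()
            if amount >= preset_output]


def too_slow(expected_time_s, preset_time_s):
    """Manner three: expected consumption time reaches the preset time
    (assumption: 'reaches' means >=)."""
    return expected_time_s >= preset_time_s


def ratio_exceeded(input_amount, output_amount, preset_ratio):
    """Manner four: input/output data ratio reaches the preset ratio."""
    if output_amount == 0:
        # Assumption: nothing produced while input is pending counts as blocked.
        return input_amount > 0
    return input_amount / output_amount >= preset_ratio


def combined_bottleneck(input_amount, expected_time_s, preset_input, preset_time_s):
    """Combined check: manner one and manner three must both hold."""
    return (input_blocked(input_amount, preset_input)
            and too_slow(expected_time_s, preset_time_s))
```

Requiring two conditions at once, as in combined_bottleneck, makes the decision stricter than any single check and may reduce false positives caused by short bursts of input.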
In a specific embodiment, the current computing load of each computing node may be dynamically displayed on the computing flow chart in use by means of a heat map; for example, a computing node with a high current computing load whose computing power has reached the bottleneck state is displayed in a warm color, and a computing node with a low current computing load and ample computing power is displayed in a cool color.
When the current computing load of a bottleneck computing node does not reach a computing power bottleneck state, judging whether the bottleneck computing node meets a splitting condition or not, and if so, splitting the bottleneck computing node; or adjusting the maximum batch processing amount of the bottleneck computing node.
Optionally, after the node splitting processing and/or the adjustment of the maximum batch processing amount, the computing flow chart may be reconstructed according to the adjusted maximum batch processing amount and/or the split bottleneck computing node to obtain a reconstructed computing flow chart. The following steps are then executed again: monitoring the current computing load of each computing node in the process of applying the reconstructed computing flow chart to perform computing processing on the object to be processed of the target computing task; and determining bottleneck computing nodes from the computing nodes and optimizing the computing efficiency of the bottleneck computing nodes. Optionally, these steps are repeated until no optimizable bottleneck computing node remains.
When the current computing load of the bottleneck computing node reaches a computing power bottleneck state, if residual computing resources exist, the computing resources can be dispatched to the target computing node. The embodiment of scheduling computing resources to a target computing node may be specifically executed with reference to the following steps a to b:
step b: and dispatching the computing resources to the target computing nodes which reach the preset computing power bottleneck condition.
In order to improve computing efficiency when the computing resources of the scheduling device are limited, an idle thread may be acquired from the process that executes the computing flow chart and scheduled to the target computing node, so that the target computing node processes its subtask in parallel using both its current thread and the idle thread. The process that executes the computing flow chart of the target computing task may include multiple threads, and the scheduling device allocates a thread for executing the subtask to each computing node in the computing flow chart by scheduling computing resources. When the target computing node reaches the preset computing power bottleneck state, a thread in the idle state is scheduled to the target computing node from that process, so that the target computing node processes its subtask in parallel through multiple threads and generates output data as soon as possible, which realizes independent management and control of computing nodes and computing resources. An idle thread may be a thread that has finished executing the subtask of its computing node, or a thread allocated to a node after the target computing node that has not yet started executing its subtask. There may be one or more idle threads, and the scheduling device schedules one or more of them to the target computing node according to the number and state of the currently idle threads, so as to reduce thread switching overhead in the operating system as much as possible.
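The bookkeeping behind idle-thread scheduling can be sketched as follows. This Python sketch models only the assignment table, not real operating-system threads; the class ThreadScheduler, its methods and the thread and node names are hypothetical.

```python
class ThreadScheduler:
    """Tracks which thread serves which computing node (bookkeeping only)."""

    def __init__(self, assignments):
        # thread id -> node name, or None when the thread is idle
        self.assignments = dict(assignments)

    def idle_threads(self):
        return [t for t, node in self.assignments.items() if node is None]

    def threads_of(self, node):
        return [t for t, n in self.assignments.items() if n == node]

    def schedule_to(self, node, max_threads=1):
        """Move up to max_threads idle threads to the bottleneck node."""
        moved = self.idle_threads()[:max_threads]
        for thread_id in moved:
            self.assignments[thread_id] = node
        return moved


scheduler = ThreadScheduler(
    {"t0": "decode", "t1": "detect", "t2": None, "t3": None}
)
moved = scheduler.schedule_to("detect", max_threads=1)
```

Reusing threads that already exist in the process, rather than creating new ones, is what keeps the thread switching overhead low in this scheme.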
In practical application, the scheduling device may be a device that has a scheduling function; the scheduling device uses its CPU to call a dedicated thread that executes the calculation efficiency optimization task and schedules idle threads to each target computing node. The scheduling device may also be a device without a scheduling function of its own, such as an NPU or a single Cambricon MLU card, in which case its computing resources may be scheduled through a CPU.
According to the method for optimizing the computational efficiency, the computing nodes and the data transmission pipelines are abstracted from the target computing task, so that the data flow time sequence dependency relationship of the high-complexity computing task is simplified; the bottleneck computing nodes are found in time by monitoring the current computing load of each computing node in the computing flow chart; and the calculation efficiency of the bottleneck calculation node is optimized pertinently according to the reason that the bottleneck calculation node generates the bottleneck state, so that the calculation efficiency of the calculation equipment is greatly improved.
Referring to fig. 3, fig. 3 is a schematic structural diagram of a computing flowchart constructing apparatus according to some embodiments of the present disclosure. The calculation flow chart constructing device comprises: a first obtaining module 301, a setting module 302 and an optimizing module 303.
The first obtaining module 301 is configured to obtain each subtask of the target computing task and allocate a computing node to each subtask. The target computing task is divided into a plurality of subtasks, and a computing node is allocated to each subtask, wherein each subtask corresponds to one computing node. For example, in the target tracking calculation task, the target tracking calculation task may be divided into a plurality of subtasks, such as target detection and path generation, and one calculation node for implementing target detection and one calculation node for path generation may be correspondingly allocated.
The setting module 302 is configured to set a data pipeline between each computing node according to a dependency relationship between each subtask, so as to obtain a first computing flowchart. The data pipeline is used for transmitting the target data output by the output end of the upper-stage computing node to the input end of the lower-stage computing node. The data pipeline is used for connecting the computing nodes with data transmission relationship, and can be divided into a broadcast pipeline, a multi-branch pipeline, a merging pipeline, an order-preserving pipeline and the like according to the connection relationship among different computing nodes. Setting a data pipeline according to the data flow direction relation among the computing nodes, wherein the type of the data pipeline can be set in at least one of the following modes:
Mode one: when a first computing node exists, the data pipeline connected to the output end of the first computing node is set as a broadcast pipeline. The first computing node is a node that transmits the data obtained after executing its subtask to a plurality of next-level computing nodes at the same time.
Mode two: when a second computing node exists, the data pipeline connected to the output end of the second computing node is set as a multi-branch pipeline. The second computing node is a node that selects, according to a branching condition, the next-level computing node that receives the target data, and the target data is the data generated after the second computing node executes its subtask. The multi-branch pipeline selects one of multiple data output pipelines for data output according to the condition that is satisfied. For example, computing node 1 is connected to next-level computing nodes 2 and 3 through a multi-branch pipeline (data pipeline a and data pipeline b): when the data generated by computing node 1 executing its subtask satisfies condition A, computing node 1 transmits the generated data to computing node 2 through data pipeline a; when the data satisfies condition B, computing node 1 transmits the generated data to computing node 3 through data pipeline b.
Mode three: when a third computing node exists, the data input pipeline of the third computing node is set as a merging pipeline. The third computing node is a node that can receive the data produced after a plurality of upper-level computing nodes execute their subtasks. The merging pipeline is the set of data pipelines used when data generated by a plurality of computing nodes executing their subtasks is output to the same next-level node.
Mode four: when a fourth computing node exists, the data output pipeline of the fourth computing node is set as an order-preserving pipeline. The fourth computing node is a node for which the order in which data is output must be consistent with the order in which data was received. The order-preserving pipeline is a data pipeline that adjusts the output data sequence according to the input data sequence of a computing node. For example, the data pipeline connected to computing node 4 is an order-preserving pipeline and computing node 4 performs an image detection task. If the frame numbers of the images received by computing node 4 arrive in the sequence 1, 2, 3, 4, 5 and the image detection results are output in the frame number sequence 2, 1, 4, 5, 3, the order-preserving pipeline connected to computing node 4 corrects the frame number sequence of the image detection results to 1, 2, 3, 4, 5.
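The behavior of the order-preserving pipeline and the multi-branch pipeline can be illustrated with a short Python sketch. The class OrderPreservingPipe, the function route_multi_branch, and the "person" and "vehicle" conditions are hypothetical and serve only to reproduce the examples given above.

```python
class OrderPreservingPipe:
    """Releases results in the order in which the inputs were received."""

    def __init__(self, input_order):
        self._expected = list(input_order)
        self._next = 0
        self._held = {}  # results that arrived ahead of their turn

    def push(self, frame_id, result):
        """Store one result and return every result that is now in order."""
        self._held[frame_id] = result
        released = []
        while (self._next < len(self._expected)
               and self._expected[self._next] in self._held):
            fid = self._expected[self._next]
            released.append((fid, self._held.pop(fid)))
            self._next += 1
        return released


def route_multi_branch(data, branches):
    """Return the target of the first branch whose condition the data meets."""
    for condition, target in branches:
        if condition(data):
            return target
    return None


# Frames are received as 1..5 but detection results finish as 2, 1, 4, 5, 3.
pipe = OrderPreservingPipe(input_order=[1, 2, 3, 4, 5])
emitted = []
for frame_id in [2, 1, 4, 5, 3]:
    emitted.extend(fid for fid, _ in pipe.push(frame_id, f"result-{frame_id}"))

# Condition A routes to node 2, condition B routes to node 3 (hypothetical).
branches = [
    (lambda d: d["label"] == "person", "node2"),
    (lambda d: d["label"] == "vehicle", "node3"),
]
target = route_multi_branch({"label": "vehicle"}, branches)
```

The order-preserving pipeline holds early results back until all earlier frames have arrived, which costs some latency but lets the upstream node process frames out of order.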
The optimization module 303 is configured to perform optimization processing on the connection relationship and/or the computing nodes of the first computing flow chart to obtain an optimized second computing flow chart. Either the connection relationship or the computing nodes alone may be optimized, although optimizing both gives the best result.
Wherein, optimizing the connection relationship may include the following: and optimizing the wrong connection relation existing in the first calculation flow chart.
The optimization processing of the computing nodes of the first computing flowchart may include any one or more of the following manners: and adjusting the serial computing nodes without data dependency relationship in the first computing flow chart into asynchronous parallel computing nodes. And setting parameters of each computing node and data pipeline in the first computing flow chart. And splitting the computing nodes meeting the splitting condition in each computing node in the first computing flow chart.
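One way to find serial computing nodes that have no data dependency on each other, so that they can be run asynchronously in parallel, is to group the nodes of the flow chart into dependency levels. The Python sketch below uses the standard-library graphlib module (Python 3.9 or later); the function parallel_stages and the node names are hypothetical, and the source does not state that this algorithm is the one used.

```python
from graphlib import TopologicalSorter


def parallel_stages(dependencies):
    """Group nodes into stages; nodes in one stage do not depend on each other.

    dependencies maps each node to the set of nodes whose output it needs.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises CycleError if the flow chart contains a cycle
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # all inputs of these nodes are done
        stages.append(ready)
        sorter.done(*ready)
    return stages


stages = parallel_stages({
    "detect": {"decode"},
    "classify": {"decode"},
    "merge": {"detect", "classify"},
})
```

In this example "detect" and "classify" both depend only on "decode", so they fall into the same stage and can be executed in parallel even if they were originally connected in series.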
Referring to fig. 4, fig. 4 is a schematic structural diagram of a computational efficiency optimization apparatus according to some embodiments of the present disclosure. The calculation efficiency optimization device includes: a second obtaining module 401, a monitoring module 402, and an optimization module 403.
The second obtaining module 401 is configured to obtain a calculation flowchart, where the calculation flowchart is constructed by using any one of the calculation flowchart construction methods described above.
The monitoring module 402 is configured to monitor a current computation load of each computation node in the computation flowchart during a process of applying the computation flowchart to perform computation processing on an object to be processed of a target computation task.
The optimization module 403 is configured to determine a bottleneck computing node from each computing node, and optimize the computing efficiency of the bottleneck computing node, where the step of optimizing the computing efficiency of the bottleneck computing node includes at least one of the following steps: when the current computing load of a bottleneck computing node does not reach a computing power bottleneck state, judging whether the bottleneck computing node meets a splitting condition or not, and if so, splitting the bottleneck computing node; when the current computing load of a bottleneck computing node does not reach a computing power bottleneck state, adjusting the maximum batch processing amount of the bottleneck computing node; and when the current computing load of the bottleneck computing node reaches a computing power bottleneck state, scheduling computing resources to the target computing node.
The embodiment of the present application provides a storage medium storing a computer program which, when executed by a processor, performs the method in any optional implementation manner of the above embodiments. The storage medium may be implemented by any type of volatile or nonvolatile storage device or combination thereof, such as a Static Random Access Memory (SRAM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), an Erasable Programmable Read-Only Memory (EPROM), a Programmable Read-Only Memory (PROM), a Read-Only Memory (ROM), a magnetic memory, a flash memory, a magnetic disk, or an optical disk.
As shown in fig. 5, an electronic device according to an embodiment of the present application is further provided, and includes a processor 501 and a memory 502, where the memory 502 stores computer readable instructions, and when the computer readable instructions are executed by the processor 501, the method according to any one of the above embodiments is performed.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.
In addition, units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
Furthermore, the functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
In this document, relational terms such as "first" and "second" are used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual relationship or order between such entities or actions.
The above description is only an example of the present application and is not intended to limit the scope of the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (14)

1. A method for constructing a computing flow chart, comprising:
acquiring each subtask of a target computing task and allocating a computing node to each subtask;
setting data pipelines among the computing nodes according to the dependency relationships among the subtasks to obtain a first computing flow chart;
and optimizing the connection relationships and/or the computing nodes of the first computing flow chart to obtain an optimized second computing flow chart.
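By way of non-limiting illustration, the construction of the first computing flow chart recited in claim 1 might be sketched in Python as follows. The names `FlowChart` and `build_first_flow_chart`, the subtask names, and the representation of dependencies as a dictionary are assumptions made for this example only and do not appear in the application.

```python
from dataclasses import dataclass, field


@dataclass
class FlowChart:
    """Hypothetical container: computing nodes plus directed data pipelines."""
    nodes: set = field(default_factory=set)
    pipes: set = field(default_factory=set)  # (upstream node, downstream node)


def build_first_flow_chart(dependencies):
    """Build a flow chart from a mapping: subtask -> subtasks it depends on."""
    chart = FlowChart()
    # Allocate one computing node per subtask (nodes are named after subtasks).
    chart.nodes.update(dependencies)
    # Set one data pipeline per dependency, directed from producer to consumer.
    for subtask, upstream_tasks in dependencies.items():
        for upstream in upstream_tasks:
            if upstream not in chart.nodes:
                raise ValueError(f"unknown subtask: {upstream}")
            chart.pipes.add((upstream, subtask))
    return chart


# Illustrative subtasks of a hypothetical video-analysis task.
dependencies = {
    "decode": [],
    "detect": ["decode"],
    "track": ["detect"],
    "classify": ["detect"],
}
chart = build_first_flow_chart(dependencies)
```

In this sketch the `detect` node feeds two downstream nodes, which is the kind of situation the later claims address with typed data pipelines.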
2. The method according to claim 1, wherein optimizing the connection relationships of the first computing flow chart comprises:
optimizing erroneous connection relationships existing in the first computing flow chart.
3. The method according to claim 2, wherein optimizing the erroneous connection relationships existing in the first computing flow chart comprises:
detecting whether the input end and the output end of each computing node of the first computing flow chart are each connected to a data pipeline, and optimizing the first computing flow chart according to the detection result;
and/or detecting whether the input end and the output end of each data pipeline of the first computing flow chart are each connected to another data pipeline or to a computing node, and optimizing the first computing flow chart according to the detection result;
and/or detecting whether a data pipeline for which no rule has been set exists in the first computing flow chart, and optimizing the first computing flow chart according to the detection result.
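As a non-limiting sketch, the three detections of claim 3 could be expressed as follows. The `Pipe` record, its field names, and the convention that the external input and output of the flow chart are modeled as pipelines whose far end is named `"source"` or `"sink"` are assumptions of this example, not features stated in the application.

```python
from typing import NamedTuple, Optional


class Pipe(NamedTuple):
    """Hypothetical data pipeline record."""
    name: str
    src: Optional[str]   # upstream node or pipeline; None if the end is dangling
    dst: Optional[str]   # downstream node or pipeline; None if the end is dangling
    rule: Optional[str]  # e.g. "broadcast"; None if no rule has been set


def detect_misconnections(nodes, pipes):
    """Return (unconnected nodes, dangling pipelines, pipelines without a rule)."""
    fed = {p.dst for p in pipes}      # everything that some pipeline feeds
    drained = {p.src for p in pipes}  # everything that some pipeline reads from
    # Detection 1: a node needs a pipeline on both its input and its output end.
    unconnected = sorted(n for n in nodes if n not in fed or n not in drained)
    # Detection 2: a pipeline needs something attached at both of its ends.
    dangling = sorted(p.name for p in pipes if p.src is None or p.dst is None)
    # Detection 3: a pipeline must have a rule set.
    ruleless = sorted(p.name for p in pipes if p.rule is None)
    return unconnected, dangling, ruleless


nodes = {"decode", "detect", "orphan"}
pipes = [
    Pipe("in", "source", "decode", "plain"),
    Pipe("p1", "decode", "detect", None),   # no rule set
    Pipe("out", "detect", None, "plain"),   # output end not connected
]
unconnected, dangling, ruleless = detect_misconnections(nodes, pipes)
```

How the flow chart is then optimized according to each detection result (for example, removing an isolated node or attaching a missing pipeline) is left open here, as the claim does not fix it.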
4. The method according to any one of claims 1 to 3, wherein optimizing the computing nodes of the first computing flow chart comprises:
adjusting serial computing nodes that have no data dependency on one another in the first computing flow chart into asynchronous parallel computing nodes;
and/or setting parameters of each computing node and each data pipeline in the first computing flow chart;
and/or splitting those computing nodes in the first computing flow chart that satisfy a splitting condition.
5. The method according to claim 4, wherein setting parameters of each computing node and each data pipeline in the first computing flow chart comprises:
setting, for each data pipeline, a parameter specifying the maximum length of its buffer queue.
6. The method according to claim 5, wherein setting parameters of each computing node and each data pipeline in the first computing flow chart comprises:
setting, for each computing node, parameters specifying its maximum batch processing amount and a corresponding timeout mechanism.
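For illustration only, the parameters of claims 5 and 6 could be realized with a bounded queue and a batch collector that stops at either the maximum batch processing amount or a timeout. The sketch uses Python's standard `queue.Queue`; the function name `collect_batch` and the specific parameter values are assumptions of this example.

```python
import queue
import time


def collect_batch(pipe, max_batch, timeout_s):
    """Take up to max_batch items from pipe, waiting at most timeout_s in total."""
    batch = []
    deadline = time.monotonic() + timeout_s
    while len(batch) < max_batch:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break  # timeout reached: process a partial batch rather than wait
        try:
            batch.append(pipe.get(timeout=remaining))
        except queue.Empty:
            break  # nothing arrived before the deadline
    return batch


# A data pipeline whose buffer queue holds at most 4 items (claim 5).
pipe = queue.Queue(maxsize=4)
for frame_id in range(3):
    pipe.put(frame_id)

first = collect_batch(pipe, max_batch=2, timeout_s=0.05)   # fills the batch
second = collect_batch(pipe, max_batch=2, timeout_s=0.05)  # times out partially full
```

The bounded queue makes a fast producer block (or fail with `queue.Full` when using `put_nowait`) instead of accumulating unbounded data, while the timeout keeps a slow input from stalling a node that is waiting for a full batch.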
7. The method according to any one of claims 4 to 6, wherein splitting those computing nodes in the first computing flow chart that satisfy the splitting condition comprises:
screening out a target computing node from the computing nodes of the first computing flow chart, wherein the subtask corresponding to the target computing node can be split into a plurality of subtasks;
and splitting the target computing node into a plurality of new computing nodes.
8. The method according to claim 7, wherein splitting those computing nodes in the first computing flow chart that satisfy the splitting condition further comprises:
judging whether a target computing node exists among the plurality of new computing nodes;
if so, screening out the target computing node from the plurality of new computing nodes, and returning to the step of splitting the target computing node into a plurality of new computing nodes;
if not, ending the splitting process.
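The iterative splitting of claims 7 and 8 might be sketched as below. The `split_rules` mapping, which states for each splittable subtask the finer subtasks that replace it, is a hypothetical stand-in for the splitting condition; the sketch also assumes the rules are acyclic and omits the rewiring of data pipelines around the new nodes.

```python
def split_nodes(nodes, split_rules):
    """Repeatedly split target nodes until no splittable node remains."""
    result = []
    pending = list(nodes)
    while pending:
        node = pending.pop(0)
        if node in split_rules:
            # Target node: replace it with new nodes, which are themselves
            # re-examined, corresponding to the "return to the splitting step"
            # branch of claim 8.
            pending = list(split_rules[node]) + pending
        else:
            # Not splittable: the node is kept as it is.
            result.append(node)
    return result


# Hypothetical rules: "analyze" splits in two, and one part splits again.
split_rules = {
    "analyze": ["detect", "recognize"],
    "recognize": ["align", "embed"],
}
final_nodes = split_nodes(["decode", "analyze"], split_rules)
```

The loop terminates when no pending node matches a rule, which corresponds to the "if not, ending the splitting process" branch.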
9. The method according to any one of claims 1 to 8, wherein setting data pipelines among the computing nodes according to the dependency relationships among the subtasks comprises:
setting the data pipelines according to the data flow direction among the computing nodes;
and setting the type of each data pipeline in at least one of the following ways:
when a first computing node exists, setting the data pipeline connected to the output end of the first computing node as a broadcast pipeline, the first computing node being a node that, after executing its subtask, transmits the resulting data to a plurality of next-level computing nodes simultaneously;
when a second computing node exists, setting the data pipeline connected to the output end of the second computing node as a multi-branch pipeline, the second computing node being a node for which the next-level computing node that receives target data is selected according to a selection condition, the target data being the data generated after the second computing node executes its subtask;
when a third computing node exists, setting the data input pipeline of the third computing node as a merging pipeline, the third computing node being a node capable of receiving the data produced after a plurality of upper-level computing nodes execute their subtasks;
when a fourth computing node exists, setting the data output pipeline of the fourth computing node as an order-preserving pipeline, the fourth computing node being a node that keeps the output order of data consistent with the order in which the data was received.
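Of the four pipeline types in claim 9, the order-preserving pipeline is perhaps the least obvious to implement. One possible sketch, which is not taken from the application, tags each item with the sequence number it had on receipt and releases results only in that order, buffering any result that finishes early. The class name `OrderPreservingPipe` and the use of unique, consecutive sequence numbers starting at 0 are assumptions of this example.

```python
import heapq


class OrderPreservingPipe:
    """Releases results in receive order even if they complete out of order."""

    def __init__(self):
        self._heap = []      # min-heap of (sequence number, result)
        self._next_seq = 0   # sequence number that must be released next

    def put(self, seq, result):
        """Accept one finished result; return the results that are now ready."""
        heapq.heappush(self._heap, (seq, result))
        ready = []
        # Release the longest run of consecutive sequence numbers available.
        while self._heap and self._heap[0][0] == self._next_seq:
            ready.append(heapq.heappop(self._heap)[1])
            self._next_seq += 1
        return ready


pipe = OrderPreservingPipe()
out1 = pipe.put(1, "b")  # arrives early, so it is held back
out2 = pipe.put(0, "a")  # unblocks both buffered results
out3 = pipe.put(2, "c")
```

Such buffering matters mainly when the node runs several items concurrently, for example after being adjusted into asynchronous parallel nodes, since completion order then no longer matches arrival order.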
10. A method for optimizing computing efficiency, comprising:
acquiring a computing flow chart, wherein the computing flow chart is constructed by the method for constructing a computing flow chart according to any one of claims 1 to 9;
monitoring the current computing load of each computing node in the computing flow chart while the computing flow chart is applied to process an object to be processed of the target computing task;
determining a bottleneck computing node from among the computing nodes, and optimizing the computing efficiency of the bottleneck computing node, wherein optimizing the computing efficiency of the bottleneck computing node comprises at least one of the following steps:
when the current computing load of the bottleneck computing node has not reached a computing power bottleneck state, judging whether the bottleneck computing node satisfies a splitting condition and, if so, splitting the bottleneck computing node;
when the current computing load of the bottleneck computing node has not reached the computing power bottleneck state, adjusting the maximum batch processing amount of the bottleneck computing node;
and when the current computing load of the bottleneck computing node has reached the computing power bottleneck state, scheduling computing resources to the bottleneck computing node.
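A minimal sketch of the decision logic in claim 10 follows. Several points are assumptions of this example rather than statements of the application: the bottleneck node is taken to be the one with the longest input queue, the computing power bottleneck state is modeled as utilization at or above a threshold of 0.9, and splitting is tried before batch adjustment. The claim itself only requires at least one of the three steps.

```python
from dataclasses import dataclass


@dataclass
class NodeStats:
    """Hypothetical monitoring snapshot for one computing node."""
    name: str
    queue_length: int    # items waiting in the node's input pipeline
    utilization: float   # fraction of the node's compute capacity in use
    splittable: bool     # whether the node satisfies the splitting condition


def find_bottleneck(stats):
    """Assumed heuristic: the node with the longest input queue."""
    return max(stats, key=lambda s: s.queue_length)


def plan_optimization(node, saturation=0.9):
    """Choose one of the three optimization steps for the bottleneck node."""
    if node.utilization >= saturation:
        # Computing power bottleneck state reached: add computing resources.
        return "schedule_resources"
    if node.splittable:
        # Not saturated and splittable: split the node.
        return "split_node"
    # Not saturated and not splittable: tune the maximum batch processing amount.
    return "adjust_max_batch"


stats = [
    NodeStats("decode", queue_length=2, utilization=0.30, splittable=False),
    NodeStats("detect", queue_length=40, utilization=0.95, splittable=True),
    NodeStats("track", queue_length=5, utilization=0.50, splittable=False),
]
bottleneck = find_bottleneck(stats)
action = plan_optimization(bottleneck)
```

After the chosen step is applied, monitoring would be repeated on the updated flow chart, in the manner of the re-acquisition loop of claim 11.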
11. The method for optimizing computing efficiency according to claim 10, further comprising:
re-acquiring the computing flow chart according to the adjusted maximum batch processing amount and/or the split bottleneck computing node;
monitoring the current computing load of each computing node in the computing flow chart while the computing flow chart is applied to process the object to be processed of the target computing task; determining a bottleneck computing node from among the computing nodes, and optimizing the computing efficiency of the bottleneck computing node.
12. An apparatus for constructing a computing flow chart, comprising:
a first acquisition module, configured to acquire each subtask of a target computing task and allocate a computing node to each subtask;
a setting module, configured to set data pipelines among the computing nodes according to the dependency relationships among the subtasks to obtain a first computing flow chart;
and an optimization module, configured to optimize the connection relationships and/or the computing nodes of the first computing flow chart to obtain an optimized second computing flow chart.
13. An apparatus for optimizing computing efficiency, comprising:
a second acquisition module, configured to acquire a pre-constructed computing flow chart, wherein the computing flow chart is constructed by the method for constructing a computing flow chart according to any one of claims 1 to 9;
a monitoring module, configured to monitor the current computing load of each computing node in the computing flow chart while the computing flow chart is applied to process an object to be processed of the target computing task;
an optimization module, configured to determine a bottleneck computing node from among the computing nodes and optimize the computing efficiency of the bottleneck computing node, wherein optimizing the computing efficiency of the bottleneck computing node comprises at least one of the following steps:
when the current computing load of the bottleneck computing node has not reached a computing power bottleneck state, judging whether the bottleneck computing node satisfies a splitting condition and, if so, splitting the bottleneck computing node;
when the current computing load of the bottleneck computing node has not reached the computing power bottleneck state, adjusting the maximum batch processing amount of the bottleneck computing node;
and when the current computing load of the bottleneck computing node has reached a preset computing power bottleneck state, scheduling computing resources to the bottleneck computing node.
14. An electronic device comprising a processor and a memory, the memory storing computer readable instructions that, when executed by the processor, perform the method of any of claims 1-9.
CN202110433418.1A 2020-07-10 2021-04-21 Computing flow chart construction method, computing efficiency optimization method, computing efficiency construction device and electronic equipment Active CN113238837B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010666946 2020-07-10
CN2020106669467 2020-07-10

Publications (2)

Publication Number Publication Date
CN113238837A (en) 2021-08-10
CN113238837B (en) 2022-12-27

Family

ID=77128886

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110433418.1A Active CN113238837B (en) 2020-07-10 2021-04-21 Computing flow chart construction method, computing efficiency optimization method, computing efficiency construction device and electronic equipment

Country Status (1)

Country Link
CN (1) CN113238837B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5546377A (en) * 1995-10-31 1996-08-13 Digital Equipment Corporation Efficient distributed method for computing max-min fair rates of a limited resource in ATM networks
JP2006350657A (en) * 2005-06-15 2006-12-28 Kansai Electric Power Co Inc:The Simulator and simulation method of distributed processing system
WO2018121738A1 (en) * 2016-12-30 2018-07-05 北京奇虎科技有限公司 Method and apparatus for processing streaming data task
CN108508853A (en) * 2018-03-13 2018-09-07 济南大学 Based on the method for improving extension moving bottleneck algorithm solution product integrated dispatch problem
CN109815011A (en) * 2018-12-29 2019-05-28 东软集团股份有限公司 A kind of method and apparatus of data processing

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113379400A (en) * 2021-08-13 2021-09-10 南京新一代人工智能研究院有限公司 Business process carding system and method for grading and pressurizing flow
CN113806044A (en) * 2021-08-31 2021-12-17 天津大学 Heterogeneous platform task bottleneck elimination method for computer vision application
CN113806044B (en) * 2021-08-31 2023-11-07 天津大学 Heterogeneous platform task bottleneck eliminating method for computer vision application

Also Published As

Publication number Publication date
CN113238837B (en) 2022-12-27

Similar Documents

Publication Publication Date Title
US20200137151A1 (en) Load balancing engine, client, distributed computing system, and load balancing method
JP6447120B2 (en) Job scheduling method, data analyzer, data analysis apparatus, computer system, and computer-readable medium
CN111400008A (en) Computing resource scheduling method and device and electronic equipment
CN113238837B (en) Computing flow chart construction method, computing efficiency optimization method, computing efficiency construction device and electronic equipment
US8205113B2 (en) Fault tolerant batch processing
WO2021000693A1 (en) Service fusing method and apparatus and message middleware
US9979631B2 (en) Dynamic rerouting of service requests between service endpoints for web services in a composite service
CN111625331B (en) Task scheduling method, device, platform, server and storage medium
US10367719B2 (en) Optimized consumption of third-party web services in a composite service
CN104915407A (en) Resource scheduling method under Hadoop-based multi-job environment
CN112148455A (en) Task processing method, device and medium
CN109189572B (en) Resource estimation method and system, electronic equipment and storage medium
CN113228574A (en) Computing resource scheduling method, scheduler, internet of things system and computer readable medium
CN115269108A (en) Data processing method, device and equipment
GB2602213A (en) Automated operational data management dictated by quality-of-service criteria
CN114896121A (en) Monitoring method and device of distributed processing system
CN108415765B (en) Task scheduling method and device and intelligent terminal
Tsenos et al. Amesos: a scalable and elastic framework for latency sensitive streaming pipelines
US20200210307A1 (en) Method for automatically analyzing bottleneck in real time and an apparatus for performing the method
CN115185683A (en) Cloud platform stream processing resource allocation method based on dynamic optimization model
US20170293654A1 (en) Deferred joining of a stream of tuples
US20240249271A1 (en) Processing Schedule of an Electronic Transaction
US12135996B2 (en) Computing resource scheduling method, scheduler, internet of things system, and computer readable medium
US12119108B2 (en) Medical ETL task dispatching method, system and apparatus based on multiple centers
CN115051980B (en) HTCondor super-calculation grid file transmission method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant