CN117608811A - Task processing method, computing device and computer readable storage medium - Google Patents

Task processing method, computing device and computer readable storage medium

Info

Publication number
CN117608811A
Authority
CN
China
Prior art keywords
strategy
operator
target
preset
task
Prior art date
Legal status
Pending
Application number
CN202311114706.6A
Other languages
Chinese (zh)
Inventor
程盛淦
刁岚松
林伟
Current Assignee
Hangzhou Alibaba Cloud Feitian Information Technology Co ltd
Original Assignee
Hangzhou Alibaba Cloud Feitian Information Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Hangzhou Alibaba Cloud Feitian Information Technology Co ltd filed Critical Hangzhou Alibaba Cloud Feitian Information Technology Co ltd
Priority to CN202311114706.6A priority Critical patent/CN117608811A/en
Publication of CN117608811A publication Critical patent/CN117608811A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 - Arrangements for program control, e.g. control units
    • G06F9/06 - Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 - Multiprogramming arrangements
    • G06F9/50 - Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 - Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 - Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 - Arrangements for program control, e.g. control units
    • G06F9/06 - Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 - Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 - Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3885 - Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 - Arrangements for program control, e.g. control units
    • G06F9/06 - Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 - Multiprogramming arrangements
    • G06F9/50 - Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061 - Partitioning or combining of resources
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G06N3/098 - Distributed learning, e.g. federated learning
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 - Computing arrangements using knowledge-based models
    • G06N5/04 - Inference or reasoning models
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

Embodiments of the present disclosure provide a task processing method, a computing device, and a computer-readable storage medium. The task processing method includes: acquiring a task model corresponding to a task to be processed; dividing the input data of a first operator with a preset division strategy to obtain a plurality of input data sets; invoking the first operator to process each input data set to obtain the corresponding output data; combining the plurality of output data with a preset merging strategy to obtain a prediction execution result; and, when the prediction execution result is consistent with the label execution result carried by the first operator, determining the target distributed strategy of the first operator based on the preset division strategy and the preset merging strategy. Distributed-strategy exploration is carried out at the operator level, and the target distributed strategy of each operator that best fits the optimization target is determined automatically, so that data-parallel processing is optimized and the versatility, scalability, and efficiency of task processing are improved.

Description

Task processing method, computing device and computer readable storage medium
Technical Field
The embodiment of the specification relates to the technical field of deep learning, in particular to a task processing method.
Background
With the development of deep learning technology, the development, training, and application of large-scale neural network models have advanced greatly in many application fields.
At present, data-parallel (Data Parallel) processing of neural network models is realized through distributed strategies, which greatly improves the efficiency of training and applying such models.
However, neural network models are developed under different model frameworks, and models developed under different frameworks have different structures and different compute and memory-access characteristics, which correspond to different distributed strategies. To handle these differences, the target distributed strategy best suited to a given neural network model is currently determined through manual analysis and labeling so that data-parallel processing can be carried out in a distributed manner across a plurality of node devices; this approach suffers from insufficient generality, scalability, and efficiency. Therefore, there is a need for a task processing method that is highly versatile, highly scalable, and highly efficient.
Disclosure of Invention
In view of this, the present embodiment provides a task processing method. One or more embodiments of the present specification relate to a task processing device, a computing device, a computer-readable storage medium, and a computer program that solve the technical drawbacks existing in the prior art.
According to a first aspect of embodiments of the present specification, there is provided a task processing method, including:
acquiring a task model corresponding to a task to be processed, wherein the task model comprises a plurality of operators, and the operators carry label execution results;
dividing input data of a first operator by adopting a preset division strategy to obtain a plurality of input data sets, wherein the first operator is any one of the operators;
invoking a first operator to process any input data set to obtain output data corresponding to any input data set;
combining a plurality of output data by adopting a preset combining strategy to obtain a prediction execution result;
and under the condition that the predicted execution result is consistent with the label execution result carried by the first operator, determining a target distributed strategy of the first operator based on a preset division strategy and a preset combination strategy, wherein the target distributed strategy of the first operator is an optimization strategy for executing the task to be processed by distributing the first operator on a plurality of node devices.
According to a second aspect of embodiments of the present specification, there is provided another task processing method, including:
acquiring a task to be processed;
invoking a pre-trained task model to execute a task to be processed, and obtaining a task execution result, wherein the task model is obtained by performing distributed training by adopting a target distributed strategy of each operator, the target distributed strategy of each operator is an optimized training strategy for performing training by distributing each operator on a plurality of node devices, the target distributed strategy of each operator is determined based on the consistency of a prediction result and a label result, and the prediction result is obtained according to a preset division strategy and a preset combination strategy.
According to a third aspect of embodiments of the present disclosure, there is provided a further task processing method, applied to a cloud-side device, including:
receiving a target reasoning task sent by a front end;
invoking a task model trained in advance to execute a target reasoning task to obtain a reasoning result, wherein the task model is obtained by carrying out distributed training by adopting a target distributed strategy of each operator, the target distributed strategy of each operator is an optimized training strategy for carrying out training on a plurality of node devices by distributing each operator, the target distributed strategy of each operator is determined based on the consistency of a prediction result and a label result, and the prediction result is obtained according to a preset division strategy and a preset merging strategy;
and feeding back the reasoning result to the front end.
According to a fourth aspect of embodiments of the present specification, there is provided a task processing device including:
the first acquisition module is configured to acquire a task model corresponding to a task to be processed, wherein the task model comprises a plurality of operators, and the operators carry label execution results;
the first division module is configured to divide input data of a first operator by adopting a preset division strategy to obtain a plurality of input data sets, wherein the first operator is any one of the plurality of operators;
The first processing module is configured to call a first operator to process any input data set to obtain output data corresponding to any input data set;
the first merging module is configured to merge the plurality of output data by adopting a preset merging strategy to obtain a prediction execution result;
the first determining module is configured to determine a target distributed strategy of the first operator based on a preset dividing strategy and a preset combining strategy under the condition that a predicted execution result is consistent with a label execution result carried by the first operator, wherein the target distributed strategy of the first operator is an optimization strategy for executing tasks to be processed by distributing the first operator on a plurality of node devices.
According to a fifth aspect of embodiments of the present specification, there is provided another task processing device comprising:
the second acquisition module is configured to acquire a task to be processed;
the second execution module is configured to call a pre-trained task model to execute a task to be processed to obtain a task execution result, wherein the task model is obtained by performing distributed training by adopting a target distributed strategy of each operator, the target distributed strategy of each operator is an optimized training strategy for performing training by distributing each operator on a plurality of node devices, the target distributed strategy of each operator is determined based on the consistency of a prediction result and a label result, and the prediction result is obtained according to a preset division strategy and a preset combination strategy.
According to a sixth aspect of embodiments of the present specification, there is provided still another task processing device, applied to a cloud-side apparatus, including:
the third receiving module is configured to receive the target reasoning task sent by the front end;
the third execution module is configured to call a pre-trained task model to execute a target reasoning task to obtain a reasoning result, wherein the task model is obtained by carrying out distributed training by adopting a target distributed strategy of each operator, the target distributed strategy of each operator is an optimized training strategy for carrying out training by distributing each operator on a plurality of node devices, the target distributed strategy of each operator is determined based on the consistency of a prediction result and a label result, and the prediction result is obtained according to a preset division strategy and a preset combination strategy;
and the third feedback module is configured to feed back the reasoning result to the front end.
According to a seventh aspect of embodiments of the present specification, there is provided a computing device comprising:
a memory and a processor;
the memory is configured to store computer-executable instructions that, when executed by the processor, perform the steps of the method described above.
According to an eighth aspect of embodiments of the present description, there is provided a computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the steps of the above-described method.
According to a ninth aspect of the embodiments of the present specification, there is provided a computer program, wherein the computer program, when executed in a computer, causes the computer to perform the steps of the above method.
In one or more embodiments of the present disclosure, a task model corresponding to a task to be processed is obtained, where the task model includes a plurality of operators, and the operators carry label execution results; the input data of a first operator is divided with a preset division strategy to obtain a plurality of input data sets, wherein the first operator is any one of the operators; the first operator is invoked to process each input data set to obtain the output data corresponding to that input data set; the plurality of output data are combined with a preset merging strategy to obtain a prediction execution result; and, when the prediction execution result is consistent with the label execution result carried by the first operator, the target distributed strategy of the first operator is determined based on the preset division strategy and the preset merging strategy, wherein the target distributed strategy of the first operator is an optimization strategy for executing the task to be processed by distributing the first operator on a plurality of node devices. Distributed-strategy exploration for each operator is carried out at the operator level of the task model, and consistency-matching detection removes the dependence on model-specific distributed strategies; the target distributed strategy of each operator that best fits the optimization target can be determined automatically without manual analysis and labeling, the operators are deployed in a distributed manner on a plurality of node devices, data-parallel processing is optimized, and the versatility, scalability, and efficiency of task processing are improved.
Drawings
FIG. 1 is a schematic illustration of a computational graph;
FIG. 2 is a flow chart of a method of task processing provided in one embodiment of the present disclosure;
FIG. 3 is a flow chart of another task processing method provided by one embodiment of the present disclosure;
FIG. 4 is a flow chart of yet another task processing method provided by one embodiment of the present disclosure;
FIG. 5 is a flow chart of a task processing method according to an embodiment of the present disclosure;
FIG. 6 is a process flow diagram of a method of task processing for an inference task provided in one embodiment of the present disclosure;
FIG. 7 is a schematic diagram of a task processing device according to an embodiment of the present disclosure;
FIG. 8 is a schematic diagram of another task processing device according to one embodiment of the present disclosure;
FIG. 9 is a schematic diagram of a task processing device according to an embodiment of the present disclosure;
FIG. 10 is a block diagram of a computing device provided in one embodiment of the present description.
Detailed Description
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present description. This specification may be implemented in many forms other than those described herein, and those skilled in the art may make similar generalizations without departing from its spirit; therefore, this specification is not limited to the specific implementations disclosed below.
The terminology used in the one or more embodiments of the specification is for the purpose of describing particular embodiments only and is not intended to be limiting of the one or more embodiments of the specification. As used in this specification, one or more embodiments and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used in one or more embodiments of the present specification refers to and encompasses any or all possible combinations of one or more of the associated listed items.
It should be understood that, although the terms first, second, etc. may be used in one or more embodiments of this specification to describe various information, this information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, a first may also be referred to as a second, and similarly, a second may also be referred to as a first, without departing from the scope of one or more embodiments of the present description. The word "if" as used herein may be interpreted as "when", "upon", or "in response to determining", depending on the context.
Furthermore, it should be noted that, user information (including, but not limited to, user equipment information, user personal information, etc.) and data (including, but not limited to, data for analysis, stored data, presented data, etc.) according to one or more embodiments of the present disclosure are information and data authorized by a user or sufficiently authorized by each party, and the collection, use, and processing of relevant data is required to comply with relevant laws and regulations and standards of relevant countries and regions, and is provided with corresponding operation entries for the user to select authorization or denial.
In one or more embodiments of the present description, a large model refers to a deep learning model with a large number of parameters, typically hundreds of millions, billions, or even trillions of model parameters. A large model is also called a Foundation Model: it is pre-trained on a large-scale unlabeled corpus to produce a pre-trained model with more than a hundred million parameters that can adapt to a wide range of downstream tasks and has good generalization capability, such as a large language model (Large Language Model, LLM) or a multi-modal pre-trained model.
In practical applications, a pre-trained large model can be adapted to different tasks by fine-tuning on only a small number of samples. Large models can be widely applied in fields such as natural language processing (Natural Language Processing, NLP for short) and computer vision, in particular to computer vision tasks such as visual question answering (Visual Question Answering, VQA for short), image description (IC for short), and image generation, and to natural language processing tasks such as text-based emotion classification, text summarization, and machine translation. The main application scenarios of large models include digital assistants, intelligent robots, search, online education, office software, e-commerce, intelligent design, and the like.
First, terms related to one or more embodiments of the present specification will be explained.
Aggregation strategy (Gather): a group of data (tensors) is concatenated along a certain dimension, and the concatenated data (tensor) is the output of the strategy.
Addition reduction strategy (Reduce-sum): element-wise addition is performed on a group of data (tensors), and the resulting summed data (tensor) is the output of the strategy.
Maximization reduction strategy (Reduce-max): the element-wise maximum is taken over a group of data (tensors), and the resulting data (tensor) is the output of the strategy.
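For illustration only, the following minimal PyTorch sketch (not part of the original disclosure; the tensor shapes are assumptions) shows how the three merging strategies defined above combine a group of output tensors:

```python
import torch

# A group of per-shard output tensors (e.g., produced by one operator on
# different input data sets); shapes are assumed for illustration.
outputs = [torch.randn(2, 3) for _ in range(4)]

# Aggregation strategy (Gather): concatenate along a chosen dimension.
gathered = torch.cat(outputs, dim=0)                  # shape (8, 3)

# Addition reduction strategy (Reduce-sum): element-wise sum.
reduced_sum = torch.stack(outputs).sum(dim=0)         # shape (2, 3)

# Maximization reduction strategy (Reduce-max): element-wise maximum.
reduced_max = torch.stack(outputs).max(dim=0).values  # shape (2, 3)
```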
IR (Intermediate Representation): in compiler technology, a class of intermediate instruction descriptions that represent program functionality during the translation of high-level language code into binary instructions that can be executed directly by a particular device.
SPMD (Single Program, Multiple Data): the same program is replicated onto multiple processors, each of which processes a different piece of data.
ILP / MILP ((Mixed) Integer Linear Programming): a programming problem in which the objective function is linear, all constraints are linear, and some or all of the decision variables must take integer values.
FLOPS (Floating-point Operations Per Second): used to describe the computing power of a computing device or computing workload; the total number of multiplication and addition operations is typically counted and distinguished by precision (64-bit double precision, 32-bit single precision, or 16-bit half precision).
MAC (Memory Access Cost): memory access cost.
OPS (Operations Per Second): the number of operations per second.
SOTA (State-Of-The-Art): the current best result.
Computational graph: a data structure used to represent a computing process; it is a graph composed of a set of nodes and edges. FIG. 1 shows a schematic diagram of a computational graph. As shown in FIG. 1, nodes represent computational units (i.e., operators, the basic computational units of a model) and edges represent the logical processing relationships between the computational units. Computational graphs can be used to describe and manipulate complex image and text data; in deep learning, the internal mechanism of a neural network model can be understood through the principles and usage of computational graphs, and the computational graph can be understood as the execution code of each operator. Generally, during model development, a developer determines the set of operators required for the desired function, connects them according to a certain graph structure, and thereby constructs a computational graph, that is, the internal mechanism of the model, obtaining the corresponding execution code.
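As a hypothetical illustration (not taken from FIG. 1; the node and operator names are made up), a computational graph can be represented simply as operator nodes plus the data dependencies between them:

```python
# A tiny computational graph: nodes are operators, edges are data dependencies.
nodes = {
    "matmul_1": {"op": "matmul", "inputs": ["x", "w1"]},
    "relu_1":   {"op": "relu",   "inputs": ["matmul_1"]},
    "matmul_2": {"op": "matmul", "inputs": ["relu_1", "w2"]},
}
# The edges follow from the "inputs" lists: x/w1 -> matmul_1 -> relu_1 -> matmul_2.
```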
Beam Search: a search algorithm that, at each step, selects the most promising states from the current state set and expands them into more possible states, until the final result is found or a certain stop condition is reached. During the search, beam search keeps only a subset of the most promising states and expands them into further states in order to find the optimal solution.
Greedy Search: a search algorithm that, at each step, selects the currently best solution from the current state set and expands it into more possible states, until the final result is found or a certain stop condition is reached. During the search, greedy search always picks the current locally optimal choice and expands it in order to find the optimal solution.
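A minimal sketch of the difference between the two search procedures, assuming an abstract `expand` function that maps a state to a list of scored successor states (all names here are hypothetical, not part of the disclosure):

```python
def greedy_search(start, expand, steps):
    """At each step keep only the single locally best successor state."""
    state = start
    for _ in range(steps):
        successors = expand(state)                 # list of (score, state)
        if not successors:
            break
        state = max(successors, key=lambda s: s[0])[1]
    return state

def beam_search(start, expand, steps, beam_width=3):
    """At each step keep the beam_width most promising states."""
    beam = [(0.0, start)]
    for _ in range(steps):
        candidates = [c for _, s in beam for c in expand(s)]
        if not candidates:
            break
        beam = sorted(candidates, key=lambda c: c[0], reverse=True)[:beam_width]
    return max(beam, key=lambda c: c[0])[1]
```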
At present, the continuous growth of deep learning compute power has driven the development, training, and application of domain foundation models (Foundation Models) with ultra-large parameter scales as well as various project models; accordingly, determining an efficient distributed strategy has become an important challenge. Traditional data parallelism (Data Parallel) is a common form of distributed training for neural network models: the training sample data is divided into several parts, the same model parameters are replicated on multiple node devices for parallel training, the predicted outputs are then merged with a preset merging strategy, and finally update iterations of the neural network model are realized through a gradient update method. For a neural network model with an oversized parameter scale, the model parameters cannot be stored on a single node device by replication, so data parallelism hits a bottleneck; the neural network model then needs to be partitioned to realize distributed processing of a model with large-scale parameters. This includes: dividing the whole computation onto different devices according to successive stages, or dividing the model operators (data) across a plurality of node devices for joint computation and merging at specific positions, which can be summarized as model parallelism. For the first method, data parallelism can be combined to process multiple batch data sets (Micro Batch) on the same node device to realize pipeline parallelism (Pipeline Parallel), merging after a certain number of samples have been processed, thereby improving the resource utilization of the node devices.
Through such data-parallel means, the industry can support neural network models with ever larger parameter scales on ever larger computing platforms. For example, the PaLM language model reaches a parameter scale of 540 billion, uses 6144 TPU (Tensor Processing Unit) processors, and achieves a hardware FLOPS efficiency of 57.8%. The computational resources consumed by the current SOTA models double roughly every 3.5 months.
Performance tuning of large-scale neural network models remains very complex: it must account for the specific framework, specific structure, and compute and memory-access characteristics of the model, as well as the heterogeneity of hardware platforms and network connections. Current open-source distributed training schemes rely on manual analysis and labeling of the neural network model during development to determine the target distributed strategy of each operator. This places high demands on the level of manual analysis, lacks generality and scalability as the model iterates and changes, makes it difficult to guarantee a highly targeted distributed strategy, incurs high time costs for analysis, labeling, and tuning, and has a low degree of automation; as a result, the versatility, scalability, and efficiency of task processing are insufficient.
To address these problems, this specification provides a task processing method that realizes fully automatic distributed-strategy exploration without manual analysis, labeling, or tuning, and therefore has a high degree of automation. The method is not tied to the model framework of the task processing model and can be extended to various neural network model frameworks. The target distributed strategy is determined through consistency matching detection; compared with manual analysis, labeling, and tuning, the consideration and verification are more thorough and the accuracy is equal or higher. No restrictions are placed on the model structure or its compute and memory-access characteristics: the method applies generally regardless of model width, hierarchy depth, and the density or sparsity of computation and memory access, can automatically determine a target distributed strategy for each task model, does not need to follow a fixed distributed-strategy exploration procedure, and thus has higher versatility.
In the present specification, a task processing method is provided, and the present specification relates to a task processing device, a computing device, a computer-readable storage medium, and a computer program, which are described in detail in the following embodiments one by one.
Referring to fig. 2, fig. 2 shows a flowchart of a task processing method according to an embodiment of the present disclosure, including the following specific steps:
Step 202: and acquiring a task model corresponding to the task to be processed, wherein the task model comprises a plurality of operators, and the operators carry label execution results.
The embodiment of the specification is applied to a client, a server or cloud side device of a webpage, an application program or an applet with a task processing function. Aiming at the webpage, the application program or the applet, a plurality of distributed node devices are deployed to realize distributed task processing. The task processing method of the embodiment of the present disclosure may be used as an application programming interface (API, application Programming Interface) to implement automatic exploration of the target distributed policy of each operator.
The task to be processed is a task to be executed which is realized based on a neural network model and comprises a natural language processing task, an image processing task, a model training task and the like. For example, for natural language processing tasks, there are entity recognition (entity extraction) tasks, question-answering tasks, reasoning tasks, text classification tasks, text-to-graph tasks, and the like. For example, the image processing task includes an image classification task, a target detection task, an image segmentation task, an image enhancement task, an image denoising task, a style migration task, and the like. For example, for model training tasks, there are model pre-training tasks, model fine tuning tasks, model evaluation tasks, and the like.
The task model corresponding to the task to be processed is the neural network model corresponding to that task. The task model is a distributed model, and its model framework, model structure, and compute and memory-access characteristics are not limited. For example, if the task to be processed is an entity recognition task, the corresponding task model is an entity recognition model. As another example, if the task to be processed is a style migration task, the corresponding task model is a style migration model. If the task to be processed is a model training task, the corresponding task model is the target model to be trained. The task model includes a plurality of operators. Operators are the basic building blocks of a neural network model and the basic computational units of the model: they define the model structure, the model framework, and the compute and memory-access characteristics of the neural network model, and they define the computational graph of the task model. Operators define the logical processing of model functions, including addition, subtraction, multiplication, division, matrix operations, pooling, full connection, and the like. For example, the Pytorch model framework has the ATen operator set, and a neural network model developed under the Pytorch model framework is defined through ATen operators; the Jax model framework has the Jax primitives operator set, and a neural network model developed under the Jax model framework is defined through Jax primitives operators; the TensorFlow model framework has the HLO (High Level Optimizer) operator set, and a neural network model developed under the TensorFlow model framework is defined through the HLO operator set. The label execution result is the label result data used for exploring the target distributed strategy of each operator, and is labeled for the operator in advance. For example, if an operator performs the logical processing of a matrix multiplication operation, its label execution result is the matrix multiplication result labeled for that operator in advance.
The task to be processed is directly sent by the front end, and the task model is determined from a plurality of reference models based on the task to be processed, or can be a task model sent by the front end and developed with the task to be processed.
The front end sends a task to be processed, the task to be processed is an inference task, and based on the inference task, an inference big model under a Pytorch model framework is determined from a plurality of reference models, wherein the inference big model comprises 10000 ATen operators, and each ATen operator carries a corresponding label execution result.
And acquiring a task model corresponding to the task to be processed, wherein the task model comprises a plurality of operators, and the operators carry label execution results. By acquiring a task model corresponding to a task to be processed, which comprises a plurality of operators and a carried tag execution result, an operator base and a reference tag are provided for carrying out distributed strategy exploration of each operator on an operator level of the task model.
Step 204: dividing input data of a first operator by adopting a preset division strategy to obtain a plurality of input data sets, wherein the first operator is any one of the operators.
The preset dividing strategy is a preset strategy for parallel dividing of the input data of the operator, and the preset dividing strategy is various and comprises but is not limited to: a data parallel partitioning strategy and a model parallel partitioning strategy. The data parallel division policy is a policy of dividing the input data of the operator in parallel according to the data attribute, for example, the input data of the operator is an image, and the input data of the operator is divided in parallel according to characters, scenery and objects. The model parallel division policy is a policy of performing parallel division on input data of an operator according to model attributes (operator attributes), for example, a policy of performing parallel division according to a logical processing type of the operator.
The input data of the first operator is data directly input into the first operator, for example, the task model is a entity extraction model, the first operator is an operator with a text classification function, and the input data of the first operator is text to be classified.
The input data set is obtained by dividing the input data of the first operator in parallel according to a preset division strategy. For example, the first operator is an operator with a text classification function, input data of the first operator is text to be classified (including english text, chinese text and japanese text), and the preset classification policy is to perform parallel classification according to the language of the text to be classified, so as to obtain 3 input data sets by classification: english text groups, chinese text groups, and japanese text groups.
Dividing the input data of the first operator by adopting a preset division strategy to obtain a plurality of input data sets, wherein the specific mode is as follows: and carrying out parallel division on the input data of the first operator by adopting a preset division strategy to obtain a plurality of input data sets.
Illustratively, the input data of the ATen first operator is divided in parallel by adopting 20 preset division strategies, so as to obtain 20×10 input data groups.
Dividing input data of a first operator by adopting a preset division strategy to obtain a plurality of input data sets, wherein the first operator is any one of the operators. The preset division strategy exploration of each operator is carried out on the operator level of the task model, and a foundation is laid for carrying out the distributed strategy exploration of each operator.
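For illustration only (the tensor shape, group count, and dimension choice below are assumptions, not values from this disclosure), a simple family of preset division strategies splits the operator's input tensor into groups along one dimension:

```python
import torch

def split_along_dim(x, num_groups, dim):
    """One family of preset division strategies: split the input tensor of an
    operator into num_groups input data sets along dimension dim."""
    return list(torch.chunk(x, num_groups, dim=dim))

x = torch.randn(8, 16)                           # assumed input of the first operator
groups_by_row = split_along_dim(x, 4, dim=0)     # 4 input data sets of shape (2, 16)
groups_by_col = split_along_dim(x, 4, dim=1)     # 4 input data sets of shape (8, 4)
```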
Step 206: and calling a first operator to process any input data set to obtain output data corresponding to any input data set.
The output data is the result data of the logic processing of the first operator on the input data set, and is a reference output data, for example, the first operator is the operator of the logic processing of the matrix multiplication operation, and the output execution result is the output result of the matrix multiplication operation of the first operator on any input data set.
And calling a first operator to process any input data set to obtain output data corresponding to any input data set, wherein the specific mode is as follows: and calling the first operator to perform corresponding logic processing on any input data set through the execution code corresponding to the first operator, so as to obtain output data corresponding to any input data set.
Illustratively, the first operator of ATen is invoked to perform corresponding logic processing on the 20×10 input data groups through the execution code corresponding to the first operator of ATen, so as to obtain 20×10 output data.
And calling a first operator to process any input data set to obtain output data corresponding to any input data set. The reference output result of the first operator is obtained, and a data foundation is laid for obtaining a preset execution result subsequently.
Step 208: and combining the plurality of output data by adopting a preset combining strategy to obtain a prediction execution result.
The preset merging strategy is a preset strategy for merging and integrating the output data of the operators, and the preset merging strategy is various and includes but is not limited to: aggregation policy, addition reduction policy, and maximization reduction policy.
The prediction execution result is a reference execution result obtained by combining a plurality of output data, and the reference execution result is used as a reference result to explore a target distributed strategy of each operator. For example, the first operator is an operator of logic processing of matrix multiplication operation, and output results of a plurality of matrix multiplication operations are combined to obtain a prediction execution result, i.e., a matrix multiplication operation result.
And combining the plurality of output data by adopting a preset combining strategy to obtain a predicted execution result, wherein the specific mode is as follows: and carrying out parallel combination on a plurality of output data by adopting a preset combination strategy to obtain a prediction execution result.
For example, 20×10 output data are combined in parallel by using 5 preset combining strategies, so as to obtain 20×5 prediction execution results.
And combining the plurality of output data by adopting a preset combining strategy to obtain a prediction execution result. The preset merging strategy exploration of each operator is carried out on the operator level of the task model, and a foundation is laid for carrying out the distributed strategy exploration of each operator.
Step 210: and under the condition that the predicted execution result is consistent with the label execution result carried by the first operator, determining a target distributed strategy of the first operator based on a preset division strategy and a preset combination strategy, wherein the target distributed strategy of the first operator is an optimization strategy for executing the task to be processed by distributing the first operator on a plurality of node devices.
The distributed strategy of an operator is the distributed parallel processing strategy of that operator, namely the strategy for deploying it in a distributed manner across a plurality of node devices, including but not limited to: preset division strategies (e.g., data-parallel division strategies and model-parallel division strategies), gradient accumulation strategies, data reuse optimization strategies, communication overhead optimization strategies, and preset merging strategies (e.g., aggregation strategies, addition reduction strategies, and maximization reduction strategies). The target distributed strategy of the first operator is the optimization strategy for executing the task to be processed by distributing the first operator across a plurality of node devices, i.e., the distributed parallel processing strategy that matches the optimization target, including but not limited to: the distributed parallel processing strategy with optimal communication overhead, optimal processing efficiency, optimal software and hardware configuration, optimal availability, or optimal energy consumption. For example, based on 3 preset division strategies (division strategy 1, division strategy 2, and division strategy 3) and 2 preset merging strategies (merging strategy 1 and merging strategy 2), 6 distributed strategies can be determined; when the optimization target is communication overhead optimization, the distributed strategy determined by division strategy 2 and merging strategy 1 is taken as the target distributed strategy. The distributed strategies determined by the preset division strategies and the preset merging strategies are taken as the exploration range, and a target solution suited to the optimization target is obtained through exploration.
Under the condition that the predicted execution result is consistent with the label execution result carried by the first operator, determining a target distributed strategy of the first operator based on a preset division strategy and a preset combination strategy, wherein the specific mode is as follows: and under the condition that the predicted execution result is consistent with the tag execution result carried by the first operator, determining at least one preset division strategy and at least one preset merging strategy, and determining the target distributed strategy of the first operator based on the at least one preset division strategy and the at least one preset merging strategy.
After determining the target distributed policy of each operator, the target distributed policy of each operator may be directly fed back to the front end, so that the front end may use the target distributed policy of each operator in the task model to execute the task to be processed, or may use the target distributed policy of each operator in the task model to execute the task to be processed, and then feed back the task processing result to the front end, which is not limited herein.
Illustratively, the prediction execution results consistent with the label execution result carried by the ATen first operator are determined from the 20×5 prediction execution results, and the 3 corresponding sets of preset division strategies and preset merging strategies are determined. The target distributed strategy of the ATen first operator is then determined based on the preset division strategies, the preset merging strategies, the gradient accumulation strategy, the data reuse optimization strategy, and the communication overhead optimization strategy, so that it matches the optimization target of communication overhead. The target distributed strategies of the 10000 ATen operators are fed back to the front end, so that the front end executes the inference task based on the target distributed strategies of the 10000 ATen operators of the large inference model and obtains the target inference result.
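The flow of steps 204 to 210 can be summarized with the following sketch (the function and strategy names are illustrative assumptions, not the disclosed implementation): each pair of a preset division strategy and a preset merging strategy is applied to the first operator, and a pair is kept as a candidate for the target distributed strategy only when the prediction execution result matches the label execution result.

```python
import torch

def explore_operator_strategies(op, x, label_result,
                                division_strategies, merging_strategies):
    """Return the (division, merging) strategy pairs whose prediction
    execution result is consistent with the operator's label result."""
    valid_pairs = []
    for divide in division_strategies:
        input_groups = divide(x)                       # step 204: split the input
        outputs = [op(g) for g in input_groups]        # step 206: run operator per group
        for merge in merging_strategies:
            predicted = merge(outputs)                 # step 208: merge the outputs
            if predicted.shape == label_result.shape and \
               torch.allclose(predicted, label_result):    # step 210: consistency check
                valid_pairs.append((divide, merge))
    return valid_pairs
```

A target distributed strategy for the operator is then chosen from the returned pairs according to the optimization target, for example minimal communication overhead.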
In the embodiments of this specification, a task model corresponding to a task to be processed is obtained, wherein the task model includes a plurality of operators, and the operators carry label execution results; the input data of a first operator is divided with a preset division strategy to obtain a plurality of input data sets, wherein the first operator is any one of the operators; the first operator is invoked to process each input data set to obtain the output data corresponding to that input data set; the plurality of output data are combined with a preset merging strategy to obtain a prediction execution result; and, when the prediction execution result is consistent with the label execution result carried by the first operator, the target distributed strategy of the first operator is determined based on the preset division strategy and the preset merging strategy, wherein the target distributed strategy of the first operator is an optimization strategy for executing the task to be processed by distributing the first operator on a plurality of node devices. Distributed-strategy exploration for each operator is carried out at the operator level of the task model, and consistency-matching detection removes the dependence on model-specific distributed strategies; the target distributed strategy of each operator that best fits the optimization target can be determined automatically without manual analysis and labeling, the operators are deployed in a distributed manner on a plurality of node devices, data-parallel processing is optimized, and the versatility, scalability, and efficiency of task processing are improved.
Optionally, after step 210, the following specific steps are further included:
and executing the task to be processed by adopting a target distributed strategy of each operator in the task model.
The target distributed strategy of each operator in the task model is adopted to execute the task to be processed, and the specific mode is as follows: and determining the target distributed strategy of the computational graph based on the target distributed strategy of each operator in the task model, and executing the task to be processed by adopting the target distributed strategy of the computational graph.
It should be noted that, in the embodiments of the present disclosure, the target distributed strategy of each operator is determined dynamically and elastically. Therefore, whether the operator set of the plurality of operators is stable or not, the task to be processed is executed by automatically determining a target distributed strategy for any operator, so automatic adaptation is achieved: new operators can be accepted automatically, adding or removing operators does not require modifying operator execution code, and no manual intervention is needed, so task processing has high versatility, high scalability, and high efficiency.
The method comprises the steps of determining a target distributed strategy of a computational graph based on a target distributed strategy of 10000 ATen operators of an inference big model, executing an inference task by adopting the target distributed strategy of the computational graph to obtain a target inference result, and feeding back the target inference result to a front end.
And executing the task to be processed by adopting a target distributed strategy of each operator in the task model. And the target distributed strategy is directly adopted to execute the task to be processed, so that the task processing efficiency is improved.
In an alternative embodiment of the present disclosure, the task model is a neural network model under a target model framework;
correspondingly, step 208 includes the following specific steps:
and applying the target distributed strategy to a target model framework, and executing the task to be processed by adopting the target distributed strategy of each operator in the task model under the target model framework.
The target model framework is the development framework of the task model, including but not limited to: the Pytorch model framework, the Jax model framework, the TVM model framework, and the TensorFlow model framework.
The target distributed strategy is applied to the target model framework, and, under the target model framework, the target distributed strategy of each operator in the task model is adopted to execute the task to be processed. The specific manner is as follows: the target distributed strategy is applied to the target model framework, and the task to be processed is executed with the target distributed strategy of each operator in the task model under the distributed computing framework of the target model framework. The distributed computing framework is a predefined logical processing framework for distributed computing under the target framework, for example, torch.distributed under the Pytorch model framework.
The target distributed strategy of 10000 ATen operators is injected into the Pytorch model framework, and under the distributed calculation framework torch.distributed of the Pytorch model framework, the target distributed strategy of each ATen operator in the large reasoning model is adopted to execute the reasoning task, so as to obtain the target reasoning result.
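A minimal sketch (assuming an already initialized process group; the collective operations are from the public torch.distributed API, but their mapping to the preset merging strategies here is an illustrative assumption, not the disclosed implementation) of how the merging strategies can be realized across node devices under the Pytorch framework:

```python
import torch
import torch.distributed as dist

def merge_across_devices(local_output, strategy):
    """Each rank holds the output for its shard of the input data; combine
    the shards according to the preset merging strategy chosen for the operator."""
    if strategy == "gather":          # aggregation strategy
        gathered = [torch.empty_like(local_output)
                    for _ in range(dist.get_world_size())]
        dist.all_gather(gathered, local_output)
        return torch.cat(gathered, dim=0)
    if strategy == "reduce_sum":      # addition reduction strategy
        dist.all_reduce(local_output, op=dist.ReduceOp.SUM)
        return local_output
    if strategy == "reduce_max":      # maximization reduction strategy
        dist.all_reduce(local_output, op=dist.ReduceOp.MAX)
        return local_output
    raise ValueError(f"unknown merging strategy: {strategy}")
```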
And applying the target distributed strategy to a target model framework, and executing the task to be processed by adopting the target distributed strategy of each operator in the task model under the target model framework. The target distributed strategy of each operator is determined firstly and then executed under the target model framework, so that a plurality of sets of model frameworks are supported, the expansion and the transplantation are convenient, and the universality and the expansibility of task processing are improved.
In an alternative embodiment of the present disclosure, before step 204, the following specific steps are further included:
and converting the operators of the task model into a plurality of operators of the target format according to the conversion rules corresponding to the target model frame.
The operator in the target format is an expression format of an abstract data structure, and the target format is an abstract expression unrelated to the model framework, for example, a meta operation (MetaOp) format.
The conversion rule characterizes the logical mapping from the operator format of the target model framework to the target format. For example, an ATen operator under the Pytorch model framework is converted to an operator in the MetaOp format. The conversion rule may be preset in a conversion layer, for example, a lightweight conversion layer.
According to a conversion rule corresponding to the target model frame, converting a plurality of operators of the task model into a plurality of operators of a target format, wherein the specific mode is as follows: and converting the operators of the task model into operators of the target format according to the conversion rules corresponding to the target model frame by utilizing the conversion rules preset in the preset conversion layer.
It should be noted that, in the embodiment of the present disclosure, the objective of steps 202 to 210 is to determine a distributed policy of an operator adapted to an optimization objective, where the policy is independent of a framework, and corresponds to determining a target distributed policy of an operator in a conversion layer that is not constrained by the framework.
Optionally, the execution codes of the plurality of operators of the task model are converted into intermediate representations in the target format according to the conversion rules corresponding to the target model framework. The intermediate-representation operator is described from the perspective of how its input data is divided and how its results are merged, so a complete semantic representation is not needed; this single abstract-data-structure expression format covers as many division strategies and merging strategies as possible, and therefore has high extensibility.
Correspondingly, step 206 includes the following specific steps: and calling the first operator of the target format to perform corresponding logic processing on any input data set through the intermediate representation of the first operator under the target format, so as to obtain output data corresponding to any input data set.
Illustratively, using the conversion rules preset in the preset conversion layer (ATen operators to MetaOp-format operators, Jax primitives operators to MetaOp-format operators, HLO operators to MetaOp-format operators), the 10000 ATen operators of the large inference model under the Pytorch model framework are converted into 10000 operators in the MetaOp format.
And converting the operators of the task model into a plurality of operators of the target format according to the conversion rules corresponding to the target model frame. And a foundation is laid for determining the target distributed strategy of each operator under the model framework.
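For illustration only, a conversion layer can map framework-specific operators to a framework-independent description. The MetaOp name comes from this disclosure, but the fields, registry, and example rule below are assumptions rather than the disclosed data structure:

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class MetaOp:
    """Framework-independent operator description: only how the input may be
    divided and how the results may be merged, not the full operator semantics."""
    name: str
    division_strategies: List[str] = field(default_factory=list)
    merging_strategies: List[str] = field(default_factory=list)

# Conversion rules preset in the conversion layer, keyed by (framework, op name).
CONVERSION_RULES: Dict[tuple, Callable[..., MetaOp]] = {}

def register_rule(framework: str, op_name: str):
    def decorator(fn):
        CONVERSION_RULES[(framework, op_name)] = fn
        return fn
    return decorator

@register_rule("pytorch", "aten::matmul")
def convert_aten_matmul(node) -> MetaOp:
    # Hypothetical rule: a matmul may split its batch dimension and gather the
    # results, or split the reduction dimension and reduce-sum the partial products.
    return MetaOp("matmul",
                  division_strategies=["split_batch", "split_reduction"],
                  merging_strategies=["gather", "reduce_sum"])
```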
In an alternative embodiment of the present disclosure, step 204 includes the following specific steps:
dividing the input data of the first operator in parallel by adopting a plurality of preset dividing strategies to obtain a plurality of input data sets corresponding to the preset dividing strategies respectively;
correspondingly, step 208 includes the following specific steps:
adopting a plurality of preset merging strategies to merge a plurality of output data in parallel to obtain prediction execution results corresponding to the preset merging strategies respectively;
correspondingly, step 210 includes the following specific steps:
determining at least one target prediction execution result consistent with the label execution result carried by the first operator in the plurality of prediction execution results;
Determining at least one preset dividing strategy and at least one preset combining strategy corresponding to at least one target prediction execution result;
determining a target partitioning strategy and a target merging strategy based on at least one preset partitioning strategy and at least one preset merging strategy;
and determining a target distributed strategy of the first operator based on the target division strategy and the target merging strategy.
Generally, there are a plurality of preset division strategies. The plurality of sets of distributed strategies determined based on the preset division strategies and the preset combination strategies are abstracted as a whole into an optimization model, and the optimization model is then solved to determine at least one preset division strategy and at least one preset combination strategy, so that the target division strategy and the target combination strategy adapted to the optimization target are obtained, and the target distributed strategy of the operator is then determined.
The target prediction execution result is a reference execution result consistent with the tag execution result, and the at least one preset division policy and the at least one preset merging policy corresponding to the target prediction execution result are the preset division policy and the preset merging policy which are correct in logic processing, namely the reasonable preset division policy and the reasonable preset merging policy.
The target division policy is a policy, correct in logic processing and adapted to the optimization target, for dividing the input data of the operator in parallel. The target merging strategy is a strategy, correct in logic processing and adapted to the optimization target, for merging and integrating the output data of the operator. The target division strategy and the target merging strategy determine the target distributed strategy of the first operator that is adapted to the optimization target. The exploration range is determined by combining at least one preset division strategy and at least one preset combination strategy, the exploration range is abstracted into a linear programming model, and the target distributed strategy adapted to the optimization target of the linear programming is then determined by solving the linear programming model.
Thus, the target distributed policy is a distributed policy that is logically correct in processing and matches the optimization target.
Based on at least one preset division strategy and at least one preset combination strategy, determining a target division strategy and a target combination strategy in the following specific modes: and determining a target division strategy and a target combination strategy corresponding to the optimization target based on the at least one preset division strategy and the at least one preset combination strategy.
Illustratively, 20 preset division strategies are adopted to divide the input data of the ATen first operator in parallel, so as to obtain the input data groups corresponding to the 20 preset division strategies respectively (for example, 10 groups per division strategy); 5 preset merging strategies are adopted to merge in parallel the output data corresponding to the 20 preset division strategies, so as to obtain the prediction execution results corresponding to the combinations of division strategies and merging strategies (20×5=100 prediction execution results); 3 target prediction execution results consistent with the label execution result carried by the ATen first operator are determined among the 100 prediction execution results; 3 preset division strategies and 3 preset merging strategies corresponding to the 3 target prediction execution results are determined; a target division strategy and a target merging strategy corresponding to the optimization target (communication overhead optimization) are determined based on the 3 preset division strategies and the 3 preset merging strategies; and the target distributed strategy of the ATen first operator is determined based on the target division strategy and the target merging strategy.
Dividing the input data of the first operator in parallel by adopting a plurality of preset dividing strategies to obtain a plurality of input data sets corresponding to the preset dividing strategies respectively; adopting a plurality of preset merging strategies to merge a plurality of output data in parallel to obtain prediction execution results corresponding to the preset merging strategies respectively; determining at least one target prediction execution result consistent with the label execution result carried by the first operator in the plurality of prediction execution results; determining at least one preset dividing strategy and at least one preset combining strategy corresponding to at least one target prediction execution result; determining a target partitioning strategy and a target merging strategy based on at least one preset partitioning strategy and at least one preset merging strategy; and determining a target distributed strategy of the first operator based on the target division strategy and the target merging strategy. By parallel division and parallel combination, a plurality of division strategies and a plurality of combination strategies are executed in parallel, comprehensive exploration of the division strategies and the combination strategies is performed, the target division strategies and the target combination strategies are determined according to at least one target prediction execution result consistent with the label execution result carried by the first operator, and exploration of rationality of the target distributed strategy of the operator level is realized.
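The exploration described above can be illustrated with a minimal Python sketch; the divide/merge lambdas, the matrix operator and the shape check are assumptions used for illustration, not the patent's concrete strategy set.

```python
# Minimal sketch of operator-level strategy exploration by consistency matching.
from itertools import product
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def explore_operator(op, input_data, label_result, division_strategies, merging_strategies):
    """Return the (division, merging) pairs whose merged prediction execution result
    is consistent with the label execution result carried by the operator."""
    def try_pair(pair):
        divide, merge = pair
        input_sets = divide(input_data)                  # preset division strategy
        outputs = [op(shard) for shard in input_sets]    # call the first operator on each input data set
        return pair, merge(outputs)                      # preset merging strategy -> prediction execution result

    candidates = []
    with ThreadPoolExecutor() as pool:                   # explore strategy pairs in parallel
        for pair, predicted in pool.map(try_pair, product(division_strategies, merging_strategies)):
            if predicted.shape == label_result.shape and np.allclose(predicted, label_result):
                candidates.append(pair)                  # consistent with the label execution result
    return candidates

# Toy usage: a matmul-like operator whose inputs may be split along rows.
W = np.random.rand(4, 4)
op = lambda x: x @ W                                     # the "first operator"
x = np.random.rand(8, 4)
label = op(x)                                            # label execution result
division_strategies = [lambda a: np.split(a, 2, axis=0),
                       lambda a: np.split(a, 4, axis=0)]
merging_strategies = [lambda outs: np.concatenate(outs, axis=0),
                      lambda outs: sum(outs)]
feasible_pairs = explore_operator(op, x, label, division_strategies, merging_strategies)
```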
In an alternative embodiment of the present disclosure, determining the target partitioning policy and the target merging policy based on at least one preset partitioning policy and at least one preset merging policy includes the following specific steps:
and performing linear programming based on at least one preset dividing strategy and at least one preset combining strategy, and determining a target dividing strategy and a target combining strategy.
The linear programming is an SPMD (Single Program Multiple Data, single program multiple data) distributed strategy exploration algorithm, which in the implementation of this specification may be integer linear programming (ILP) or mixed integer linear programming (MILP).
Based on at least one preset dividing strategy and at least one preset combining strategy, linear programming is carried out, and a target dividing strategy and a target combining strategy are determined in the following specific modes: and performing linear programming based on at least one preset dividing strategy and at least one preset combining strategy, and determining a target dividing strategy and a target combining strategy corresponding to an optimization target of the linear programming.
The linear programming is performed based on 3 preset division policies and 3 preset combination policies, a target division policy and a target combination policy corresponding to an optimization target (communication overhead optimization) of the linear programming are determined, and a target distributed policy of the ATen first operator is determined based on the target division policy and the target combination policy.
And performing linear programming based on at least one preset dividing strategy and at least one preset combining strategy, and determining a target dividing strategy and a target combining strategy. By means of linear programming, exploration of rationality and optimality of a target distributed strategy of an operator level is achieved.
In an alternative embodiment of the present disclosure, based on at least one preset division policy and at least one preset combination policy, linear programming is performed to determine a target division policy and a target combination policy, including the following specific steps:
and based on at least one preset dividing strategy and at least one preset combining strategy, performing linear programming by taking communication overhead or processing efficiency as an optimization target, and determining a target dividing strategy and a target combining strategy.
The communication overhead is overhead data generated when data communication is performed between the plurality of node devices deploying the first operator, including but not limited to: network delay, network packet loss rate, and network bandwidth.
The processing efficiency is the efficiency of the logical processing of the first operator, for example FLOPS, MAC, OPS and throughput.
Illustratively, mixed integer linear programming (MILP) is performed with communication overhead as the optimization target based on the 3 preset partitioning strategies and the 3 preset merging strategies, and the target partitioning strategy and the target merging strategy are determined.
And based on at least one preset dividing strategy and at least one preset combining strategy, performing linear programming by taking communication overhead or processing efficiency as an optimization target, and determining a target dividing strategy and a target combining strategy. By means of linear programming, a more accurate exploration of rationality and optimality of the target distributed strategy of the operator hierarchy adapted to communication overhead or processing efficiency is achieved.
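As a hedged illustration of selecting the target strategy by integer linear programming with communication overhead as the optimization target, the sketch below assumes the open-source PuLP solver and hypothetical overhead numbers; neither is specified by the patent.

```python
# Sketch of choosing a target strategy by integer linear programming (ILP) with
# communication overhead as the objective. PuLP and the cost numbers are assumptions.
from pulp import LpProblem, LpMinimize, LpVariable, lpSum, LpBinary

# Feasible (division, merging) candidates that passed the consistency check,
# each with an estimated communication overhead (hypothetical numbers).
candidates = {("split_batch", "concat"): 1.0,
              ("split_rows", "concat"): 2.5,
              ("replicate", "identity"): 4.0}

prob = LpProblem("target_strategy_selection", LpMinimize)
choose = {c: LpVariable(f"x_{i}", cat=LpBinary) for i, c in enumerate(candidates)}

prob += lpSum(cost * choose[c] for c, cost in candidates.items())   # minimize communication overhead
prob += lpSum(choose.values()) == 1                                 # exactly one strategy for this operator

prob.solve()
target_strategy = next(c for c in candidates if choose[c].value() == 1)
```

In a full formulation, one such selection variable group would exist per operator, with additional terms coupling adjacent operators' strategies through their inter-device communication cost.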
In an alternative embodiment of the present disclosure, determining the target partitioning policy and the target merging policy based on at least one preset partitioning policy and at least one preset merging policy includes the following specific steps:
and performing cluster exploration based on at least one preset division strategy and at least one preset combination strategy, and determining a target division strategy and a target combination strategy.
The cluster exploration is a beam search, an SPMD (Single Program Multiple Data, single-program multiple data) distributed policy search algorithm, and in this embodiment may also be a greedy search.
The cluster exploration is performed based on 3 preset partition strategies and 3 preset merging strategies, corresponding target partition strategies and target merging strategies are determined, and the target distributed strategy of the ATen first operator is determined based on the target partition strategies and the target merging strategies.
And performing cluster exploration based on at least one preset division strategy and at least one preset combination strategy, and determining a target division strategy and a target combination strategy. By means of cluster exploration, exploration of rationality and optimality of a target distributed strategy of an operator level is achieved.
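A minimal sketch of the cluster exploration (beam search) alternative is given below; the cost function and strategy names are illustrative assumptions, and setting the beam width to 1 degenerates the search into the greedy search mentioned above.

```python
# Sketch of beam search over per-operator strategy assignments; cost_fn is assumed.
def beam_search_strategies(operators, candidates_per_op, cost_fn, beam_width=3):
    """operators: ordered operator ids; candidates_per_op[op]: feasible strategies;
    cost_fn(op, strategy, previous_assignment): incremental cost, e.g. communication."""
    beams = [({}, 0.0)]                                   # (partial assignment, accumulated cost)
    for op in operators:
        expanded = []
        for assignment, acc in beams:
            for strategy in candidates_per_op[op]:
                new_assignment = {**assignment, op: strategy}
                expanded.append((new_assignment, acc + cost_fn(op, strategy, assignment)))
        beams = sorted(expanded, key=lambda b: b[1])[:beam_width]   # prune to the beam width
    return beams[0][0]                                    # lowest-cost full assignment

# Toy usage with hypothetical operators, candidates and costs.
ops = ["matmul_1", "softmax_1"]
cands = {"matmul_1": ["split_rows", "split_cols"], "softmax_1": ["split_batch", "replicate"]}
cost = lambda op, s, prev: {"split_rows": 1.0, "split_cols": 2.0,
                            "split_batch": 0.5, "replicate": 3.0}[s]
best = beam_search_strategies(ops, cands, cost, beam_width=2)
```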
In an alternative embodiment of the present disclosure, the task to be processed is a model training task;
correspondingly, the step 204 includes the following specific steps:
dividing sample input data of a first operator by adopting a preset division strategy to obtain a plurality of sample input data sets;
correspondingly, step 206 includes the following specific steps:
invoking a first operator to process any sample input data set to obtain predicted output data corresponding to any sample input data set;
correspondingly, step 208 includes the following specific steps:
combining a plurality of prediction output data by adopting a preset combining strategy to obtain a prediction execution result;
correspondingly, step 210 includes the following specific steps:
under the condition that the predicted execution result is consistent with the label execution result carried by the first operator, determining a target distributed training strategy of the first operator based on a preset division strategy and a preset combination strategy;
Correspondingly, executing the task to be processed by adopting the target distributed strategy of each operator in the task model includes the following specific steps:
and training the task model by adopting a target distributed training strategy of each operator in the task model to obtain a task model after training.
The task to be processed is a model training task, and the task processing method in the embodiment of the present disclosure may be considered as a method for training a task model by using a target distributed training strategy of each operator.
Training the task model by adopting a target distributed training strategy of each operator in the task model to obtain a trained task model, wherein the specific mode is as follows: dividing sample input data of each operator in the task model by adopting a target distributed training strategy of each operator in the task model, carrying out distributed training on the divided sample input data, and merging the operators after training to obtain a task model after training.
The method comprises the following steps: dividing the sample input data of the first operator by adopting 20 preset division strategies to obtain 20×10 sample input data sets; calling the execution code corresponding to the ATen first operator so that the ATen first operator performs corresponding logic processing on the 20×10 sample input data sets to obtain 20×10 prediction output data; adopting 5 preset merging strategies to merge the 20×10 prediction output data in parallel to obtain 20×5 prediction execution results; determining, from the 20×5 prediction execution results, the prediction execution results consistent with the label execution result carried by the ATen first operator, and thereby determining 3 sets of preset division strategies and preset merging strategies; determining the target distributed training strategy of the ATen first operator based on the preset division strategies, the preset merging strategies, the gradient accumulation strategy, the data reuse optimization strategy and the communication overhead optimization strategy; adopting the target distributed training strategy of each ATen operator in the inference large model to be trained, dividing the sample input data of each ATen operator in the inference large model to be trained, and performing distributed training on the divided sample input data; and after the training of each operator is completed, merging the operators to obtain the trained inference large model.
Dividing sample input data of a first operator by adopting a preset division strategy to obtain a plurality of sample input data sets; invoking a first operator to process any sample input data set to obtain predicted output data corresponding to any sample input data set; combining a plurality of prediction output data by adopting a preset combining strategy to obtain a prediction execution result; under the condition that the predicted execution result is consistent with the label execution result carried by the first operator, determining a target distributed training strategy of the first operator based on a preset division strategy and a preset combination strategy; and training the task model by adopting a target distributed training strategy of each operator in the task model to obtain a task model after training. The distributed training strategy exploration of each operator is carried out on the operator level of the task model, the target distributed training strategy of each operator is automatically determined in a consistency matching detection mode, dependence on the distributed training strategy of a specific model is eliminated, manual analysis and labeling are not needed, and the universality, expandability and efficiency of model training are improved.
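For illustration only, the following single-process Python simulation shows one training step under a target distributed training strategy: the sample batch is divided according to the target division strategy, per-shard gradients are computed, and the gradients are merged by averaging (an all-reduce in a real cluster). The linear operator and the numbers are assumptions, not the patent's implementation.

```python
# Single-process simulation of one training step under a target distributed
# training strategy (data-parallel split + gradient averaging). Illustrative only.
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 2))                    # parameters of the "first operator" (a linear layer)
samples, labels = rng.normal(size=(16, 4)), rng.normal(size=(16, 2))

def shard_gradient(x, y, W):
    """Mean-squared-error gradient with respect to W on one sample shard."""
    pred = x @ W
    return 2.0 * x.T @ (pred - y) / len(x)

# Target division strategy: split the sample batch across (here simulated) node devices.
shards = np.array_split(np.arange(len(samples)), 4)
grads = [shard_gradient(samples[idx], labels[idx], W) for idx in shards]

# Target merging strategy: average the per-device gradients (all-reduce in a real cluster).
W -= 0.1 * np.mean(grads, axis=0)
```

With equally sized shards, the merged gradient equals the full-batch gradient, which is the kind of consistency the matching detection above verifies.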
In an alternative embodiment of the present disclosure, the task to be processed is an inference task;
Correspondingly, the step 204 includes the following specific steps:
dividing the reasoning input data of the first operator by adopting a preset division strategy to obtain a plurality of reasoning input data sets;
correspondingly, step 206 includes the following specific steps:
invoking a first operator to process any reasoning input data set to obtain reasoning output data corresponding to any reasoning input data set;
correspondingly, step 208 includes the following specific steps:
combining a plurality of reasoning output data by adopting a preset combining strategy to obtain a reasoning checking result;
correspondingly, step 210 includes the following specific steps:
under the condition that the reasoning verification result is consistent with the label execution result carried by the first operator, determining a target distributed reasoning strategy of the first operator based on a preset division strategy and a preset combination strategy;
correspondingly, executing the task to be processed by adopting the target distributed strategy of each operator in the task model includes the following specific steps:
and executing an reasoning task by adopting a target distributed reasoning strategy of each operator in the task model to obtain a reasoning result.
The task to be processed is an inference task, and the task processing method in the embodiment of the present disclosure may be considered as a method for performing inference by using a target distributed inference policy of each operator.
The target distributed reasoning strategy of each operator in the task model is adopted to execute the reasoning task, and a reasoning result is obtained by the following specific modes: and executing a distributed reasoning task by adopting a target distributed reasoning strategy of each operator in the task model to obtain a reasoning result.
The method includes the following steps: dividing the reasoning input data of the first operator by adopting 20 preset division strategies to obtain 20×10 reasoning input data sets; calling the execution code corresponding to the ATen first operator so that the ATen first operator performs corresponding logic processing on the 20×10 reasoning input data sets to obtain 20×10 reasoning output data; adopting 5 preset merging strategies to merge the 20×10 reasoning output data in parallel to obtain 20×5 reasoning check results; determining, from the 20×5 reasoning check results, the reasoning check results consistent with the label execution result carried by the ATen first operator, and thereby determining 3 sets of preset division strategies and preset merging strategies; determining the target distributed reasoning strategy of the ATen first operator based on the preset division strategies, the preset merging strategies, the gradient accumulation strategy, the data reuse optimization strategy and the communication overhead optimization strategy; and adopting the target distributed reasoning strategy of each ATen operator in the inference large model, dividing the reasoning input data of each ATen operator in the inference large model, executing the distributed reasoning task, and merging the outputs to obtain the reasoning result.
Dividing the reasoning input data of the first operator by adopting a preset division strategy to obtain a plurality of reasoning input data sets; invoking the first operator to process any reasoning input data set to obtain reasoning output data corresponding to that reasoning input data set; combining a plurality of reasoning output data by adopting a preset combining strategy to obtain a reasoning checking result; under the condition that the reasoning checking result is consistent with the label execution result carried by the first operator, determining a target distributed reasoning strategy of the first operator based on the preset division strategy and the preset combination strategy; and executing the reasoning task by adopting the target distributed reasoning strategy of each operator in the task model to obtain a reasoning result. The distributed reasoning strategy exploration of each operator is performed on the operator level of the task model in advance, the target distributed reasoning strategy of each operator is automatically determined in a consistency matching detection mode, dependence on the distributed reasoning strategy of a specific model is eliminated, manual analysis and labeling are not needed, and the universality, expandability and efficiency of reasoning are improved.
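A minimal sketch of executing a distributed reasoning task under a target distributed reasoning strategy is given below; the thread pool merely stands in for the plurality of node devices, and the operator is an illustrative assumption.

```python
# Sketch of executing a distributed reasoning task: split the reasoning input,
# run the shards (on node devices in practice; a thread pool here), merge outputs.
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def distributed_infer(operator, inference_input, num_devices=4):
    shards = np.array_split(inference_input, num_devices)       # target division strategy
    with ThreadPoolExecutor(max_workers=num_devices) as pool:   # stands in for the node devices
        outputs = list(pool.map(operator, shards))
    return np.concatenate(outputs, axis=0)                      # target merging strategy

W = np.random.rand(4, 3)
result = distributed_infer(lambda x: x @ W, np.random.rand(32, 4))
```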
Referring to fig. 3, fig. 3 shows a flowchart of another task processing method according to an embodiment of the present disclosure, including the following specific steps:
Step 302: and acquiring a task to be processed.
Step 304: invoking a pre-trained task model to execute a task to be processed, and obtaining a task execution result, wherein the task model is obtained by performing distributed training by adopting a target distributed strategy of each operator, the target distributed strategy of each operator is an optimized training strategy for performing training by distributing each operator on a plurality of node devices, the target distributed strategy of each operator is determined based on the consistency of a prediction result and a label result, and the prediction result is obtained according to a preset division strategy and a preset combination strategy.
The embodiment of the present disclosure and the embodiment of fig. 2 are in the same inventive concept, and specific content refers to the content of the embodiment of fig. 2.
Illustratively, the question text "how to understand instantiation" of a question-and-answer task is obtained, and a pre-trained question-and-answer model is invoked to execute the question text of the question-and-answer task, obtaining an answer text: "Instantiation refers to the process of creating one or more objects with particular properties and behaviors. For example, when you want to create a rectangle with a specific size and color, you need to instantiate a rectangle object and assign it the size and color. Instantiation can also be used to create dynamic objects; for example, when you want to create a jumping rabbit in a game, you need to instantiate a rabbit object and assign actions to it. In summary, instantiation is the process of creating one or more objects with specific properties and behaviors." The question-and-answer model is obtained by performing distributed training by adopting the target distributed strategy of each operator, wherein the target distributed strategy of each operator is a training strategy for realizing efficiency optimization of training by distributing each operator on a plurality of node devices, the target distributed strategy of each operator is determined based on the consistency of a prediction result and a label result, and the prediction result is obtained according to a preset division strategy and a preset combination strategy.
In the embodiment of the specification, a task to be processed is acquired; invoking a pre-trained task model to execute a task to be processed, and obtaining a task execution result, wherein the task model is obtained by performing distributed training by adopting a target distributed strategy of each operator, the target distributed strategy of each operator is an optimized training strategy for performing training by distributing each operator on a plurality of node devices, the target distributed strategy of each operator is determined based on the consistency of a prediction result and a label result, and the prediction result is obtained according to a preset division strategy and a preset combination strategy. The task model is obtained by performing distributed training by determining a target distributed strategy of each operator based on the consistency of a predicted result and a label result, and the target distributed strategy is determined by performing distributed strategy exploration of each operator suitable for optimizing a target on an operator level of the task model, so that dependence on the distributed strategy of a specific model is eliminated, manual analysis and labeling are not needed, the universality, expandability and efficiency of training of the task processing model are improved, and the accuracy and efficiency of task processing are further improved.
Referring to fig. 4, fig. 4 shows a flowchart of still another task processing method provided in an embodiment of the present disclosure, where the method is applied to cloud-side devices, and includes the following specific steps:
Step 402: and receiving a target reasoning task sent by the front end.
Step 404: invoking a task model trained in advance to execute a target reasoning task to obtain a reasoning result, wherein the task model is obtained by carrying out distributed training by adopting a target distributed strategy of each operator, the target distributed strategy of each operator is an optimized training strategy for carrying out training on a plurality of node devices by distributing each operator, the target distributed strategy of each operator is determined based on the consistency of a prediction result and a label result, and the prediction result is obtained according to a preset division strategy and a preset merging strategy.
Step 406: and feeding back the reasoning result to the front end.
The cloud side device is network cloud equipment where a service end of an application of a task processing function is located, and is virtual equipment, and a task model with the task processing function is deployed on the cloud side device. The front end is the front end of the client of the web page, application program or applet of the task processing function logged in by the user. The cloud side equipment is connected with the front end through a network transmission channel to perform data transmission. The computing power performance and the storage performance of the cloud side equipment are higher than those of the front end.
The embodiment of the present disclosure and the embodiment of fig. 2 are in the same inventive concept, and specific content refers to the content of the embodiment of fig. 2.
Illustratively, the text to be inferred of the target inference task sent by the front end is received: "Chickens and rabbits are in the same cage; there are 10 more chickens than rabbits, but the chickens have 60 fewer feet than the rabbits. How many chickens and how many rabbits are there?" The pre-trained task model is called to execute the target reasoning task, and the reasoning result text is obtained: "Suppose there are x rabbits; then there are (x + 10) chickens. According to the question, the chickens have 60 fewer feet than the rabbits, so the following equation can be obtained: 4x - 60 = 2(x + 10); solving the equation gives: 4x - 60 = 2x + 20; 2x = 80; x = 40; thus, there are 40 rabbits and 50 chickens." The task model is obtained by carrying out distributed training by adopting the target distributed strategy of each operator, the target distributed strategy of each operator is a training strategy for realizing efficiency optimization of training by distributing each operator on a plurality of node devices, the target distributed strategy of each operator is determined based on the consistency of a prediction result and a label result, and the prediction result is obtained according to a preset division strategy and a preset combination strategy; and the reasoning result is fed back to the front end.
In the embodiment of the specification, receiving a target reasoning task sent by a front end; invoking a task model trained in advance to execute a target reasoning task to obtain a reasoning result, wherein the task model is obtained by carrying out distributed training by adopting a target distributed strategy of each operator, the target distributed strategy of each operator is an optimized training strategy for carrying out training on a plurality of node devices by distributing each operator, the target distributed strategy of each operator is determined based on the consistency of a prediction result and a label result, and the prediction result is obtained according to a preset division strategy and a preset merging strategy; and feeding back the reasoning result to the front end. The task model is obtained by performing distributed training by determining a target distributed strategy of each operator based on the consistency of a predicted result and a label result, and the target distributed strategy is determined by performing distributed strategy exploration of each operator suitable for optimizing a target on an operator level of the task model, so that dependence on the distributed strategy of a specific model is eliminated, manual analysis and labeling are not needed, the universality, expandability and efficiency of training of the task processing model are improved, the accuracy and efficiency of task processing are further improved, and the efficiency of task processing is further improved through high calculation performance and high storage performance of cloud side equipment.
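The patent does not prescribe a transport protocol between the front end and the cloud-side device; purely as an assumed illustration, the sketch below uses an HTTP interface built with Flask and a stub task model to show the receive-infer-feedback loop of steps 402 to 406.

```python
# Illustrative sketch only: Flask, the /inference route and the JSON fields are
# assumptions; the patent only requires a network transmission channel between
# the front end and the cloud-side device.
from flask import Flask, request, jsonify

class TaskModel:
    """Stand-in for the pre-trained task model deployed on the cloud-side device."""
    def infer(self, task: str) -> str:
        return f"reasoning result for: {task}"

app = Flask(__name__)
task_model = TaskModel()

@app.route("/inference", methods=["POST"])
def run_inference():
    target_task = request.get_json()["task"]      # target reasoning task sent by the front end
    result = task_model.infer(target_task)        # executed with each operator's target distributed reasoning strategy
    return jsonify({"result": result})            # reasoning result fed back to the front end
```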
In an alternative embodiment of the present disclosure, following step 406, the following specific steps are further included:
the receiving front end sends result feedback data, wherein the result feedback data is generated based on an inference result;
based on the result feedback data, a target distributed training strategy of each operator in the task model is adopted to adjust the task model, and an adjusted task model is obtained.
The result feedback data is front-end feedback generated based on the inference results, including but not limited to: understanding bias (failure to understand target inference tasks, failure to find target inference tasks, mishap), result bias (factual error, duplication, logic confusion, format error), or presence of sensitive information (sensitive content, value view bias).
Based on the result feedback data, a target distributed training strategy of each operator in the task model is adopted to adjust the task model, and an adjusted task model is obtained, wherein the specific mode is as follows: and determining the result feedback data as tag result data, and adjusting the task model by adopting a target distributed training strategy of each operator in the task model based on the tag result data to obtain an adjusted task model.
Illustratively, the reasoning result text "Suppose there are x rabbits; then there are (x + 10) chickens. According to the question, the chickens have 60 fewer feet than the rabbits, so the following equation can be obtained: 4x - 60 = 2(x + 10); solving the equation gives: 4x - 60 = 2x + 20; 2x = 40; x = 20; thus, there are 20 rabbits and 70 chickens." is fed back to the front end, and the result feedback data generated by the front end based on this reasoning result is received: "there is a factual error; it should be 40 rabbits and 50 chickens." The result feedback data is determined as tag result data, and based on the tag result data, the task model is adjusted by adopting the target distributed training strategy of each operator in the task model, so as to obtain an adjusted task model.
The receiving front end sends result feedback data, wherein the result feedback data is generated based on an inference result; based on the result feedback data, a target distributed training strategy of each operator in the task model is adopted to adjust the task model, and an adjusted task model is obtained. Through interactive adjustment, the task model is readjusted based on a target distributed training strategy of each operator, so that the model performance of the task model is improved, and the task model has high efficiency.
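As an assumed illustration of converting result feedback data into tag result data before re-adjusting the task model, the sketch below pairs each corrected result with its original task; the field names are hypothetical and not part of the patent.

```python
# Illustrative sketch only: the feedback/record field names are hypothetical.
def feedback_to_label_samples(inference_records, feedback_items):
    """Keep only feedback that carries a corrected result and pair it with its task,
    so it can serve as tag result data for re-adjusting the task model."""
    samples = []
    for record, feedback in zip(inference_records, feedback_items):
        corrected = feedback.get("corrected_result")
        if corrected:                                   # e.g. a factual-error correction
            samples.append((record["task"], corrected))
    return samples

records = [{"task": "chicken-and-rabbit puzzle"}]
feedback = [{"type": "factual_error", "corrected_result": "40 rabbits and 50 chickens"}]
label_samples = feedback_to_label_samples(records, feedback)
# label_samples would then drive distributed fine-tuning with each operator's
# target distributed training strategy, as in the training embodiment above.
```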
Fig. 5 is a schematic flow chart of a task processing method according to an embodiment of the present disclosure, where the schematic flow chart is shown in fig. 5:
There are task models under a plurality of model frameworks; the task model comprises a plurality of operators and the execution code of the plurality of operators. Under the Pytorch model framework, there are the plurality of operators of the task model and the execution code of the plurality of operators of the task model under the Pytorch model framework. Under the Jax model framework, there are the plurality of operators of the task model and the execution code of the plurality of operators of the task model under the Jax model framework. There are also other model frameworks (e.g., TVM, TensorFlow, etc.). Through a conversion layer that is independent of the specific model framework, the plurality of operators (the plurality of operators of the task model under the Pytorch model framework and the plurality of operators of the task model under the Jax model framework) are converted into a plurality of operators in the Meta format together with the input data sets of the plurality of operators, and the execution codes of the plurality of operators (the execution codes of the plurality of operators of the task model under the Pytorch model framework and the execution codes of the plurality of operators of the task model under the Jax model framework) are converted into intermediate representations in the Meta format. Automatic distributed strategy exploration is carried out in the Meta format based on the operators in the Meta format and the input data sets of the operators. Through the intermediate representation in the Meta format, the plurality of operators in the Meta format are called to process the input data sets, and the target distributed strategy of each operator is determined. The task to be processed is executed by adopting the target distributed strategies.
The task processing method provided in the present specification will be further described with reference to fig. 6 by taking an application of the task processing method to task reasoning as an example. Fig. 6 is a process flow chart of a task processing method applied to reasoning tasks, where the method is applied to a distributed policy exploration plugin, and the plugin includes a model framework conversion component, a distributed policy exploration component, a model framework inverse conversion component and a preset partitioning policy component, and includes the following specific steps:
step 602: and acquiring a task model, wherein the task model comprises a plurality of operators and execution codes of the operators, and the operators carry label execution results.
Step 604: in the model framework conversion component, according to conversion rules corresponding to the Pytorch model framework or the Jax model framework, a plurality of operators of the task model are converted into a plurality of operators MetaOp in a Meta format, and execution codes of the plurality of operators are converted into an intermediate representation MetaIR in the Meta format.
Step 606: in the preset division strategy component, a plurality of preset division strategies are adopted to divide input data of the MetaOp first operator in parallel to obtain a plurality of input data sets corresponding to the preset division strategies respectively, wherein the MetaOp first operator is any one of the MetaOp operators.
Step 608: in the distributed strategy exploration component, metaIR is represented through the middle of a Meta format, a MetaOp first operator is called to process any input data set to obtain output data corresponding to any input data set, a plurality of preset merging strategies are adopted to merge the plurality of output data in parallel to obtain prediction execution results corresponding to the preset merging strategies respectively, at least one target prediction execution result consistent with label execution results carried by the MetaOp first operator in the plurality of prediction execution results is determined, at least one preset division strategy and at least one preset merging strategy corresponding to the at least one target prediction execution result are determined, integer linear programming is conducted based on the at least one preset division strategy and the at least one preset merging strategy by taking communication overhead or processing efficiency as an optimization target, the target division strategy and the target merging strategy are determined, and the target distributed strategy of the MetaOp first operator is determined based on the target division strategy and the target merging strategy.
Step 610: in the model framework reverse conversion component, the target distributed strategy for each MetaOp operator is injected into either the Pytorch model framework or the Jax model framework.
Step 612: and under the Pytorch model framework or Jax model framework, executing the task to be processed by adopting a target distributed strategy of each MetaOp operator in the task model.
In the embodiment of the specification, an operator is described only in terms of operator division and result merging, and a complete semantic description is not needed. The intermediate representation uses only a single node type and several operator division and result merging methods, and can cover all possible distributed strategies. The target distributed strategy of each operator is obtained through an automatic exploration algorithm. The distributed strategy exploration of each operator is carried out at the operator level of the task model, dependence on the distributed strategy of a specific model is eliminated through consistency matching detection, the target distributed strategy of each operator that is most adapted to the optimization target can be automatically determined without manual analysis and labeling, the operators are distributed and deployed on a plurality of node devices, parallel optimized processing of data is realized, and the universality, expandability and efficiency of task processing are improved.
Corresponding to the method embodiment, the present disclosure further provides an embodiment of a task processing device, and fig. 7 shows a schematic structural diagram of the task processing device provided in one embodiment of the present disclosure. As shown in fig. 7, the apparatus includes:
a first obtaining module 702, configured to obtain a task model corresponding to a task to be processed, where the task model includes a plurality of operators, and the operators carry a label execution result;
The first division module 704 is configured to divide input data of a first operator by adopting a preset division strategy to obtain a plurality of input data sets, wherein the first operator is any one of the plurality of operators;
the first processing module 706 is configured to call the first operator to process any input data set, so as to obtain output data corresponding to any input data set;
a first merging module 708, configured to merge the plurality of output data by adopting a preset merging strategy, so as to obtain a prediction execution result;
the first determining module 710 is configured to determine, based on a preset division policy and a preset merge policy, a target distributed policy of the first operator when the predicted execution result is consistent with the tag execution result carried by the first operator, where the target distributed policy of the first operator is an optimization policy for executing the task to be processed by distributing the first operator over a plurality of node devices.
Optionally, the apparatus further comprises:
the first execution module is configured to execute the task to be processed by adopting a target distributed strategy of each operator in the task model.
Optionally, the task model is a neural network model under the target model framework;
Correspondingly, the first execution module is further configured to:
and applying the target distributed strategy to a target model framework, and executing the task to be processed by adopting the target distributed strategy of each operator in the task model under the target model framework.
Optionally, the first partitioning module 704 is further configured to:
dividing the input data of the first operator in parallel by adopting a plurality of preset dividing strategies to obtain a plurality of input data sets corresponding to the preset dividing strategies respectively;
correspondingly, the first merge module 708 is further configured to:
adopting a plurality of preset merging strategies to merge a plurality of output data in parallel to obtain prediction execution results corresponding to the preset merging strategies respectively;
correspondingly, the first determination module 710 is further configured to:
determining at least one target prediction execution result consistent with the label execution result carried by the first operator in the plurality of prediction execution results; determining at least one preset dividing strategy and at least one preset combining strategy corresponding to at least one target prediction execution result; determining a target partitioning strategy and a target merging strategy based on at least one preset partitioning strategy and at least one preset merging strategy; and determining a target distributed strategy of the first operator based on the target division strategy and the target merging strategy.
Optionally, the first determination module 710 is further configured to:
and performing linear programming based on at least one preset dividing strategy and at least one preset combining strategy, and determining a target dividing strategy and a target combining strategy.
Optionally, the first determination module 710 is further configured to:
and based on at least one preset dividing strategy and at least one preset combining strategy, performing linear programming by taking communication overhead or processing efficiency as an optimization target, and determining a target dividing strategy and a target combining strategy.
Optionally, the first determination module 710 is further configured to:
and performing cluster exploration based on at least one preset division strategy and at least one preset combination strategy, and determining a target division strategy and a target combination strategy.
Optionally, the task to be processed is a model training task;
correspondingly, the first partitioning module 704 is further configured to:
dividing sample input data of a first operator by adopting a preset division strategy to obtain a plurality of sample input data sets;
correspondingly, the first processing module 706 is further configured to:
invoking a first operator to process any sample input data set to obtain predicted output data corresponding to any sample input data set;
Correspondingly, the first merge module 708 is further configured to:
combining a plurality of prediction output data by adopting a preset combining strategy to obtain a prediction execution result;
correspondingly, the first determination module 710 is further configured to:
under the condition that the predicted execution result is consistent with the label execution result carried by the first operator, determining a target distributed training strategy of the first operator based on a preset division strategy and a preset combination strategy;
correspondingly, the first execution module is further configured to:
and training the task model by adopting a target distributed training strategy of each operator in the task model to obtain a task model after training.
Optionally, the task to be processed is an inference task;
correspondingly, the first partitioning module 704 is further configured to:
dividing the reasoning input data of the first operator by adopting a preset division strategy to obtain a plurality of reasoning input data sets;
correspondingly, the first processing module 706 is further configured to:
invoking a first operator to process any reasoning input data set to obtain reasoning output data corresponding to any reasoning input data set;
correspondingly, the first merge module 708 is further configured to:
Combining a plurality of reasoning output data by adopting a preset combining strategy to obtain a reasoning checking result;
correspondingly, the first determination module 710 is further configured to:
under the condition that the reasoning verification result is consistent with the label execution result carried by the first operator, determining a target distributed reasoning strategy of the first operator based on a preset division strategy and a preset combination strategy;
correspondingly, the first execution module is further configured to:
and executing an reasoning task by adopting a target distributed reasoning strategy of each operator in the task model to obtain a reasoning result.
In the embodiment of the specification, a task model corresponding to a task to be processed is obtained, wherein the task model comprises a plurality of operators, and the operators carry label execution results; dividing input data of a first operator by adopting a preset division strategy to obtain a plurality of input data sets, wherein the first operator is any one of the operators; invoking a first operator to process any input data set to obtain output data corresponding to any input data set; combining a plurality of output data by adopting a preset combining strategy to obtain a prediction execution result; under the condition that the predicted execution result is consistent with the label execution result carried by the first operator, determining a target distributed strategy of the first operator based on a preset division strategy and a preset combination strategy; and executing the task to be processed by adopting a target distributed strategy of each operator in the task model. The distributed strategy exploration of each operator is carried out on the operator level of the task model, the target distributed strategy of each operator is automatically determined in a consistency matching detection mode, dependence on the distributed strategy of a specific model is eliminated, manual analysis and labeling are not needed, and the universality, expandability and efficiency of task processing are improved.
The above is a schematic solution of a task processing device of the present embodiment. It should be noted that, the technical solution of the task processing device and the technical solution of the task processing method belong to the same concept, and details of the technical solution of the task processing device, which are not described in detail, can be referred to the description of the technical solution of the task processing method.
Corresponding to the above method embodiments, the present disclosure further provides an embodiment of a task processing device, and fig. 8 shows a schematic structural diagram of another task processing device provided in one embodiment of the present disclosure. As shown in fig. 8, the apparatus includes:
a second acquisition module 802 configured to acquire a task to be processed;
the second execution module 804 is configured to invoke a task model trained in advance to execute a task to be processed, and obtain a task execution result, where the task model is obtained by performing distributed training by using a target distributed policy of each operator, where the target distributed policy of each operator is an optimized training policy for performing training by distributing each operator on a plurality of node devices, and the target distributed policy of each operator is determined based on consistency of a prediction result and a label result, and the prediction result is obtained according to a preset division policy and a preset combination policy.
In the embodiment of the specification, the distributed strategy exploration of each operator is performed on the operator level of the task model, dependence on the distributed strategy of a specific model is eliminated by a consistency matching detection mode, manual analysis and labeling are not needed, the target distributed strategy of each operator which is most adapted to the optimization target can be automatically determined, each operator is distributed and deployed on a plurality of node devices, data parallel optimization processing is realized, and the universality, expandability and efficiency of task processing are improved.
The above is another exemplary embodiment of the task processing device of the present embodiment. It should be noted that, the technical solution of the task processing device and the technical solution of the task processing method belong to the same concept, and details of the technical solution of the task processing device, which are not described in detail, can be referred to the description of the technical solution of the task processing method.
Corresponding to the above method embodiments, the present disclosure further provides an embodiment of a task processing device, and fig. 9 shows a schematic structural diagram of still another task processing device provided in one embodiment of the present disclosure. As shown in fig. 9, the apparatus is applied to cloud-side equipment, and includes:
A third receiving module 902, configured to receive a target inference task sent by the front end;
the third execution module 904 is configured to invoke a task model trained in advance to execute a target reasoning task, and obtain a reasoning result, wherein the task model is obtained by performing distributed training by adopting a target distributed strategy of each operator, the target distributed strategy of each operator is an optimized training strategy for performing training by distributing each operator on a plurality of node devices, the target distributed strategy of each operator is determined based on the consistency of a prediction result and a label result, and the prediction result is obtained according to a preset division strategy and a preset combination strategy;
a third feedback module 906 configured to feed back the reasoning results to the front-end.
Optionally, the apparatus further comprises:
the feedback adjustment module is configured to receive result feedback data sent by the front end, wherein the result feedback data is generated based on an inference result; based on the result feedback data, a target distributed training strategy of each operator in the task model is adopted to adjust the task model, and an adjusted task model is obtained.
In the embodiment of the specification, the task model is obtained by performing distributed training by determining the target distributed strategy of each operator based on the consistency of the prediction result and the label result, and the target distributed strategy is determined by performing distributed strategy exploration of each operator suitable for optimizing the target on the operator level of the task model, so that the dependence on the distributed strategy of a specific model is eliminated, manual analysis and labeling are not needed, the universality, expandability and efficiency of training of the task processing model are improved, the accuracy and efficiency of task processing are further improved, and the efficiency of task processing is further improved through the high calculation performance and the high storage performance of cloud side equipment.
The above is another exemplary embodiment of the task processing device of the present embodiment. It should be noted that, the technical solution of the task processing device and the technical solution of the task processing method belong to the same concept, and details of the technical solution of the task processing device, which are not described in detail, can be referred to the description of the technical solution of the task processing method.
FIG. 10 illustrates a block diagram of a computing device provided in one embodiment of the present description. The components of the computing device 1000 include, but are not limited to, a memory 1010 and a processor 1020. Processor 1020 is coupled to memory 1010 via bus 1030 and database 1050 is used to store data.
Computing device 1000 also includes access device 1040, which enables computing device 1000 to communicate via one or more networks 1060. Examples of such networks include the Public Switched Telephone Network (PSTN), a Local Area Network (LAN), a Wide Area Network (WAN), a Personal Area Network (PAN), or a combination of communication networks such as the Internet. The access device 1040 may include one or more of any type of network interface, wired or wireless, such as a Network Interface Card (NIC), an IEEE 802.11 Wireless Local Area Network (WLAN) wireless interface, a Worldwide Interoperability for Microwave Access (WiMAX) interface, an Ethernet interface, a Universal Serial Bus (USB) interface, a cellular network interface, a Bluetooth interface, or a Near Field Communication (NFC) interface.
In one embodiment of the present description, the above-described components of computing device 1000, as well as other components not shown in FIG. 10, may also be connected to each other, such as by a bus. It should be understood that the block diagram of the computing device illustrated in FIG. 10 is for exemplary purposes only and is not intended to limit the scope of the present description. Those skilled in the art may add or replace other components as desired.
Computing device 1000 may be any type of stationary or mobile computing device including a mobile computer or mobile computing device (e.g., tablet, personal digital assistant, laptop, notebook, netbook, etc.), mobile phone (e.g., smart phone), wearable computing device (e.g., smart watch, smart glasses, etc.), or other type of mobile device, or a stationary computing device such as a desktop computer or personal computer (PC, personal Computer). Computing device 1000 may also be a mobile or stationary server.
Wherein the processor 1020 is configured to execute computer-executable instructions that, when executed by the processor, perform the steps of the task processing method described above.
The foregoing is a schematic illustration of a computing device of this embodiment. It should be noted that, the technical solution of the computing device and the technical solution of the task processing method belong to the same concept, and details of the technical solution of the computing device, which are not described in detail, can be referred to the description of the technical solution of the task processing method.
An embodiment of the present disclosure also provides a computer-readable storage medium storing computer-executable instructions that, when executed by a processor, implement the steps of the task processing method described above.
The above is an exemplary version of a computer-readable storage medium of the present embodiment. It should be noted that, the technical solution of the storage medium and the technical solution of the task processing method belong to the same concept, and details of the technical solution of the storage medium which are not described in detail can be referred to the description of the technical solution of the task processing method.
An embodiment of the present disclosure further provides a computer program, where the computer program, when executed in a computer, causes the computer to perform the steps of the task processing method described above.
The above is an exemplary version of a computer program of the present embodiment. It should be noted that, the technical solution of the computer program and the technical solution of the task processing method belong to the same conception, and details of the technical solution of the computer program, which are not described in detail, can be referred to the description of the technical solution of the task processing method.
The foregoing describes specific embodiments of the present disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
The computer instructions include computer program code, which may be in source code form, object code form, an executable file or some intermediate form, etc. The computer readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunication signal, a software distribution medium, and so forth. It should be noted that the content of the computer readable medium may be increased or decreased appropriately according to the requirements of legislation and patent practice; for example, in some jurisdictions, according to legislation and patent practice, the computer readable medium does not include electrical carrier signals and telecommunication signals.
It should be noted that, for simplicity of description, the foregoing method embodiments are expressed as a series of action combinations, but those skilled in the art will appreciate that the embodiments are not limited by the order of the actions described, since some steps may be performed in other orders or simultaneously according to the embodiments of the present disclosure. Furthermore, those skilled in the art will appreciate that the embodiments described in the specification are preferred embodiments, and that the actions and modules involved are not necessarily required by every embodiment.
In the foregoing embodiments, each embodiment is described with its own emphasis; for parts of one embodiment that are not described in detail, reference may be made to the related descriptions of other embodiments.
The preferred embodiments of the present specification disclosed above are intended only to aid in explaining the specification. The alternative embodiments are not intended to be exhaustive or to limit the invention to the precise forms disclosed. Obviously, many modifications and variations are possible in light of the teachings of the embodiments. The embodiments were chosen and described in order to best explain their principles and practical application, thereby enabling others skilled in the art to understand and use the invention. This specification is limited only by the claims and their full scope and equivalents.

Claims (14)

1. A task processing method, comprising:
acquiring a task model corresponding to a task to be processed, wherein the task model comprises a plurality of operators, each of which carries a label execution result;
dividing input data of a first operator by adopting a preset division strategy to obtain a plurality of input data sets, wherein the first operator is any one of the plurality of operators;
invoking the first operator to process any input data set to obtain output data corresponding to any input data set;
combining a plurality of output data by adopting a preset combining strategy to obtain a prediction execution result;
and under the condition that the prediction execution result is consistent with the label execution result carried by the first operator, determining a target distributed strategy of the first operator based on the preset division strategy and the preset combining strategy, wherein the target distributed strategy of the first operator is an optimized strategy for executing the task to be processed by distributing the first operator over a plurality of node devices.
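By way of a non-limiting illustration only, the per-operator consistency check recited in claim 1 can be sketched as follows; the matrix-multiplication operator, the numpy usage, and the helper names verify_strategy, row_split, and concat_rows are assumptions made for this sketch and are not part of the claimed method.

```python
# Hypothetical sketch of claim 1's check: divide the operator's input with a preset
# division strategy, run the operator on each piece, merge with a preset combining
# strategy, and accept the pair only if the merged result matches the label result.
import numpy as np

def verify_strategy(operator, x, label_result, division_strategy, combining_strategy):
    """Return True if (division, combining) reproduces the operator's label execution result."""
    shards = division_strategy(x)                      # preset division strategy
    outputs = [operator(shard) for shard in shards]    # simulated per-node execution
    predicted = combining_strategy(outputs)            # preset combining strategy
    return np.allclose(predicted, label_result)

# Example: for x @ W, a row-wise split plus row-wise concatenation is a valid pair.
W = np.random.randn(8, 4)
op = lambda x: x @ W
x = np.random.randn(16, 8)
label = op(x)                                          # label execution result carried by the operator

row_split = lambda x: np.array_split(x, 4, axis=0)
concat_rows = lambda outs: np.concatenate(outs, axis=0)
assert verify_strategy(op, x, label, row_split, concat_rows)
```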
2. The method of claim 1, further comprising, after obtaining a target distributed strategy of each operator in the task model:
executing the task to be processed by adopting the target distributed strategy of each operator in the task model.
3. The method of claim 2, the task model being a neural network model under a target model framework;
the task to be processed is executed by adopting a target distributed strategy of each operator in the task model, which comprises the following steps:
and applying the target distributed strategy to the target model framework, and executing the task to be processed by adopting the target distributed strategy of each operator in the task model under the target model framework.
4. The method according to any one of claims 1 to 3, wherein the dividing the input data of the first operator by adopting a preset division strategy to obtain a plurality of input data sets comprises:
dividing the input data of the first operator in parallel by adopting a plurality of preset dividing strategies to obtain a plurality of input data sets corresponding to the preset dividing strategies respectively;
the step of combining the plurality of output data by adopting a preset combining strategy to obtain a predicted execution result comprises the following steps:
adopting a plurality of preset merging strategies to merge a plurality of output data in parallel to obtain prediction execution results corresponding to the preset merging strategies respectively;
and under the condition that the prediction execution result is consistent with the label execution result carried by the first operator, determining the target distributed strategy of the first operator based on the preset division strategy and the preset merging strategy comprises the following steps:
determining, among the plurality of prediction execution results, at least one target prediction execution result that is consistent with the label execution result carried by the first operator;
determining at least one preset dividing strategy and at least one preset combining strategy corresponding to the at least one target prediction execution result;
determining a target partitioning strategy and a target merging strategy based on the at least one preset dividing strategy and the at least one preset combining strategy;
and determining the target distributed strategy of the first operator based on the target partitioning strategy and the target merging strategy.
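As a hedged illustration of claim 4, the sketch below enumerates a small assumed set of candidate dividing and combining strategies, evaluates every pair, and keeps only those whose merged output reproduces the label execution result; the candidate dictionaries and the helper names candidate_strategies and consistent_pairs are hypothetical.

```python
# Illustrative enumeration of (dividing, combining) strategy pairs as in claim 4.
import numpy as np
from itertools import product

def candidate_strategies(num_shards=2):
    divisions = {
        "row": lambda x: np.array_split(x, num_shards, axis=0),
        "col": lambda x: np.array_split(x, num_shards, axis=1),
    }
    combinings = {
        "concat_rows": lambda outs: np.concatenate(outs, axis=0),
        "concat_cols": lambda outs: np.concatenate(outs, axis=1),
        "sum": lambda outs: sum(outs),
    }
    return divisions, combinings

def consistent_pairs(operator, x, label_result, atol=1e-6):
    """Return the strategy pairs whose prediction execution result matches the label result."""
    divisions, combinings = candidate_strategies()
    hits = []
    for (d_name, d), (c_name, c) in product(divisions.items(), combinings.items()):
        try:
            predicted = c([operator(shard) for shard in d(x)])
        except ValueError:                 # shape mismatch: this pair is simply invalid
            continue
        if predicted.shape == label_result.shape and np.allclose(predicted, label_result, atol=atol):
            hits.append((d_name, c_name))
    return hits

# Usage with a matmul operator; only the row-split + row-concat pair survives here.
W = np.random.randn(8, 4)
x = np.random.randn(16, 8)
print(consistent_pairs(lambda v: v @ W, x, x @ W))     # -> [('row', 'concat_rows')]
```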
5. The method of claim 4, wherein the determining a target partitioning strategy and a target merging strategy based on the at least one preset dividing strategy and the at least one preset combining strategy comprises:
performing linear programming based on the at least one preset dividing strategy and the at least one preset combining strategy to determine the target partitioning strategy and the target merging strategy.
6. The method of claim 5, wherein the determining the target partitioning strategy and the target merging strategy by linear programming based on the at least one preset dividing strategy and the at least one preset combining strategy comprises:
performing linear programming, with communication overhead or processing efficiency as an optimization target, based on the at least one preset dividing strategy and the at least one preset combining strategy, to determine the target partitioning strategy and the target merging strategy.
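For claims 5 and 6, one possible formulation, offered purely as an assumed sketch rather than the patent's own, relaxes the one-hot choice among surviving strategy pairs into a small linear program that minimizes an assumed communication-overhead cost; the pair list, the cost figures, and the use of scipy.optimize.linprog are illustrative assumptions.

```python
# Minimal LP-flavored selection of a target strategy pair (illustration only).
import numpy as np
from scipy.optimize import linprog

# One decision variable per surviving (dividing, combining) pair; the cost models
# the bytes that would move between node devices under that pair (assumed values).
pairs = [("row", "concat_rows"), ("col", "sum")]
comm_cost = np.array([1.0, 4.0])

# minimize cost . x   subject to   sum(x) = 1,  0 <= x_i <= 1   (LP relaxation of a one-hot choice)
res = linprog(comm_cost,
              A_eq=np.ones((1, len(pairs))), b_eq=[1.0],
              bounds=[(0.0, 1.0)] * len(pairs))
target_pair = pairs[int(np.argmax(res.x))]
print(target_pair)          # -> ('row', 'concat_rows'), the lower-overhead pair
```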
7. The method of claim 4, wherein the determining a target partitioning strategy and a target merging strategy based on the at least one preset dividing strategy and the at least one preset combining strategy comprises:
performing cluster exploration based on the at least one preset dividing strategy and the at least one preset combining strategy to determine the target partitioning strategy and the target merging strategy.
8. The method of claim 2, the task to be processed being a model training task;
the dividing the input data of the first operator by adopting a preset division strategy to obtain a plurality of input data sets comprises:
dividing sample input data of the first operator by adopting the preset division strategy to obtain a plurality of sample input data sets;
the step of calling the first operator to process any input data set to obtain output data corresponding to any input data set comprises the following steps:
invoking the first operator to process any sample input data set to obtain predicted output data corresponding to any sample input data set;
the step of combining the plurality of output data by adopting a preset combining strategy to obtain a predicted execution result comprises the following steps:
combining a plurality of prediction output data by adopting a preset combining strategy to obtain a prediction execution result;
and under the condition that the prediction execution result is consistent with the label execution result carried by the first operator, determining the target distributed strategy of the first operator based on the preset division strategy and the preset merging strategy comprises the following steps:
determining a target distributed training strategy of the first operator based on the preset dividing strategy and the preset combining strategy under the condition that the prediction execution result is consistent with the label execution result carried by the first operator;
the task to be processed is executed by adopting a target distributed strategy of each operator in the task model, which comprises the following steps:
and training the task model by adopting a target distributed training strategy of each operator in the task model to obtain a task model after training.
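Claim 8 applies the selected per-operator strategies while training the task model. The following sketch only shows how a forward pass would be executed shard by shard under those strategies; the toy two-operator model, the numpy usage, and the strategies table are illustrative assumptions, and the actual training loop is not shown.

```python
# Illustrative-only forward pass that applies each operator's selected
# (division, combining) strategy; "node devices" are simulated by a Python loop.
import numpy as np

W1, W2 = np.random.randn(8, 16), np.random.randn(16, 4)
operators = {"fc1": lambda x: x @ W1, "fc2": lambda x: x @ W2}

# target distributed strategy per operator: (division strategy, combining strategy)
strategies = {
    "fc1": (lambda x: np.array_split(x, 2, axis=0), lambda o: np.concatenate(o, axis=0)),
    "fc2": (lambda x: np.array_split(x, 2, axis=0), lambda o: np.concatenate(o, axis=0)),
}

def distributed_forward(x):
    for name in ("fc1", "fc2"):
        divide, combine = strategies[name]
        x = combine([operators[name](shard) for shard in divide(x)])  # per-"device" execution
    return x

batch = np.random.randn(32, 8)
assert np.allclose(distributed_forward(batch), batch @ W1 @ W2)       # matches the undistributed model
```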
9. The method of claim 2, the task to be processed being an inference task;
the dividing the input data of the first operator by adopting a preset division strategy to obtain a plurality of input data sets comprises:
dividing the reasoning input data of the first operator by adopting a preset division strategy to obtain a plurality of reasoning input data sets;
the step of calling the first operator to process any input data set to obtain output data corresponding to any input data set comprises the following steps:
invoking the first operator to process any reasoning input data set to obtain reasoning output data corresponding to the any reasoning input data set;
the step of combining the plurality of output data by adopting a preset combining strategy to obtain a predicted execution result comprises the following steps:
combining a plurality of reasoning output data by adopting a preset combining strategy to obtain a reasoning checking result;
and under the condition that the prediction execution result is consistent with the label execution result carried by the first operator, determining the target distributed strategy of the first operator based on the preset division strategy and the preset merging strategy comprises the following steps:
determining a target distributed reasoning strategy of the first operator based on the preset partitioning strategy and the preset merging strategy under the condition that the reasoning checking result is consistent with the label executing result carried by the first operator;
the task to be processed is executed by adopting a target distributed strategy of each operator in the task model, which comprises the following steps:
and executing the reasoning task by adopting a target distributed reasoning strategy of each operator in the task model to obtain a reasoning result.
10. A task processing method, comprising:
acquiring a task to be processed;
invoking a pre-trained task model to execute the task to be processed to obtain a task execution result, wherein the task model is obtained by performing distributed training by adopting a target distributed strategy of each operator, the target distributed strategy of each operator is an optimized training strategy for performing training by distributing each operator on a plurality of node devices, the target distributed strategy of each operator is determined based on consistency of a prediction result and a label result, and the prediction result is obtained according to a preset division strategy and a preset combination strategy.
11. The task processing method is applied to cloud side equipment and comprises the following steps:
receiving a target reasoning task sent by a front end;
invoking a pre-trained task model to execute the target reasoning task to obtain a reasoning result, wherein the task model is obtained by performing distributed training by adopting a target distributed strategy of each operator, the target distributed strategy of each operator is an optimized training strategy for performing training by distributing each operator on a plurality of node devices, the target distributed strategy of each operator is determined based on consistency of a prediction result and a label result, and the prediction result is obtained according to a preset division strategy and a preset combination strategy;
and feeding back the reasoning result to the front end.
12. The method of claim 11, further comprising, after said feeding back the inference results to the front-end:
receiving result feedback data sent by the front end, wherein the result feedback data is generated based on the reasoning result;
and based on the result feedback data, adjusting the task model by adopting a target distributed training strategy of each operator in the task model to obtain an adjusted task model.
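Claims 11 and 12 can be pictured, again only as an assumed sketch and not as the claimed implementation, as a cloud-side HTTP service that receives the target reasoning task from the front end, invokes the pre-trained task model, feeds the result back, and records result feedback data for a later adjustment of the model; Flask, the /infer and /feedback routes, and the run_model helper are hypothetical.

```python
# Hypothetical cloud-side service for claims 11-12 (illustration only).
from flask import Flask, request, jsonify

app = Flask(__name__)
feedback_log = []                            # stand-in store for result feedback data

def run_model(task):
    """Stand-in for invoking the pre-trained task model on the target reasoning task."""
    return {"answer": f"result for {task!r}"}

@app.route("/infer", methods=["POST"])
def infer():
    task = request.get_json()["task"]        # target reasoning task sent by the front end
    return jsonify(run_model(task))          # reasoning result fed back to the front end

@app.route("/feedback", methods=["POST"])
def feedback():
    feedback_log.append(request.get_json())  # result feedback data generated from the reasoning result
    return jsonify({"status": "recorded"})   # later used to adjust the task model

if __name__ == "__main__":
    app.run(port=8000)
```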
13. A computing device, comprising:
a memory and a processor;
the memory is configured to store computer-executable instructions, and the processor is configured to execute the computer-executable instructions which, when executed by the processor, implement the steps of the method of any one of claims 1 to 12.
14. A computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the steps of the method of any one of claims 1 to 12.
CN202311114706.6A 2023-08-30 2023-08-30 Task processing method, computing device and computer readable storage medium Pending CN117608811A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311114706.6A CN117608811A (en) 2023-08-30 2023-08-30 Task processing method, computing device and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311114706.6A CN117608811A (en) 2023-08-30 2023-08-30 Task processing method, computing device and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN117608811A true CN117608811A (en) 2024-02-27

Family

ID=89943000

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311114706.6A Pending CN117608811A (en) 2023-08-30 2023-08-30 Task processing method, computing device and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN117608811A (en)

Similar Documents

Publication Publication Date Title
US11842172B2 (en) Graphical user interface to an artificial intelligence engine utilized to generate one or more trained artificial intelligence models
EP4116885A1 (en) Processing method for neural network model, and related device
KR102039397B1 (en) Visual Question Answering Apparatus for Explaining Reasoning Process and Method Thereof
CN111768004A (en) Model self-adaption method and system based on intelligent computing framework
CN114510570A (en) Intention classification method and device based on small sample corpus and computer equipment
Deng et al. A distributed PDP model based on spectral clustering for improving evaluation performance
Jessup et al. Performance-based numerical solver selection in the Lighthouse framework
CN114490949A (en) Document retrieval method, device, equipment and medium based on BM25 algorithm
JP2023179657A (en) Neural-symbolic computing
CN116578423B (en) Task processing method, automatic question answering method and image generation method
CN114746868A (en) Method and apparatus for compiling neural network model
CN117608811A (en) Task processing method, computing device and computer readable storage medium
US20230169147A1 (en) Validation processing for candidate retraining data
US20220292393A1 (en) Utilizing machine learning models to generate initiative plans
Chowdhury et al. Qsfvqa: A time efficient, scalable and optimized vqa framework
Lupión et al. Accelerating neural network architecture search using multi-GPU high-performance computing
Alizadehsani et al. Deep Learning-Based Code Auto-Completion for Distributed Applications
Dahaoui et al. Distributed training from multi-sourced data
Sun et al. Online programming education modeling and knowledge tracing
Kumar et al. Using ensemble learning libraries
CN116737964B (en) Artificial intelligence brain system
Alonso-Moro et al. Deep Learning-Based Code Auto-Completion for Distributed Applications
Cárdenas et al. Tips and tools to automate OMNeT++ simulations and to facilitate post data management tasks
Rivera et al. Training an image classifier with transfer learning on Node.js
Sarang et al. Estimators

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination