CN117271113A - Task execution method, device, electronic equipment and storage medium - Google Patents
- Publication number
- CN117271113A CN117271113A CN202311101089.6A CN202311101089A CN117271113A CN 117271113 A CN117271113 A CN 117271113A CN 202311101089 A CN202311101089 A CN 202311101089A CN 117271113 A CN117271113 A CN 117271113A
- Authority
- CN
- China
- Prior art keywords
- sub
- data
- simulation
- target
- model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5027—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The disclosure provides a task execution method and relates to the technical field of artificial intelligence, in particular to the technical fields of large models and deep learning. The specific implementation scheme is as follows: determining simulation output data of a target sub-model according to input data of the target sub-model in a deep learning model, wherein the target sub-model comprises at least one target operator, and a target task corresponding to the target sub-model is executed by a first hardware resource; in response to determining that the simulation output data is not in a first numerical range corresponding to the first hardware resource, determining a second hardware resource according to the simulation output data, wherein a second numerical range corresponding to the second hardware resource is larger than the first numerical range; and executing the target task by using the second hardware resource to process the input data and obtain target output data of the target sub-model. The disclosure also provides a task execution device, an electronic device, and a storage medium.
Description
Technical Field
The present disclosure relates to the field of artificial intelligence, and in particular, to the field of large model technology and deep learning technology. More particularly, the present disclosure provides a task execution method, apparatus, electronic device, and storage medium.
Background
With the development of artificial intelligence technology, large models are increasingly widely used in the field of deep learning. Large models can be trained or used for inference with data of varying numerical precision.
Disclosure of Invention
The disclosure provides a task execution method, an apparatus, an electronic device, and a storage medium.
According to an aspect of the present disclosure, there is provided a task execution method, including: determining simulation output data of a target sub-model according to input data of the target sub-model in a deep learning model, wherein the target sub-model comprises at least one target operator, and a target task corresponding to the target sub-model is executed by a first hardware resource; in response to determining that the simulation output data is not in a first numerical range corresponding to the first hardware resource, determining a second hardware resource according to the simulation output data, wherein a second numerical range corresponding to the second hardware resource is larger than the first numerical range; and executing the target task by using the second hardware resource to process the input data and obtain target output data of the target sub-model.
According to another aspect of the present disclosure, there is provided a task performing device including: the first determining module is used for determining simulation output data of a target sub-model according to input data of the target sub-model in the deep learning model, wherein the target sub-model comprises at least one target operator, and a target task corresponding to the target sub-model is executed by the first hardware resource; the second determining module is used for determining a second hardware resource according to the simulation output data in response to determining that the simulation output data is not in a first numerical range corresponding to the first hardware resource, wherein the second numerical range corresponding to the second hardware resource is larger than the first numerical range; and the first execution module is used for executing the target task by utilizing the second hardware resource so as to process the input data and obtain target output data of the target sub-model.
According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a method provided in accordance with the present disclosure.
According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform a method provided according to the present disclosure.
According to another aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements a method provided according to the present disclosure.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a flow chart of a task execution method according to one embodiment of the present disclosure;
FIG. 2 is a schematic diagram of a deep learning model according to one embodiment of the present disclosure;
FIG. 3 is a block diagram of a task execution device according to one embodiment of the present disclosure; and
FIG. 4 is a block diagram of an electronic device to which a task execution method may be applied according to one embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
As model depth and breadth increase, precision verification and precision alignment of large models face significant challenges. When a precision overflow problem occurs during large-model computation, debugging the model becomes extremely difficult.
In some embodiments, after an operator of the model finishes computing, the operator's output can be checked to determine whether it exceeds the numerical range of the corresponding data type. For example, a scaling (Scale) operator includes a multiplication operation and an addition operation. The data type corresponding to the scaling operator may be a 16-bit floating point number (FP16). The output of the scaling operator can be precision-checked. If the check indicates that the output is not within the numerical range of a 16-bit floating point number, a precision overflow has occurred, but it is difficult to determine whether the overflow was caused by the multiplication operation or by the addition operation, which makes it difficult to efficiently optimize the execution of the related task.
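As an illustration of why a check on the final output alone cannot localize an overflow, the following sketch simulates a scaling operator step by step in FP16 using only the standard library (the `struct` module supports the IEEE 754 half-precision format); the concrete input values are hypothetical:

```python
import struct

FP16_MAX = 65504.0  # largest finite IEEE 754 half-precision value

def to_fp16(value):
    """Round a float to FP16 precision; map out-of-range finite values
    to infinity, mimicking floating-point overflow behaviour."""
    try:
        return struct.unpack('<e', struct.pack('<e', value))[0]
    except OverflowError:
        return float('inf') if value > 0 else float('-inf')

def scale_fp16(x, scale, bias):
    """Scaling operator x * scale + bias, computed step by step in FP16."""
    product = to_fp16(to_fp16(x) * to_fp16(scale))  # multiplication step
    result = to_fp16(product + to_fp16(bias))       # addition step
    return product, result

# 300 * 300 = 90000 exceeds FP16_MAX, so the multiplication overflows to
# inf, and inf then propagates through the addition: a check on `result`
# alone cannot tell which of the two operations caused the overflow.
product, result = scale_fp16(300.0, 300.0, -50000.0)
```

Here the overflow originates in the multiplication, but the addition's output is equally infinite, which is exactly the ambiguity described above.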
In order to efficiently perform tasks related to a model, the present disclosure provides a task performing method, which will be further described below.
FIG. 1 is a flow chart of a task execution method according to one embodiment of the present disclosure.
As shown in fig. 1, the method 100 may include operations S110 to S130.
In operation S110, simulation output data of the target sub-model is determined according to input data of the target sub-model in the deep learning model.
In embodiments of the present disclosure, the target sub-model may include at least one target operator. For example, the target operator may be an operator prone to precision overflow, such as a multiplication or accumulation operator. Operators with dependency relationships, such as a multiplication operator and an accumulation operator, can be grouped into a target sub-model. It will be appreciated that a first operator and a second operator have a dependency relationship if the output of the first operator is the input of the second operator.
In an embodiment of the disclosure, a target task corresponding to a target sub-model may be performed by a first hardware resource. For example, the first hardware resource may include at least one processor core and a corresponding memory space.
In the disclosed embodiments, the simulation output data of the target sub-model may be determined in various ways. For example, historical input data and historical output data of the target sub-model may be obtained. If the difference between the input data of the target sub-model and some historical input data is small, the historical output data corresponding to that historical input data can be used as the simulation output data. It will be appreciated that, before the input data of the target sub-model is processed, prior tasks of the target task may be performed using different hardware resources to process multiple historical input data and obtain multiple historical output data.
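A minimal sketch of this history-based estimation, assuming scalar input data and a hypothetical `history` list of (input, output) pairs; the tolerance used to decide whether a difference is "small" is also an assumption:

```python
def simulate_output(input_value, history, tolerance=0.5):
    """Estimate the target sub-model's output by finding the historical
    input closest to `input_value`; return its recorded output when the
    difference is small enough, else None (no usable history)."""
    closest_input, closest_output = min(
        history, key=lambda pair: abs(pair[0] - input_value)
    )
    if abs(closest_input - input_value) <= tolerance:
        return closest_output
    return None

# Hypothetical historical records from prior executions of the task.
history = [(1.0, 10.0), (2.0, 20.0), (4.0, 40.0)]
estimate = simulate_output(2.1, history)  # closest historical input is 2.0
```

In practice the inputs would be tensors and the distance a norm over them, but the lookup strategy is the same.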
In operation S120, in response to determining that the simulated output data is not in the first numerical range corresponding to the first hardware resource, a second hardware resource is determined from the simulated output data.
In the disclosed embodiment, each hardware resource corresponds to a numerical range. For example, the numerical range of a 16-bit floating point number may serve as the numerical range corresponding to one hardware resource, and the numerical range of a 32-bit floating point number (FP32) as that of another. The hardware resources a hardware device requires to process 32-bit floating point numbers may be greater than those required to process 16-bit floating point numbers.
In the embodiment of the disclosure, the second numerical range corresponding to the second hardware resource is larger than the first numerical range. For example, the first hardware resource may correspond to a range of values for 16-bit floating point numbers. The second hardware resource may correspond to a range of values for 32-bit floating point numbers.
In the embodiment of the disclosure, the second hardware resource may be determined according to a numerical range in which the simulation output data is located.
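The range-based selection described above can be sketched as follows; the resource names and the mapping from each precision to its numeric range are assumptions for illustration:

```python
# Hypothetical hardware resources, ordered by the numeric range of the
# floating-point precision each one is configured to process.
RESOURCE_RANGES = [
    ("fp16_resource", 65504.0),        # largest finite FP16 value
    ("fp32_resource", 3.4028235e38),   # largest finite FP32 value
]

def select_resource(simulated_output):
    """Pick the smallest-range resource whose numerical range still
    covers the simulation output data."""
    for name, max_value in RESOURCE_RANGES:
        if abs(simulated_output) <= max_value:
            return name
    raise OverflowError("simulated output exceeds every resource's range")

first = select_resource(1.0e3)   # within the FP16 range: keep first resource
second = select_resource(1.0e6)  # exceeds the FP16 range: escalate to FP32
```

Choosing the smallest sufficient range keeps resource overhead low while still avoiding overflow, which matches the utilization goal stated later in the description.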
In operation S130, a target task is performed using the second hardware resource to process the input data to obtain target output data of the target sub-model.
In the embodiment of the disclosure, the input data may be input into the target submodel to obtain target output data. For example, the number of processor cores corresponding to the second hardware resource may be greater than the number of processor cores corresponding to the first hardware resource.
Through the embodiments of the present disclosure, the hardware resources required by the target sub-model can be configured dynamically. Therefore, the hardware resources of the hardware device can be fully utilized, and the utilization rate of hardware resources is improved while the stable operation of the model is ensured.
In addition, through the embodiments of the present disclosure, computation requirements at different precisions can be met, the application scenarios of the model are expanded, and mixed-precision computation is facilitated.
It will be appreciated that while the method of the present disclosure is described above, the deep learning model and hardware devices of the present disclosure will be further described below.
In some embodiments, the hardware device may be a cluster of hardware devices.
In some embodiments, the deep learning model may include a plurality of sub-models. Further description will be provided below in connection with fig. 2.
FIG. 2 is a schematic diagram of a deep learning model according to one embodiment of the present disclosure.
As shown in fig. 2, the deep-learning model 200 may include a first sub-model 210, a second sub-model 220, and a target sub-model 230.
In the embodiments of the present disclosure, a hardware device may be divided into a plurality of hardware channels, which correspond to a plurality of hardware resources. For example, the target operators of the target sub-model may be prone to precision overflow. As shown in FIG. 2, the target sub-model 230 may include an accumulation operator 231 and an accumulation operator 232. For another example, the hardware channels may include a dynamic hardware channel. The target task corresponding to the target sub-model 230 may be run by the hardware resources corresponding to the dynamic hardware channel. The dynamic hardware channel may correspond to a plurality of hardware resources, which in turn correspond to a plurality of precisions. The plurality of hardware resources may include the first hardware resource, and the second hardware resource may also be determined from this plurality.
For example, the first sub-model 210 may be run by a fourth hardware resource and the second sub-model 220 by a fifth hardware resource. Operators in the first sub-model 210 or the second sub-model 220 are unlikely to suffer precision overflow. The first sub-model 210 may include an addition operator 211 and an addition operator 212. The second sub-model 220 may include a multiplication operator 221 and a multiplication operator 222. During operation of the deep learning model, the first sub-model and the second sub-model can be run continuously using the fourth and fifth hardware resources, respectively.
It will be appreciated that while the hardware devices and deep learning models of the present disclosure are described above, the objective tasks of the present disclosure will be further described below.
In some embodiments, the target task may include one of a forward computing task and a reverse computing task. Accordingly, the input data may include one of forward input data and reverse input data. For example, in the case where the input data is forward input data, the output data may be a forward calculation result. In another example, in the case where the input data is reverse input data, the output data may be gradient data.
It will be appreciated that while the target tasks of the present disclosure are described above, the simulation model of the present disclosure will be described below.
In some embodiments, determining the simulation output data of the target sub-model includes: processing the input data with the simulation model corresponding to the target sub-model to obtain the simulation output data.
In an embodiment of the present disclosure, the simulation model is determined from a plurality of historical input data and a plurality of historical output data of the target sub-model. For example, a plurality of historical input data and the corresponding plurality of historical output data for the dynamic hardware channel may be obtained, and linear regression may be performed on them to obtain the simulation model. It will be appreciated that the simulation model may also be derived from other models. For example, a fully connected network (FCN) trained with the historical input data as training samples and the historical output data as labels may serve as the simulation model. It will also be appreciated that, for input data of a consistent data type, the hardware resources required by a fully connected network or a linear regression model may be less than those required by the multiple target operators.
In the embodiment of the disclosure, a simulation task corresponding to a simulation model may be executed by using a preset hardware resource. For example, the preset hardware resources may be determined based on historical output data.
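As a sketch of building the simulation model by linear regression over the historical data (scalar case, ordinary least squares; the history values below are hypothetical):

```python
def fit_linear_simulation(history_inputs, history_outputs):
    """Fit y ≈ a*x + b by ordinary least squares over the target
    sub-model's historical inputs and outputs, returning the fitted
    simulation function."""
    n = len(history_inputs)
    mean_x = sum(history_inputs) / n
    mean_y = sum(history_outputs) / n
    covariance = sum((x - mean_x) * (y - mean_y)
                     for x, y in zip(history_inputs, history_outputs))
    variance = sum((x - mean_x) ** 2 for x in history_inputs)
    a = covariance / variance
    b = mean_y - a * mean_x
    return lambda x: a * x + b

# Hypothetical history: the target sub-model roughly doubles its input.
simulate = fit_linear_simulation([1.0, 2.0, 3.0], [2.1, 3.9, 6.0])
estimate = simulate(4.0)  # cheap stand-in for running the real operators
```

Such a fitted function is far cheaper to evaluate than the target operators themselves, which is why it can run ahead of the real computation to predict overflow.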
It will be appreciated that while the simulation model of the present disclosure is described above, the simulation output data of the present disclosure will be further described below.
In some embodiments, the at least one target operator may be N target operators, N being an integer greater than 1. The simulation model may include N simulation sub-models, each of which corresponds to one target operator. As shown in FIG. 2, the N target operators may include the above-described accumulation operator 231 and accumulation operator 232. The N simulation sub-models may include a simulation sub-model corresponding to the accumulation operator 231 and a simulation sub-model corresponding to the accumulation operator 232.
In some embodiments, determining the simulation output data of the target sub-model may comprise: processing the input data with the N simulation sub-models to obtain N simulation output sub-data corresponding to the N target operators, the N simulation output sub-data serving as the simulation output data.

In the embodiment of the present disclosure, processing the input data with the N simulation sub-models includes: processing the input data with the 1st simulation sub-model to obtain the 1st simulation output sub-data. For example, the input data may be processed with the simulation sub-model corresponding to the accumulation operator 231 to obtain the 1st simulation output sub-data.

In the embodiment of the present disclosure, processing the input data with the N simulation sub-models may further include: processing the (n-1)-th simulation output sub-data with the n-th simulation sub-model to obtain the n-th simulation output sub-data, where n is an integer greater than 1 and less than or equal to N. Taking n=2 as an example, the simulation sub-model corresponding to the accumulation operator 232 may process the 1st simulation output sub-data to obtain the 2nd simulation output sub-data. The 1st and 2nd simulation output sub-data may then serve as the simulation output data.
In an embodiment of the present disclosure, it may be determined whether the N simulation output sub-data are within the first numerical range corresponding to the first hardware resource. For example, taking the numerical range of a 16-bit floating point number as the range corresponding to the first hardware resource, if the 1st simulation output sub-data exceeds that range, the second hardware resource can be determined from the plurality of hardware resources corresponding to the dynamic hardware channel according to the 1st simulation output sub-data. For another example, if each simulation output sub-data is within the first numerical range, the target task may be performed using the first hardware resource. This reduces resource overhead and improves hardware resource utilization.
It will be appreciated that, in the above example, whether the simulation output sub-data are within the first numerical range is determined after all of the simulation output sub-data have been obtained. The present disclosure is not limited to this: the check may also be performed each time one simulation output sub-data is determined.
In an embodiment of the present disclosure, determining the second hardware resource in response to determining that the simulation output data is not in the first numerical range includes: in response to determining that the 1st simulation output sub-data is not within the first numerical range, determining the second hardware resource according to the 1st simulation output sub-data. For example, after the 1st simulation output sub-data is obtained, whether it is within the first numerical range may be determined. If it is not, the second hardware resource may be determined according to the 1st simulation output sub-data, and processing with the subsequent simulation sub-models can be stopped, saving computing resources and reducing resource overhead.

In an embodiment of the present disclosure, determining the second hardware resource may also include: in response to determining that the n-th simulation output sub-data is not within the first numerical range, determining the second hardware resource according to the n-th simulation output sub-data. For example, when the first n-1 simulation output sub-data are within the first numerical range, whether the n-th simulation output sub-data is within the first numerical range may be determined after it is obtained. If it is not, the second hardware resource may be determined according to the n-th simulation output sub-data, and, when n is smaller than N, processing with the subsequent simulation sub-models can be stopped, saving computing resources and reducing resource overhead.
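The chained evaluation with early stopping can be sketched as follows; the two sub-models and the use of the FP16 limit as the first numerical range are illustrative assumptions:

```python
FP16_MAX = 65504.0  # first numerical range limit assumed here

def chain_simulation(input_value, simulation_submodels, limit=FP16_MAX):
    """Run the N simulation sub-models in sequence (the n-th consumes
    the (n-1)-th output) and stop as soon as an output leaves the first
    numerical range.  Returns (sub_outputs, overflow_index), with
    overflow_index None when every output stayed in range."""
    outputs = []
    value = input_value
    for index, submodel in enumerate(simulation_submodels):
        value = submodel(value)
        outputs.append(value)
        if abs(value) > limit:
            return outputs, index  # skip the remaining sub-models
    return outputs, None

# Two hypothetical accumulation sub-models; the second one overflows.
submodels = [lambda v: v * 200.0, lambda v: v * 200.0]
sub_outputs, overflow_at = chain_simulation(3.0, submodels)
# sub_outputs == [600.0, 120000.0]; overflow_at == 1
```

Returning the overflow index pinpoints which operator would overflow, which is exactly the information a post-hoc check on the final output cannot provide.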
It will be appreciated that some ways of determining the second hardware resource are described above. In the disclosed embodiments, the input data is processed with a simulation model in order to obtain the simulation output data quickly and at low cost. However, the simulation model and the target sub-model may compute in different ways, so there may be a large difference between the simulation output data and the actual output data, as further described below.
In some embodiments, the above method may further comprise: in response to determining that the simulation output data is in the first numerical range, performing the target task with the first hardware resource to process the input data and obtain first output data. For example, if the simulation output data is within the first numerical range, the target task may be performed using the first hardware resource to obtain the first output data. It will be appreciated that in this case the first output data may also serve as the target output data. Next, whether the difference between the first output data and the simulation output data is greater than or equal to a preset difference threshold may be determined.
In some embodiments, the above method may further comprise: in response to determining that the difference between the first output data and the simulation output data is greater than or equal to a preset difference threshold, adjusting the first hardware resource to a third hardware resource. The third numerical range corresponding to the third hardware resource is greater than the first numerical range. For example, if the difference is greater than or equal to the preset difference threshold, it may be determined that the error of the simulation model is large. For stable execution of the subsequent tasks of the target task, hardware resources corresponding to a higher-precision data type can be used to execute them. Through the embodiments of the present disclosure, hardware resources can be adjusted in time, the risk of precision overflow is reduced, dynamic precision management can be realized, and various computing requirements can be met.
For another example, if the difference is less than the preset difference threshold, it may be determined that the error of the simulation model is small, and the subsequent task may be performed using the first hardware resource.
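A minimal sketch of this adjustment decision; the threshold value and the resource names are assumptions for illustration:

```python
def adjust_resource(first_output, simulated_output, current_resource,
                    larger_resource, difference_threshold=1.0):
    """Compare the actual first output with the simulation model's
    prediction; if the error reaches the preset threshold, switch the
    subsequent tasks to a resource with a larger numerical range."""
    if abs(first_output - simulated_output) >= difference_threshold:
        return larger_resource  # simulation unreliable: widen the range
    return current_resource     # simulation accurate: keep the resource

kept = adjust_resource(10.5, 10.4, "fp16_resource", "fp32_resource")
escalated = adjust_resource(10.5, 15.0, "fp16_resource", "fp32_resource")
# kept == "fp16_resource"; escalated == "fp32_resource"
```

The escalation acts as a safety margin: when the simulation model's predictions cannot be trusted, the wider range guards against an unpredicted overflow in later tasks.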
Fig. 3 is a block diagram of a task execution device according to one embodiment of the present disclosure.
As shown in fig. 3, the apparatus 300 may include a first determination module 310, a second determination module 320, and a first execution module 330.
The first determining module 310 is configured to determine simulated output data of the target sub-model according to input data of the target sub-model in the deep learning model. The target sub-model includes at least one target operator. And executing the target task corresponding to the target sub-model by the first hardware resource.
The second determining module 320 is configured to determine, according to the simulation output data, a second hardware resource in response to determining that the simulation output data is not in the first numerical range corresponding to the first hardware resource. The second range of values corresponding to the second hardware resource is greater than the first range of values.
The first execution module 330 is configured to execute the target task by using the second hardware resource, so as to process the input data and obtain target output data of the target sub-model.
In some embodiments, the first determination module comprises: and the processing sub-module is used for processing the input data by using the simulation model corresponding to the target sub-model to obtain simulation output data.
In some embodiments, the simulation model is determined from a plurality of historical input data and a plurality of historical output data of the target sub-model.
In some embodiments, the at least one target operator is N, N being an integer greater than 1, the simulation model comprising N simulation sub-models, the simulation sub-model corresponding to one target operator.
In some embodiments, the processing submodule includes: the processing unit is used for processing the input data by utilizing the N simulation sub-models to obtain N simulation output sub-data corresponding to the N target operators, and the N simulation output sub-data are used as simulation output data.
In some embodiments, the processing unit comprises: and the first processing subunit is used for processing the input data by using the 1 st simulation sub-model to obtain the 1 st simulation output sub-data. And the second processing subunit is used for processing the (n-1) th simulation output sub-data by using the (n) th simulation sub-model to obtain the (n) th simulation output sub-data. N is an integer greater than 1 and less than or equal to N.
In some embodiments, the second determining module is further configured to determine the second hardware resource according to the n-th simulation output sub-data in response to determining that the n-th simulation output sub-data is not within the first numerical range.
In some embodiments, the apparatus 300 further includes a second execution module configured to, in response to determining that the simulation output data is within the first numerical range, execute the target task by using the first hardware resource to process the input data and obtain first output data.
In some embodiments, the apparatus 300 further includes an adjusting module configured to adjust the first hardware resource to a third hardware resource in response to determining that the difference between the first output data and the simulation output data is greater than or equal to a preset difference threshold. A third numerical range corresponding to the third hardware resource is greater than the first numerical range.
In some embodiments, the input data includes one of forward input data and reverse input data.
In the technical solutions of the present disclosure, the collection, storage, use, processing, transmission, provision, and disclosure of users' personal information comply with the provisions of relevant laws and regulations and do not violate public order and good customs.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
Fig. 4 illustrates a schematic block diagram of an example electronic device 400 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 4, the apparatus 400 includes a computing unit 401 that can perform various suitable actions and processes according to a computer program stored in a Read Only Memory (ROM) 402 or a computer program loaded from a storage unit 408 into a Random Access Memory (RAM) 403. In RAM 403, various programs and data required for the operation of device 400 may also be stored. The computing unit 401, ROM 402, and RAM 403 are connected to each other by a bus 404. An input/output (I/O) interface 405 is also connected to bus 404.
Various components in device 400 are connected to I/O interface 405, including: an input unit 406 such as a keyboard, a mouse, etc.; an output unit 407 such as various types of displays, speakers, and the like; a storage unit 408, such as a magnetic disk, optical disk, etc.; and a communication unit 409 such as a network card, modem, wireless communication transceiver, etc. The communication unit 409 allows the device 400 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The computing unit 401 may be a variety of general purpose and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 401 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 401 performs the respective methods and processes described above, for example, a task execution method. For example, in some embodiments, the task execution method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 408. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 400 via the ROM 402 and/or the communication unit 409. When the computer program is loaded into RAM 403 and executed by computing unit 401, one or more steps of the task execution method described above may be performed. Alternatively, in other embodiments, the computing unit 401 may be configured to perform the task execution method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described here may be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special purpose or general purpose, and that may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out the methods of the present disclosure may be written in any combination of one or more programming languages. This program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that, when executed by the processor or controller, it causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM or flash memory), an optical fiber, a portable Compact Disc Read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) display or an LCD (liquid crystal display)) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
It should be appreciated that steps may be reordered, added, or deleted using the various forms of flow shown above. For example, the steps recited in the present disclosure may be performed in parallel, sequentially, or in a different order, so long as the desired results of the technical solutions disclosed herein can be achieved; no limitation is imposed herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.
Claims (20)
1. A method of task execution, comprising:
determining simulation output data of a target sub-model according to input data of the target sub-model in a deep learning model, wherein the target sub-model comprises at least one target operator, and a target task corresponding to the target sub-model is executed by a first hardware resource;
responsive to determining that the simulated output data is not in a first numerical range corresponding to the first hardware resource, determining a second hardware resource according to the simulated output data, wherein a second numerical range corresponding to the second hardware resource is greater than the first numerical range; and
and executing the target task by using the second hardware resource to process the input data so as to obtain target output data of the target sub-model.
2. The method of claim 1, wherein the determining simulated output data of the target submodel comprises:
and processing the input data by using a simulation model corresponding to the target sub-model to obtain the simulation output data.
3. The method of claim 2, wherein the simulation model is determined from a plurality of historical input data and a plurality of historical output data of the target sub-model.
4. The method of claim 2, wherein the number of the at least one target operator is N, N is an integer greater than 1, the simulation model comprises N simulation sub-models, and each simulation sub-model corresponds to one of the target operators,
wherein the processing the input data by using the simulation model corresponding to the target sub-model to obtain the simulation output data comprises:
processing the input data by using the N simulation sub-models to obtain N simulation output sub-data corresponding to the N target operators, wherein the N simulation output sub-data serve as the simulation output data.
5. The method of claim 4, wherein the processing the input data with the N simulation sub-models to obtain N simulation output sub-data corresponding to the N target operators comprises:
processing the input data by using the 1 st simulation sub-model to obtain 1 st simulation output sub-data;
and processing the (n-1)-th simulation output sub-data by using the n-th simulation sub-model to obtain the n-th simulation output sub-data, wherein n is an integer greater than 1 and less than or equal to N.
6. The method of claim 5, wherein the determining, in response to determining that the simulated output data is not in the first range of values corresponding to the first hardware resource, a second hardware resource from the simulated output data comprises:
in response to determining that the n-th simulation output sub-data is not within the first numerical range, determining the second hardware resource according to the n-th simulation output sub-data.
7. The method of claim 1, further comprising:
in response to determining that the simulation output data is in the first numerical range, executing the target task by using the first hardware resource to process the input data to obtain first output data.
8. The method of claim 7, further comprising:
in response to determining that a difference between the first output data and the simulation output data is greater than or equal to a preset difference threshold, adjusting the first hardware resource to a third hardware resource, wherein a third numerical range corresponding to the third hardware resource is greater than the first numerical range.
9. The method of claim 1, wherein the input data comprises one of forward input data and reverse input data.
10. A task execution device comprising:
the first determining module is used for determining simulation output data of a target sub-model according to input data of the target sub-model in the deep learning model, wherein the target sub-model comprises at least one target operator, and a target task corresponding to the target sub-model is executed by a first hardware resource;
a second determining module, configured to determine, according to the simulation output data, a second hardware resource in response to determining that the simulation output data is not in a first numerical range corresponding to the first hardware resource, where the second numerical range corresponding to the second hardware resource is greater than the first numerical range; and
and the first execution module is used for executing the target task by utilizing the second hardware resource so as to process the input data and obtain target output data of the target sub-model.
11. The apparatus of claim 10, wherein the first determination module comprises:
and the processing sub-module is used for processing the input data by utilizing a simulation model corresponding to the target sub-model to obtain the simulation output data.
12. The apparatus of claim 11, wherein the simulation model is determined from a plurality of historical input data and a plurality of historical output data of the target sub-model.
13. The apparatus of claim 11, wherein the number of the at least one target operator is N, N is an integer greater than 1, the simulation model comprises N simulation sub-models, and each simulation sub-model corresponds to one of the target operators,
the processing sub-module comprises:
and the processing unit is used for processing the input data by utilizing the N simulation sub-models to obtain N simulation output sub-data corresponding to the N target operators, and the N simulation output sub-data are used as the simulation output data.
14. The apparatus of claim 13, wherein the processing unit comprises:
a first processing subunit configured to process the input data by using the 1st simulation sub-model to obtain the 1st simulation output sub-data; and
a second processing subunit configured to process the (n-1)-th simulation output sub-data by using the n-th simulation sub-model to obtain the n-th simulation output sub-data, wherein n is an integer greater than 1 and less than or equal to N.
15. The apparatus of claim 14, wherein the second determination module is further configured to:
in response to determining that the n-th simulation output sub-data is not within the first numerical range, determine the second hardware resource according to the n-th simulation output sub-data.
16. The apparatus of claim 10, further comprising:
a second execution module configured to, in response to determining that the simulation output data is in the first numerical range, execute the target task by using the first hardware resource to process the input data to obtain first output data.
17. The apparatus of claim 16, further comprising:
an adjusting module configured to adjust the first hardware resource to a third hardware resource in response to determining that a difference between the first output data and the simulation output data is greater than or equal to a preset difference threshold, wherein a third numerical range corresponding to the third hardware resource is greater than the first numerical range.
18. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1 to 9.
19. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1 to 9.
20. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1 to 9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311101089.6A CN117271113A (en) | 2023-08-29 | 2023-08-29 | Task execution method, device, electronic equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117271113A true CN117271113A (en) | 2023-12-22 |
Family
ID=89201755
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311101089.6A Pending CN117271113A (en) | 2023-08-29 | 2023-08-29 | Task execution method, device, electronic equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117271113A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112948079A (en) * | 2021-02-18 | 2021-06-11 | 北京百度网讯科技有限公司 | Task scheduling method, device, equipment and computer storage medium |
CN113987938A (en) * | 2021-10-27 | 2022-01-28 | 北京百度网讯科技有限公司 | Process parameter optimization method, device, equipment and storage medium |
CN114997401A (en) * | 2022-08-03 | 2022-09-02 | 腾讯科技(深圳)有限公司 | Adaptive inference acceleration method, apparatus, computer device and storage medium |
US20220374713A1 (en) * | 2021-10-28 | 2022-11-24 | Beijing Baidu Netcom Science Technology Co., Ltd. | Method and apparatus for performing distributed training on deep learning model, device and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||