CN117271113A - Task execution method, device, electronic equipment and storage medium - Google Patents
- Publication number
- CN117271113A CN117271113A CN202311101089.6A CN202311101089A CN117271113A CN 117271113 A CN117271113 A CN 117271113A CN 202311101089 A CN202311101089 A CN 202311101089A CN 117271113 A CN117271113 A CN 117271113A
- Authority
- CN
- China
- Prior art keywords
- sub
- data
- simulation
- target
- model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5027—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The disclosure provides a task execution method and relates to the technical field of artificial intelligence, in particular to the technical fields of large models and deep learning. The specific implementation scheme is as follows: determining simulation output data of a target sub-model according to input data of the target sub-model in a deep learning model, wherein the target sub-model comprises at least one target operator, and a target task corresponding to the target sub-model is executed by a first hardware resource; in response to determining that the simulation output data is not in a first numerical range corresponding to the first hardware resource, determining a second hardware resource according to the simulation output data, wherein a second numerical range corresponding to the second hardware resource is larger than the first numerical range; and executing the target task by using the second hardware resource to process the input data and obtain target output data of the target sub-model. The disclosure also provides a task execution device, an electronic device, and a storage medium.
Description
Technical Field
The present disclosure relates to the field of artificial intelligence, and in particular, to the field of large model technology and deep learning technology. More particularly, the present disclosure provides a task execution method, apparatus, electronic device, and storage medium.
Background
With the development of artificial intelligence technology, large models are increasingly widely used in the field of deep learning. Large models can be trained or used for inference with data of varying numerical precision.
Disclosure of Invention
The disclosure provides a task execution method, an apparatus, an electronic device, and a storage medium.
According to an aspect of the present disclosure, there is provided a task execution method, including: determining simulation output data of a target sub-model according to input data of the target sub-model in a deep learning model, wherein the target sub-model comprises at least one target operator, and a target task corresponding to the target sub-model is executed by a first hardware resource; in response to determining that the simulation output data is not in a first numerical range corresponding to the first hardware resource, determining a second hardware resource according to the simulation output data, wherein a second numerical range corresponding to the second hardware resource is larger than the first numerical range; and executing the target task by using the second hardware resource to process the input data and obtain target output data of the target sub-model.
According to another aspect of the present disclosure, there is provided a task performing device including: the first determining module is used for determining simulation output data of a target sub-model according to input data of the target sub-model in the deep learning model, wherein the target sub-model comprises at least one target operator, and a target task corresponding to the target sub-model is executed by the first hardware resource; the second determining module is used for determining a second hardware resource according to the simulation output data in response to determining that the simulation output data is not in a first numerical range corresponding to the first hardware resource, wherein the second numerical range corresponding to the second hardware resource is larger than the first numerical range; and the first execution module is used for executing the target task by utilizing the second hardware resource so as to process the input data and obtain target output data of the target sub-model.
According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a method provided in accordance with the present disclosure.
According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform a method provided according to the present disclosure.
According to another aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements a method provided according to the present disclosure.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a flow chart of a task execution method according to one embodiment of the present disclosure;
FIG. 2 is a schematic diagram of a deep learning model according to one embodiment of the present disclosure;
FIG. 3 is a block diagram of a task execution device according to one embodiment of the present disclosure; and
FIG. 4 is a block diagram of an electronic device to which a task execution method may be applied according to one embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
As model depth and breadth increase, precision verification and precision alignment of large models face significant challenges. When a precision overflow problem occurs during large-model computation, debugging the model becomes extremely difficult.
In some embodiments, after an operator of the model finishes computing, the operator's output can be checked to determine whether it exceeds the numerical range of the corresponding data type. For example, a scaling (Scale) operator includes a multiplication operation and an addition operation. The data type corresponding to the scaling operator may be a 16-bit floating point number (FP16). The output of the scaling operator can be precision-checked. If the check indicates that the output is not within the numerical range of a 16-bit floating point number, a precision overflow has occurred, but it is difficult to determine whether the overflow was caused by the multiplication operation or by the addition operation, which makes it difficult to efficiently optimize the execution of the related task.
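As an illustration of why a check on the final output alone cannot localize an overflow, the following sketch simulates a scaling operator step by step in FP16 using only the standard library (the `struct` module supports the IEEE 754 half-precision format); the concrete input values are hypothetical:

```python
import struct

FP16_MAX = 65504.0  # largest finite IEEE 754 half-precision value

def to_fp16(value):
    """Round a float to FP16 precision; map out-of-range finite values
    to infinity, mimicking floating-point overflow behaviour."""
    try:
        return struct.unpack('<e', struct.pack('<e', value))[0]
    except OverflowError:
        return float('inf') if value > 0 else float('-inf')

def scale_fp16(x, scale, bias):
    """Scaling operator x * scale + bias, computed step by step in FP16."""
    product = to_fp16(to_fp16(x) * to_fp16(scale))  # multiplication step
    result = to_fp16(product + to_fp16(bias))       # addition step
    return product, result

# 300 * 300 = 90000 exceeds FP16_MAX, so the multiplication overflows to
# inf, and inf then propagates through the addition: a check on `result`
# alone cannot tell which of the two operations caused the overflow.
product, result = scale_fp16(300.0, 300.0, -50000.0)
```

Here the overflow originates in the multiplication, but the addition's output is equally infinite, which is exactly the ambiguity described above.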
In order to efficiently perform tasks related to a model, the present disclosure provides a task performing method, which will be further described below.
FIG. 1 is a flow chart of a task execution method according to one embodiment of the present disclosure.
As shown in fig. 1, the method 100 may include operations S110 to S130.
In operation S110, simulation output data of the target sub-model is determined according to input data of the target sub-model in the deep learning model.
In embodiments of the present disclosure, the target sub-model may include at least one target operator. For example, the target operator may be an operator prone to precision overflow, such as a multiplication or accumulation operator. Operators with dependency relationships, such as a multiplication operator and an accumulation operator, can be grouped into a target sub-model. It will be appreciated that a first operator and a second operator have a dependency relationship if the output of the first operator is the input of the second operator.
In an embodiment of the disclosure, a target task corresponding to a target sub-model may be performed by a first hardware resource. For example, the first hardware resource may include at least one processor core and a corresponding memory space.
In the disclosed embodiments, the simulation output data of the target sub-model may be determined in various ways. For example, historical input data and historical output data of the target sub-model may be obtained. If the difference between the input data of the target sub-model and some historical input data is small, the historical output data corresponding to that historical input data can be used as the simulation output data. It will be appreciated that, before the input data of the target sub-model is processed, prior tasks of the target task may be performed using different hardware resources to process multiple historical input data and obtain multiple historical output data.
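A minimal sketch of this history-based estimation, assuming scalar input data and a hypothetical `history` list of (input, output) pairs; the tolerance used to decide whether a difference is "small" is also an assumption:

```python
def simulate_output(input_value, history, tolerance=0.5):
    """Estimate the target sub-model's output by finding the historical
    input closest to `input_value`; return its recorded output when the
    difference is small enough, else None (no usable history)."""
    closest_input, closest_output = min(
        history, key=lambda pair: abs(pair[0] - input_value)
    )
    if abs(closest_input - input_value) <= tolerance:
        return closest_output
    return None

# Hypothetical historical records from prior executions of the task.
history = [(1.0, 10.0), (2.0, 20.0), (4.0, 40.0)]
estimate = simulate_output(2.1, history)  # closest historical input is 2.0
```

In practice the inputs would be tensors and the distance a norm over them, but the lookup strategy is the same.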
In operation S120, in response to determining that the simulated output data is not in the first numerical range corresponding to the first hardware resource, a second hardware resource is determined from the simulated output data.
In the disclosed embodiment, each hardware resource corresponds to a numerical range. For example, the numerical range of a 16-bit floating point number may serve as the numerical range corresponding to one hardware resource, and the numerical range of a 32-bit floating point number (FP32) as that of another. The hardware resources a hardware device requires to process 32-bit floating point numbers may be greater than those required to process 16-bit floating point numbers.
In the embodiment of the disclosure, the second numerical range corresponding to the second hardware resource is larger than the first numerical range. For example, the first hardware resource may correspond to a range of values for 16-bit floating point numbers. The second hardware resource may correspond to a range of values for 32-bit floating point numbers.
In the embodiment of the disclosure, the second hardware resource may be determined according to a numerical range in which the simulation output data is located.
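The range-based selection described above can be sketched as follows; the resource names and the mapping from each precision to its numeric range are assumptions for illustration:

```python
# Hypothetical hardware resources, ordered by the numeric range of the
# floating-point precision each one is configured to process.
RESOURCE_RANGES = [
    ("fp16_resource", 65504.0),        # largest finite FP16 value
    ("fp32_resource", 3.4028235e38),   # largest finite FP32 value
]

def select_resource(simulated_output):
    """Pick the smallest-range resource whose numerical range still
    covers the simulation output data."""
    for name, max_value in RESOURCE_RANGES:
        if abs(simulated_output) <= max_value:
            return name
    raise OverflowError("simulated output exceeds every resource's range")

first = select_resource(1.0e3)   # within the FP16 range: keep first resource
second = select_resource(1.0e6)  # exceeds the FP16 range: escalate to FP32
```

Choosing the smallest sufficient range keeps resource overhead low while still avoiding overflow, which matches the utilization goal stated later in the description.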
In operation S130, a target task is performed using the second hardware resource to process the input data to obtain target output data of the target sub-model.
In the embodiment of the disclosure, the input data may be input into the target submodel to obtain target output data. For example, the number of processor cores corresponding to the second hardware resource may be greater than the number of processor cores corresponding to the first hardware resource.
Through the embodiments of the present disclosure, the hardware resources required by the target sub-model can be configured dynamically. Therefore, the hardware resources of the hardware device can be fully utilized, and the utilization rate of hardware resources is improved while the stable operation of the model is ensured.
In addition, through the embodiments of the present disclosure, computation requirements at different precisions can be met, the application scenarios of the model are expanded, and mixed-precision computation is facilitated.
It will be appreciated that while the method of the present disclosure is described above, the deep learning model and hardware devices of the present disclosure will be further described below.
In some embodiments, the hardware device may be a cluster of hardware devices.
In some embodiments, the deep learning model may include a plurality of sub-models. Further description will be provided below in connection with fig. 2.
FIG. 2 is a schematic diagram of a deep learning model according to one embodiment of the present disclosure.
As shown in fig. 2, the deep-learning model 200 may include a first sub-model 210, a second sub-model 220, and a target sub-model 230.
In the embodiments of the present disclosure, a hardware device may be divided into a plurality of hardware channels, which correspond to a plurality of hardware resources. For example, the target operators of the target sub-model may be prone to precision overflow. As shown in FIG. 2, the target sub-model 230 may include an accumulation operator 231 and an accumulation operator 232. For another example, the hardware channels may include a dynamic hardware channel. The target task corresponding to the target sub-model 230 may be run by the hardware resources corresponding to the dynamic hardware channel. The dynamic hardware channel may correspond to a plurality of hardware resources, which in turn correspond to a plurality of precisions. The plurality of hardware resources may include the first hardware resource, and the second hardware resource may also be determined from this plurality.
For example, the first sub-model 210 may be run by a fourth hardware resource and the second sub-model 220 by a fifth hardware resource. Operators in the first sub-model 210 or the second sub-model 220 are unlikely to suffer precision overflow. The first sub-model 210 may include an addition operator 211 and an addition operator 212. The second sub-model 220 may include a multiplication operator 221 and a multiplication operator 222. During operation of the deep learning model, the first sub-model and the second sub-model can be run continuously using the fourth and fifth hardware resources, respectively.
It will be appreciated that while the hardware devices and deep learning models of the present disclosure are described above, the objective tasks of the present disclosure will be further described below.
In some embodiments, the target task may include one of a forward computing task and a reverse computing task. Accordingly, the input data may include one of forward input data and reverse input data. For example, in the case where the input data is forward input data, the output data may be a forward calculation result. In another example, in the case where the input data is reverse input data, the output data may be gradient data.
It will be appreciated that while the target tasks of the present disclosure are described above, the simulation model of the present disclosure will be described below.
In some embodiments, determining the simulation output data of the target sub-model includes: processing the input data with the simulation model corresponding to the target sub-model to obtain the simulation output data.
In an embodiment of the present disclosure, the simulation model is determined from a plurality of historical input data and a plurality of historical output data of the target sub-model. For example, a plurality of historical input data and the corresponding plurality of historical output data for the dynamic hardware channel may be obtained, and linear regression may be performed on them to obtain the simulation model. It will be appreciated that the simulation model may also be derived from other models. For example, a fully connected network (FCN) trained with the historical input data as training samples and the historical output data as labels may serve as the simulation model. It will also be appreciated that, for input data of a consistent data type, the hardware resources required by a fully connected network or a linear regression model may be less than those required by the multiple target operators.
In the embodiment of the disclosure, a simulation task corresponding to a simulation model may be executed by using a preset hardware resource. For example, the preset hardware resources may be determined based on historical output data.
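As a sketch of building the simulation model by linear regression over the historical data (scalar case, ordinary least squares; the history values below are hypothetical):

```python
def fit_linear_simulation(history_inputs, history_outputs):
    """Fit y ≈ a*x + b by ordinary least squares over the target
    sub-model's historical inputs and outputs, returning the fitted
    simulation function."""
    n = len(history_inputs)
    mean_x = sum(history_inputs) / n
    mean_y = sum(history_outputs) / n
    covariance = sum((x - mean_x) * (y - mean_y)
                     for x, y in zip(history_inputs, history_outputs))
    variance = sum((x - mean_x) ** 2 for x in history_inputs)
    a = covariance / variance
    b = mean_y - a * mean_x
    return lambda x: a * x + b

# Hypothetical history: the target sub-model roughly doubles its input.
simulate = fit_linear_simulation([1.0, 2.0, 3.0], [2.1, 3.9, 6.0])
estimate = simulate(4.0)  # cheap stand-in for running the real operators
```

Such a fitted function is far cheaper to evaluate than the target operators themselves, which is why it can run ahead of the real computation to predict overflow.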
It will be appreciated that while the simulation model of the present disclosure is described above, the simulation output data of the present disclosure will be further described below.
In some embodiments, the at least one target operator may be N target operators, N being an integer greater than 1. The simulation model may include N simulation sub-models, each of which corresponds to one target operator. As shown in FIG. 2, the N target operators may include the above-described accumulation operator 231 and accumulation operator 232. The N simulation sub-models may include a simulation sub-model corresponding to the accumulation operator 231 and a simulation sub-model corresponding to the accumulation operator 232.
In some embodiments, determining the simulation output data of the target sub-model may comprise: processing the input data with the N simulation sub-models to obtain N simulation output sub-data corresponding to the N target operators, the N simulation output sub-data serving as the simulation output data.

In the embodiment of the present disclosure, processing the input data with the N simulation sub-models includes: processing the input data with the 1st simulation sub-model to obtain the 1st simulation output sub-data. For example, the input data may be processed with the simulation sub-model corresponding to the accumulation operator 231 to obtain the 1st simulation output sub-data.

In the embodiment of the present disclosure, processing the input data with the N simulation sub-models may further include: processing the (n-1)-th simulation output sub-data with the n-th simulation sub-model to obtain the n-th simulation output sub-data, where n is an integer greater than 1 and less than or equal to N. Taking n=2 as an example, the simulation sub-model corresponding to the accumulation operator 232 may process the 1st simulation output sub-data to obtain the 2nd simulation output sub-data. The 1st and 2nd simulation output sub-data may then serve as the simulation output data.
In an embodiment of the present disclosure, it may be determined whether the N simulation output sub-data are within the first numerical range corresponding to the first hardware resource. For example, taking the numerical range of a 16-bit floating point number as the range corresponding to the first hardware resource, if the 1st simulation output sub-data exceeds that range, the second hardware resource can be determined from the plurality of hardware resources corresponding to the dynamic hardware channel according to the 1st simulation output sub-data. For another example, if each simulation output sub-data is within the first numerical range, the target task may be performed using the first hardware resource. This reduces resource overhead and improves hardware resource utilization.
It will be appreciated that, in the above example, whether the simulation output sub-data are within the first numerical range is determined after all of the simulation output sub-data have been obtained. The present disclosure is not limited to this: the check may also be performed each time one simulation output sub-data is determined.
In an embodiment of the present disclosure, determining the second hardware resource in response to determining that the simulation output data is not in the first numerical range includes: in response to determining that the 1st simulation output sub-data is not within the first numerical range, determining the second hardware resource according to the 1st simulation output sub-data. For example, after the 1st simulation output sub-data is obtained, whether it is within the first numerical range may be determined. If it is not, the second hardware resource may be determined according to the 1st simulation output sub-data, and processing with the subsequent simulation sub-models can be stopped, saving computing resources and reducing resource overhead.

In an embodiment of the present disclosure, determining the second hardware resource may also include: in response to determining that the n-th simulation output sub-data is not within the first numerical range, determining the second hardware resource according to the n-th simulation output sub-data. For example, when the first n-1 simulation output sub-data are within the first numerical range, whether the n-th simulation output sub-data is within the first numerical range may be determined after it is obtained. If it is not, the second hardware resource may be determined according to the n-th simulation output sub-data, and, when n is smaller than N, processing with the subsequent simulation sub-models can be stopped, saving computing resources and reducing resource overhead.
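The chained evaluation with early stopping can be sketched as follows; the two sub-models and the use of the FP16 limit as the first numerical range are illustrative assumptions:

```python
FP16_MAX = 65504.0  # first numerical range limit assumed here

def chain_simulation(input_value, simulation_submodels, limit=FP16_MAX):
    """Run the N simulation sub-models in sequence (the n-th consumes
    the (n-1)-th output) and stop as soon as an output leaves the first
    numerical range.  Returns (sub_outputs, overflow_index), with
    overflow_index None when every output stayed in range."""
    outputs = []
    value = input_value
    for index, submodel in enumerate(simulation_submodels):
        value = submodel(value)
        outputs.append(value)
        if abs(value) > limit:
            return outputs, index  # skip the remaining sub-models
    return outputs, None

# Two hypothetical accumulation sub-models; the second one overflows.
submodels = [lambda v: v * 200.0, lambda v: v * 200.0]
sub_outputs, overflow_at = chain_simulation(3.0, submodels)
# sub_outputs == [600.0, 120000.0]; overflow_at == 1
```

Returning the overflow index pinpoints which operator would overflow, which is exactly the information a post-hoc check on the final output cannot provide.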
It will be appreciated that some ways of determining the second hardware resource are described above. In the disclosed embodiments, the input data is processed with a simulation model in order to obtain the simulation output data quickly and at low cost. However, the simulation model and the target sub-model may compute in different ways, so there may be a large difference between the simulation output data and the actual output data, as further described below.
In some embodiments, the above method may further comprise: in response to determining that the simulation output data is in the first numerical range, performing the target task with the first hardware resource to process the input data and obtain first output data. For example, if the simulation output data is within the first numerical range, the target task may be performed using the first hardware resource to obtain the first output data. It will be appreciated that in this case the first output data may also serve as the target output data. Next, whether the difference between the first output data and the simulation output data is greater than or equal to a preset difference threshold may be determined.
In some embodiments, the above method may further comprise: in response to determining that the difference between the first output data and the simulation output data is greater than or equal to a preset difference threshold, adjusting the first hardware resource to a third hardware resource. The third numerical range corresponding to the third hardware resource is greater than the first numerical range. For example, if the difference is greater than or equal to the preset difference threshold, it may be determined that the error of the simulation model is large. For stable execution of the subsequent tasks of the target task, hardware resources corresponding to a higher-precision data type can be used to execute them. Through the embodiments of the present disclosure, hardware resources can be adjusted in time, the risk of precision overflow is reduced, dynamic precision management can be realized, and various computing requirements can be met.
For another example, if the difference is less than the preset difference threshold, it may be determined that the error of the simulation model is small, and the subsequent task may be performed using the first hardware resource.
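A minimal sketch of this adjustment decision; the threshold value and the resource names are assumptions for illustration:

```python
def adjust_resource(first_output, simulated_output, current_resource,
                    larger_resource, difference_threshold=1.0):
    """Compare the actual first output with the simulation model's
    prediction; if the error reaches the preset threshold, switch the
    subsequent tasks to a resource with a larger numerical range."""
    if abs(first_output - simulated_output) >= difference_threshold:
        return larger_resource  # simulation unreliable: widen the range
    return current_resource     # simulation accurate: keep the resource

kept = adjust_resource(10.5, 10.4, "fp16_resource", "fp32_resource")
escalated = adjust_resource(10.5, 15.0, "fp16_resource", "fp32_resource")
# kept == "fp16_resource"; escalated == "fp32_resource"
```

The escalation acts as a safety margin: when the simulation model's predictions cannot be trusted, the wider range guards against an unpredicted overflow in later tasks.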
Fig. 3 is a block diagram of a task execution device according to one embodiment of the present disclosure.
As shown in fig. 3, the apparatus 300 may include a first determination module 310, a second determination module 320, and a first execution module 330.
The first determining module 310 is configured to determine simulated output data of the target sub-model according to input data of the target sub-model in the deep learning model. The target sub-model includes at least one target operator. And executing the target task corresponding to the target sub-model by the first hardware resource.
The second determining module 320 is configured to determine, according to the simulation output data, a second hardware resource in response to determining that the simulation output data is not in the first numerical range corresponding to the first hardware resource. The second range of values corresponding to the second hardware resource is greater than the first range of values.
The first execution module 330 is configured to execute the target task by using the second hardware resource, so as to process the input data and obtain target output data of the target sub-model.
In some embodiments, the first determination module comprises: and the processing sub-module is used for processing the input data by using the simulation model corresponding to the target sub-model to obtain simulation output data.
In some embodiments, the simulation model is determined from a plurality of historical input data and a plurality of historical output data of the target sub-model.
In some embodiments, the at least one target operator is N, N being an integer greater than 1, the simulation model comprising N simulation sub-models, the simulation sub-model corresponding to one target operator.
In some embodiments, the processing submodule includes: the processing unit is used for processing the input data by utilizing the N simulation sub-models to obtain N simulation output sub-data corresponding to the N target operators, and the N simulation output sub-data are used as simulation output data.
In some embodiments, the processing unit comprises: and the first processing subunit is used for processing the input data by using the 1 st simulation sub-model to obtain the 1 st simulation output sub-data. And the second processing subunit is used for processing the (n-1) th simulation output sub-data by using the (n) th simulation sub-model to obtain the (n) th simulation output sub-data. N is an integer greater than 1 and less than or equal to N.
In some embodiments, the second determining module is further configured to determine the second hardware resource according to the n-th simulation output sub-data in response to determining that the n-th simulation output sub-data is not within the first numerical range.
In some embodiments, the apparatus 300 further includes a second execution module configured to, in response to determining that the simulation output data is within the first numerical range, execute the target task by using the first hardware resource to process the input data and obtain first output data.
In some embodiments, the apparatus 300 further includes an adjusting module configured to adjust the first hardware resource to a third hardware resource in response to determining that the difference between the first output data and the simulation output data is greater than or equal to a preset difference threshold. A third numerical range corresponding to the third hardware resource is greater than the first numerical range.
In some embodiments, the input data includes one of forward input data and reverse input data.
In the technical solutions of the present disclosure, the collection, storage, use, processing, transmission, provision, and disclosure of users' personal information comply with the provisions of relevant laws and regulations and do not violate public order and good customs.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
Fig. 4 illustrates a schematic block diagram of an example electronic device 400 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 4, the apparatus 400 includes a computing unit 401 that can perform various suitable actions and processes according to a computer program stored in a Read Only Memory (ROM) 402 or a computer program loaded from a storage unit 408 into a Random Access Memory (RAM) 403. In RAM 403, various programs and data required for the operation of device 400 may also be stored. The computing unit 401, ROM 402, and RAM 403 are connected to each other by a bus 404. An input/output (I/O) interface 405 is also connected to bus 404.
Various components in device 400 are connected to I/O interface 405, including: an input unit 406 such as a keyboard, a mouse, etc.; an output unit 407 such as various types of displays, speakers, and the like; a storage unit 408, such as a magnetic disk, optical disk, etc.; and a communication unit 409 such as a network card, modem, wireless communication transceiver, etc. The communication unit 409 allows the device 400 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The computing unit 401 may be a variety of general purpose and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 401 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 401 performs the respective methods and processes described above, for example, a task execution method. For example, in some embodiments, the task execution method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 408. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 400 via the ROM 402 and/or the communication unit 409. When the computer program is loaded into RAM 403 and executed by computing unit 401, one or more steps of the task execution method described above may be performed. Alternatively, in other embodiments, the computing unit 401 may be configured to perform the task execution method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described here may be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special purpose or general purpose, and that may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out the methods of the present disclosure may be written in any combination of one or more programming languages. This program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that, when executed by the processor or controller, it causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM or flash memory), an optical fiber, a portable Compact Disc Read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) display or an LCD (liquid crystal display)) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
It should be appreciated that steps may be reordered, added, or deleted using the various forms of flow shown above. For example, the steps recited in the present disclosure may be performed in parallel, sequentially, or in a different order, so long as the desired results of the technical solutions disclosed herein can be achieved; no limitation is imposed herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.
Claims (20)
1. A method of task execution, comprising:
determining simulation output data of a target sub-model according to input data of the target sub-model in a deep learning model, wherein the target sub-model comprises at least one target operator, and a target task corresponding to the target sub-model is executed by a first hardware resource;
responsive to determining that the simulated output data is not in a first numerical range corresponding to the first hardware resource, determining a second hardware resource according to the simulated output data, wherein a second numerical range corresponding to the second hardware resource is greater than the first numerical range; and
and executing the target task by using the second hardware resource to process the input data so as to obtain target output data of the target sub-model.
2. The method of claim 1, wherein the determining simulated output data of the target submodel comprises:
and processing the input data by using a simulation model corresponding to the target sub-model to obtain the simulation output data.
3. The method of claim 2, wherein the simulation model is determined from a plurality of historical input data and a plurality of historical output data of the target sub-model.
4. The method of claim 2, wherein the number of the at least one target operator is N, N is an integer greater than 1, the simulation model comprises N simulation sub-models, and each simulation sub-model corresponds to one of the target operators,
wherein the processing the input data by using the simulation model corresponding to the target sub-model to obtain the simulation output data comprises:
processing the input data by using the N simulation sub-models to obtain N simulation output sub-data corresponding to the N target operators, wherein the N simulation output sub-data serve as the simulation output data.
5. The method of claim 4, wherein the processing the input data with the N simulation sub-models to obtain N simulation output sub-data corresponding to the N target operators comprises:
processing the input data by using the 1 st simulation sub-model to obtain 1 st simulation output sub-data;
and processing the (n-1)-th simulation output sub-data by using the n-th simulation sub-model to obtain the n-th simulation output sub-data, wherein n is an integer greater than 1 and less than or equal to N.
6. The method of claim 5, wherein the determining, in response to determining that the simulated output data is not in the first range of values corresponding to the first hardware resource, a second hardware resource from the simulated output data comprises:
in response to determining that the n-th simulation output sub-data is not within the first numerical range, determining the second hardware resource according to the n-th simulation output sub-data.
7. The method of claim 1, further comprising:
in response to determining that the simulation output data is in the first numerical range, executing the target task by using the first hardware resource to process the input data to obtain first output data.
8. The method of claim 7, further comprising:
in response to determining that a difference between the first output data and the simulation output data is greater than or equal to a preset difference threshold, adjusting the first hardware resource to a third hardware resource, wherein a third numerical range corresponding to the third hardware resource is greater than the first numerical range.
9. The method of claim 1, wherein the input data comprises one of forward input data and reverse input data.
10. A task execution device comprising:
the first determining module is used for determining simulation output data of a target sub-model according to input data of the target sub-model in the deep learning model, wherein the target sub-model comprises at least one target operator, and a target task corresponding to the target sub-model is executed by a first hardware resource;
a second determining module, configured to determine, according to the simulation output data, a second hardware resource in response to determining that the simulation output data is not in a first numerical range corresponding to the first hardware resource, where the second numerical range corresponding to the second hardware resource is greater than the first numerical range; and
and the first execution module is used for executing the target task by utilizing the second hardware resource so as to process the input data and obtain target output data of the target sub-model.
11. The apparatus of claim 10, wherein the first determination module comprises:
and the processing sub-module is used for processing the input data by utilizing a simulation model corresponding to the target sub-model to obtain the simulation output data.
12. The apparatus of claim 11, wherein the simulation model is determined from a plurality of historical input data and a plurality of historical output data of the target sub-model.
13. The apparatus of claim 11, wherein the number of the at least one target operator is N, N is an integer greater than 1, the simulation model comprises N simulation sub-models, and each simulation sub-model corresponds to one of the target operators,
the processing sub-module comprises:
and the processing unit is used for processing the input data by utilizing the N simulation sub-models to obtain N simulation output sub-data corresponding to the N target operators, and the N simulation output sub-data are used as the simulation output data.
14. The apparatus of claim 13, wherein the processing unit comprises:
a first processing subunit configured to process the input data by using the 1st simulation sub-model to obtain the 1st simulation output sub-data; and
a second processing subunit configured to process the (n-1)-th simulation output sub-data by using the n-th simulation sub-model to obtain the n-th simulation output sub-data, wherein n is an integer greater than 1 and less than or equal to N.
15. The apparatus of claim 14, wherein the second determination module is further configured to:
in response to determining that the n-th simulation output sub-data is not within the first numerical range, determine the second hardware resource according to the n-th simulation output sub-data.
16. The apparatus of claim 10, further comprising:
a second execution module configured to, in response to determining that the simulation output data is in the first numerical range, execute the target task by using the first hardware resource to process the input data to obtain first output data.
17. The apparatus of claim 16, further comprising:
an adjusting module configured to adjust the first hardware resource to a third hardware resource in response to determining that a difference between the first output data and the simulation output data is greater than or equal to a preset difference threshold, wherein a third numerical range corresponding to the third hardware resource is greater than the first numerical range.
18. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1 to 9.
19. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1 to 9.
20. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1 to 9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311101089.6A CN117271113A (en) | 2023-08-29 | 2023-08-29 | Task execution method, device, electronic equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117271113A true CN117271113A (en) | 2023-12-22 |
Family
ID=89201755
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311101089.6A Pending CN117271113A (en) | 2023-08-29 | 2023-08-29 | Task execution method, device, electronic equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117271113A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112948079A (en) * | 2021-02-18 | 2021-06-11 | 北京百度网讯科技有限公司 | Task scheduling method, device, equipment and computer storage medium |
CN113987938A (en) * | 2021-10-27 | 2022-01-28 | 北京百度网讯科技有限公司 | Process parameter optimization method, device, equipment and storage medium |
CN114997401A (en) * | 2022-08-03 | 2022-09-02 | 腾讯科技(深圳)有限公司 | Adaptive inference acceleration method, apparatus, computer device and storage medium |
US20220374713A1 (en) * | 2021-10-28 | 2022-11-24 | Beijing Baidu Netcom Science Technology Co., Ltd. | Method and apparatus for performing distributed training on deep learning model, device and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||