CN111338776B - Scheduling method and related device


Info

Publication number
CN111338776B
CN111338776B (application CN202010118354.1A)
Authority
CN
China
Prior art keywords
computing device
target
instruction
computing
neural network
Prior art date
Legal status
Active
Application number
CN202010118354.1A
Other languages
Chinese (zh)
Other versions
CN111338776A (en
Inventor
Name withheld at the inventor's request
Current Assignee
Cambricon Technologies Corp Ltd
Original Assignee
Cambricon Technologies Corp Ltd
Priority date
Filing date
Publication date
Application filed by Cambricon Technologies Corp Ltd filed Critical Cambricon Technologies Corp Ltd
Priority to CN202010118354.1A priority Critical patent/CN111338776B/en
Publication of CN111338776A publication Critical patent/CN111338776A/en
Application granted granted Critical
Publication of CN111338776B publication Critical patent/CN111338776B/en

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00: Arrangements for program control, e.g. control units
    • G06F9/06: Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46: Multiprogramming arrangements
    • G06F9/48: Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806: Task transfer initiation or dispatching
    • G06F9/4843: Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881: Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G06F9/50: Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005: Allocation of resources to service a request
    • G06F9/5027: Allocation of resources to service a request, the resource being a machine, e.g. CPUs, servers, terminals
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G06N3/06: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063: Physical realisation using electronic means
    • G06N3/08: Learning methods
    • G06N3/084: Backpropagation, e.g. using gradient descent
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Neurology (AREA)
  • Debugging And Monitoring (AREA)
  • Multi Processors (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The embodiment of the application discloses a scheduling method and a related device. The method is applied to a server comprising a plurality of computing devices and comprises the following steps: receiving M operation requests; selecting at least one target computing device from the plurality of computing devices according to the attribute information of each of the M operation requests, and determining the operation instruction corresponding to each target computing device in the at least one target computing device; computing the operation data corresponding to the M operation requests according to the operation instruction corresponding to each target computing device to obtain M final operation results; and sending each of the M final operation results to the corresponding electronic equipment. With the embodiment of the application, the computing devices in the server can be selected to execute the operation requests, which improves the operation efficiency of the server.

Description

Scheduling method and related device
Technical Field
The application relates to the technical field of computers, in particular to a scheduling method and a related device.
Background
Neural networks currently underpin many artificial intelligence applications. As their range of application expands, servers or cloud computing services are used to store various neural network models and to perform operations on the operation requests submitted by users. Faced with numerous neural network models and large batches of requests, how to improve the operation efficiency of a server is a technical problem to be solved by those skilled in the art.
Disclosure of Invention
The embodiment of the application provides a scheduling method and a related device, which can select computing devices in a server to execute operation requests, thereby improving the operation efficiency of the server.
In a first aspect, an embodiment of the present application provides a scheduling method, applied to a server comprising a plurality of computing devices, the method including:
receiving M operation requests;
selecting at least one target computing device from the plurality of computing devices according to the attribute information of each of the M operation requests, and determining an operation instruction corresponding to each target computing device in the at least one target computing device, wherein the attribute information comprises an operation task and a target neural network model;
calculating the operation data corresponding to the M operation requests according to the operation instruction corresponding to each target computing device in the at least one target computing device to obtain M final operation results;
and sending each final operation result in the M final operation results to the corresponding electronic equipment.
In a second aspect, embodiments of the present application provide a server comprising a plurality of computing devices, wherein:
a receiving unit, configured to receive M operation requests;
a scheduling unit, configured to select at least one target computing device from the plurality of computing devices according to the attribute information of each of the M operation requests, and to determine the operation instruction corresponding to each target computing device in the at least one target computing device, wherein the attribute information comprises an operation task and a target neural network model;
a computing unit, configured to compute the operation data corresponding to the M operation requests according to the operation instruction corresponding to each target computing device in the at least one target computing device to obtain M final operation results;
and a sending unit, configured to send each final operation result in the M final operation results to the corresponding electronic equipment.
In a third aspect, embodiments of the present application provide another server comprising a processor, a memory, a communication interface, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the processor, the programs comprising instructions for performing some or all of the steps described in the first aspect.
In a fourth aspect, embodiments of the present application provide a computer readable storage medium storing a computer program comprising program instructions which, when executed by a processor, cause the processor to perform the method of the first aspect described above.
With the above scheduling method and related device, the target computing devices for executing the M operation requests are selected from the computing devices included in the server based on the attribute information of the received M operation requests, and the operation instruction corresponding to each target computing device is determined. Each target computing device completes its operation requests according to its corresponding operation instruction, and the final operation result corresponding to each operation request is sent to the corresponding electronic equipment. In other words, computing resources are allocated uniformly according to the operation requests, so that the computing devices in the server cooperate effectively, thereby improving the operation efficiency of the server.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings required in the description of the embodiments or the prior art are briefly introduced below. It is apparent that the drawings in the following description show some embodiments of the present application, and that other drawings may be obtained from these drawings by a person skilled in the art without inventive effort.
Wherein:
Fig. 1 is a schematic structural diagram of a server according to an embodiment of the present application;
Fig. 1a is a schematic structural diagram of a computing unit according to an embodiment of the present application;
Fig. 1b is a schematic structural diagram of a main processing circuit according to an embodiment of the present application;
Fig. 1c is a schematic diagram of data distribution in a computing unit according to an embodiment of the present application;
Fig. 1d is a schematic diagram of data return in a computing unit according to an embodiment of the present application;
Fig. 1e is an operation schematic diagram of a neural network structure according to an embodiment of the present application;
Fig. 2 is a schematic flow chart of a scheduling method according to an embodiment of the present application;
Fig. 3 is a flowchart of another scheduling method according to an embodiment of the present application;
Fig. 4 is a flowchart of another scheduling method according to an embodiment of the present application;
Fig. 5 is a flowchart of another scheduling method according to an embodiment of the present application;
Fig. 6 is a schematic structural diagram of another server according to an embodiment of the present application;
Fig. 7 is a schematic structural diagram of another server according to an embodiment of the present application.
Detailed Description
The following describes the technical solutions in the embodiments of the present application clearly and completely with reference to the accompanying drawings. It is evident that the described embodiments are some, but not all, embodiments of the application. All other embodiments obtained by those skilled in the art based on the embodiments of the application without inventive effort fall within the scope of the application.
It should be understood that the terms "comprises" and "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in the present specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
As used in this specification and the appended claims, the term "if" may be interpreted, depending on the context, as "when", "once", "in response to determining", or "in response to detecting". Similarly, the phrases "if it is determined" and "if [a described condition or event] is detected" may be interpreted, depending on the context, as "upon determining", "in response to determining", "upon detecting [the described condition or event]", or "in response to detecting [the described condition or event]".
The embodiment of the application provides a scheduling method and a related device, which can select a computing device in a server to execute an operation request, and improve the operation efficiency of the server. The application will be described in further detail below with reference to specific embodiments and with reference to the accompanying drawings.
Referring to fig. 1, fig. 1 is a schematic structural diagram of a server according to an embodiment of the application. As shown in fig. 1, the server includes a plurality of computing devices, which include, but are not limited to, server computers, and may also be personal computers (PCs), network PCs, minicomputers, mainframe computers, and the like.
In the present application, the computing devices included in the server establish connections and transmit data with each other by wire or wirelessly, and each computing device includes at least one computing carrier, such as a central processing unit (CPU), a graphics processing unit (GPU), or a processor board. The server in the present application may also be a cloud server that provides cloud computing services for electronic equipment.
Each computing carrier may include at least one computing unit for neural network operations, such as a processing chip. Referring to fig. 1a, fig. 1a is a schematic structural diagram of a computing unit. As shown in fig. 1a, the computing unit comprises a main processing circuit, basic processing circuits, and branch processing circuits. Specifically, the main processing circuit is connected with the branch processing circuits, and each branch processing circuit is connected with at least one basic processing circuit.
The branch processing circuits are used to receive and forward data between the main processing circuit and the basic processing circuits.
Referring to fig. 1b, fig. 1b is a schematic diagram of the main processing circuit. The main processing circuit may include a register and/or an on-chip buffer circuit, and may further include a control circuit, a vector arithmetic circuit, an arithmetic logic unit (ALU) circuit, an accumulator circuit, a direct memory access (DMA) circuit, and the like. In practical applications, the main processing circuit may further include a conversion circuit (e.g., a matrix transpose circuit), a data rearrangement circuit, an activation circuit, and the like.
The main processing circuit also includes a data transmitting circuit and a data receiving circuit or interface. A data distribution circuit and a data broadcast circuit may be integrated in the data transmitting circuit, or the two may be arranged separately in practical applications; the data transmitting circuit and the data receiving circuit may likewise be integrated together to form a data transceiver circuit. Broadcast data is data that needs to be sent to every basic processing circuit. Distribution data is data that needs to be selectively sent to some of the basic processing circuits; the specific selection may be determined by the main processing circuit according to its load and the calculation mode. In the broadcast transmission scheme, broadcast data is transmitted to each basic processing circuit in broadcast form, either in a single broadcast or in multiple broadcasts; the number of broadcasts is not limited in the embodiments of the present application.
When distributing data, the control circuit of the main processing circuit transmits data to some or all of the basic processing circuits; the data received by each receiving basic processing circuit may be different, and some basic processing circuits may receive the same data.
Specifically, when broadcasting data, the control circuit of the main processing circuit transmits data to some or all of the basic processing circuits, and every basic processing circuit receiving the data receives the same data. That is, the broadcast data includes the data that all basic processing circuits need to receive, while the distribution data includes the data that only some of the basic processing circuits need to receive. The main processing circuit may send the broadcast data to all the branch processing circuits in one or more broadcasts, and the branch processing circuits then forward the broadcast data to all the basic processing circuits.
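To make the two transmission modes concrete, the following is a minimal Python sketch of broadcast versus distribution; the class and method names are illustrative assumptions, not interfaces defined by the patent.

    from typing import List, Sequence

    class BasicCircuit:
        """Stands in for one basic processing circuit."""
        def __init__(self, cid: int):
            self.cid = cid
            self.inbox: list = []

        def receive(self, data) -> None:
            self.inbox.append(data)

    def broadcast(data, circuits: Sequence[BasicCircuit]) -> None:
        """Broadcast: every basic processing circuit receives the same data."""
        for c in circuits:
            c.receive(data)

    def distribute(chunks: List, circuits: Sequence[BasicCircuit]) -> None:
        """Distribution: each selected circuit receives its own (possibly
        different) chunk; only as many circuits as chunks are used."""
        for chunk, c in zip(chunks, circuits):
            c.receive(chunk)

The same pair of primitives is reused below at the level of whole computing devices when operation data is batched or split across them.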
Alternatively, the vector arithmetic circuit of the main processing circuit may perform vector operations, including but not limited to: addition, subtraction, multiplication, and division of two vectors; addition, subtraction, multiplication, and division of a vector and a constant; or any operation performed on each element of a vector. A continuous operation may be, for example, addition, subtraction, multiplication, division, activation, or accumulation of a vector and a constant.
Each basic processing circuit may include a basic register and/or a basic on-chip cache circuit, and may further include an inner product arithmetic circuit, a vector arithmetic circuit, an accumulator circuit, or the like. The inner product arithmetic circuit, the vector arithmetic circuit, and the accumulator circuit may be integrated together or provided separately.
The connection structure between the branch processing circuits and the basic circuits may be arbitrary and is not limited to the H-type structure of fig. 1b. Alternatively, the path from the main processing circuit to the basic circuits uses a broadcast or distribution structure, while the path from the basic circuits back to the main processing circuit uses a gather structure. Broadcast, distribution, and gather are defined as follows:
the data transfer manner from the main processing circuit to the basic circuit may include:
the main processing circuit is respectively connected with a plurality of branch processing circuits, and each branch processing circuit is respectively connected with a plurality of basic circuits.
The main processing circuit is connected with a branch processing circuit, the branch processing circuit is connected with a branch processing circuit again, and the like, a plurality of branch processing circuits are connected in series, and then each branch processing circuit is connected with a plurality of basic circuits respectively.
The main processing circuit is respectively connected with a plurality of branch processing circuits, and each branch processing circuit is connected with a plurality of basic circuits in series.
The main processing circuit is connected to a branch processing circuit which is in turn connected to a branch processing circuit, and so on, a plurality of branch processing circuits are connected in series, and then each branch processing circuit is in series with a plurality of base circuits.
When distributing data, the main processing circuit transmits data to some or all of the basic circuits, and the data received by each receiving basic circuit may be different;
when broadcasting data, the main processing circuit transmits data to some or all of the basic circuits, and each basic circuit receiving the data receives the same data.
When data is gathered, some or all of the basic circuits transmit data back to the main processing circuit. It should be noted that the computing unit shown in fig. 1a may be a single physical chip; in practical applications, it may also be integrated into another chip (e.g., a CPU or GPU). The embodiments of the present application do not limit the physical form of the chip device described above.
Referring to fig. 1c, fig. 1c is a schematic diagram of data distribution in a computing unit, with the arrows in fig. 1c indicating the distribution direction of the data. As shown in fig. 1c, after the main processing circuit receives external data, it splits the data and distributes the parts to the plurality of branch processing circuits, and the branch processing circuits send the split data to the basic processing circuits.
Referring to fig. 1d, fig. 1d is a schematic diagram of data return in a computing unit, with the arrows in fig. 1d indicating the return direction of the data. As shown in fig. 1d, the basic processing circuits return data (e.g., inner product results) to the branch processing circuits, and the branch processing circuits return the data to the main processing circuit.
The input data may be a vector, a matrix, or multi-dimensional (three-dimensional, four-dimensional, or higher) data; an individual value in the input data may be referred to as an element of the input data.
The embodiment of the disclosure further provides a computing method for the computing unit shown in fig. 1a. The computing method is applied to neural network operations; in particular, the computing unit may be used to perform operations on the input data and weight data of one or more layers in a multi-layer neural network.
Specifically, the computing unit is configured to perform operations on the input data and weight data of one or more layers in a trained multi-layer neural network, or on the input data and weight data of one or more layers in a multi-layer neural network used for forward operation.
Such operations include, but are not limited to, one or any combination of: convolution operations, matrix-multiply-matrix operations, matrix-multiply-vector operations, bias operations, fully connected operations, GEMM operations, GEMV operations, and activation operations.
GEMM refers to the matrix-matrix multiplication operation in the BLAS library. Its general form is C = alpha * op(S) * op(P) + beta * C, where S and P are the two input matrices, C is the output matrix, alpha and beta are scalars, and op denotes some operation on the matrix S or P (e.g., a transpose); auxiliary integer parameters describe the width and height of the matrices S and P.
GEMV refers to the matrix-vector multiplication operation in the BLAS library. Its general form is C = alpha * op(S) * P + beta * C, where S is an input matrix, P is an input vector, C is the output vector, alpha and beta are scalars, and op denotes some operation on the matrix S.
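The two BLAS forms above can be checked numerically. The following NumPy sketch renders them with op taken to be an optional transpose, which is one common choice; since op is not fixed above, that choice is an assumption.

    import numpy as np

    def gemm(alpha, S, P, beta, C, trans_s=False, trans_p=False):
        """C = alpha * op(S) @ op(P) + beta * C, with op = transpose on request."""
        op_s = S.T if trans_s else S
        op_p = P.T if trans_p else P
        return alpha * (op_s @ op_p) + beta * C

    def gemv(alpha, S, p, beta, c, trans_s=False):
        """c = alpha * op(S) @ p + beta * c."""
        op_s = S.T if trans_s else S
        return alpha * (op_s @ p) + beta * c

    # e.g. with alpha = 1 and beta = 0, gemm reduces to a plain matrix product:
    # gemm(1.0, S, P, 0.0, np.zeros((S.shape[0], P.shape[1])))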
The present application does not limit the connection relationship between the computing carriers in a computing device, which may be homogeneous or heterogeneous computing carriers, nor the connection relationship between the computing units in a computing carrier. Executing parallel tasks on heterogeneous computing carriers or computing units can improve computing efficiency.
The computing device shown in fig. 1 includes at least one computing carrier, and each computing carrier further includes at least one computing unit. That is, the target computing device selected in the present application depends on the connection relationships between the computing devices, on the specific physical hardware conditions such as the neural network models and network resources deployed on each computing device, and on the attribute information of the operation requests. Computing carriers of the same type may therefore be deployed in the same computing device; for example, computing carriers for forward propagation may be deployed in one computing device rather than across different computing devices, which effectively reduces the communication overhead between computing devices and helps improve operation efficiency. A specific neural network model may also be deployed on a specific computing carrier, so that when the server receives an operation request for that neural network, the corresponding computing carrier is invoked to execute the request, which saves the time for determining the processing task and improves operation efficiency.
In the present application, publicly disclosed and widely used neural network models are taken as specified neural network models (e.g., LeNet, AlexNet, ZFNet, GoogLeNet, VGG, and ResNet among convolutional neural networks (CNNs)).
Optionally, the operation requirement of each specified neural network model in the specified neural network model set and the hardware attributes of each computing device in the plurality of computing devices are acquired, yielding a plurality of operation requirements and a plurality of hardware attributes; according to these operation requirements and hardware attributes, each specified neural network model in the set is deployed on its corresponding specified computing device.
The specified neural network model set comprises a plurality of specified neural network models. The hardware attributes of a computing device include its network bandwidth, storage capacity, and processor main frequency, as well as the hardware attributes of the computing carriers or computing units within it. That is, selecting, according to the hardware attributes of each computing device, the computing device that matches the operation requirements of a specified neural network model can avoid server faults caused by untimely processing and improve the operation support capability of the server.
The input neurons and output neurons mentioned in the present application do not refer to the neurons in the input layer and output layer of the whole neural network. Rather, for any two adjacent layers in the network, the neurons in the lower layer of the network's feedforward operation are the input neurons, and the neurons in the upper layer are the output neurons. Taking a convolutional neural network with L layers as an example, for K = 1, 2, ..., L-1, the K-th layer is called the input layer and its neurons are the input neurons, while the (K+1)-th layer is called the output layer and its neurons are the output neurons. That is, every layer except the topmost layer can serve as an input layer, and the next layer is the corresponding output layer.
The operations mentioned above are all operations on one layer of the neural network. For a multi-layer neural network, the implementation process is shown in fig. 1e, where the dashed arrows indicate reverse operations and the solid arrows indicate forward operations. In the forward operation, after the execution of the previous layer of the artificial neural network is completed, the output neurons obtained by that layer are used as the input neurons of the next layer (or certain operations are performed on those output neurons first, and the results are then used as the input neurons of the next layer), and the weights are likewise replaced by those of the next layer. In the reverse operation, after the reverse operation of a layer is completed, the input-neuron gradients obtained by that layer are used as the output-neuron gradients of the layer below it (or certain operations are performed on those gradients first), and the weights are replaced by those of that lower layer.
The forward operation of a neural network is the process of computing from the input data to the final output data. The reverse operation runs opposite to the propagation direction of the forward operation: it propagates, in reverse, the loss function corresponding to the difference between the final output data and the expected output data. Through repeated cycles of forward and reverse operations, the weights of each layer are corrected by gradient descent on the loss; this is also the process by which the neural network learns and trains, reducing the loss of the network's output.
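The layer-by-layer flow just described can be summarized in a short sketch. The minimal fully connected layer below is an illustrative assumption used only to make the data flow concrete; any layer type with the same forward/backward interface would fit.

    import numpy as np

    class Dense:
        """Minimal fully connected layer; enough to show the data flow."""
        def __init__(self, n_in, n_out):
            self.W = np.random.randn(n_in, n_out) * 0.01
            self.x = None
            self.dW = None

        def forward(self, x):
            self.x = x                       # cache input neurons for the reverse pass
            return x @ self.W                # this layer's output neurons

        def backward(self, grad_out):
            self.dW = self.x.T @ grad_out    # gradient w.r.t. this layer's weights
            return grad_out @ self.W.T       # gradient w.r.t. this layer's input

        def update(self, lr):
            self.W -= lr * self.dW           # weight correction (gradient descent)

    def train_step(layers, x, loss_grad, lr=0.01):
        """loss_grad maps the final output to the loss gradient w.r.t. it,
        e.g. lambda y: y - expected for a squared-error loss."""
        for layer in layers:                 # forward: outputs feed the next layer
            x = layer.forward(x)
        g = loss_grad(x)
        for layer in reversed(layers):       # reverse: input gradients become the
            g = layer.backward(g)            # output gradients of the layer below
            layer.update(lr)
        return x

    # usage sketch: layers = [Dense(4, 8), Dense(8, 2)]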
Referring to fig. 2, fig. 2 is a schematic flow chart of a scheduling method according to an embodiment of the present application. As shown in fig. 2, the method is applied to the server shown in fig. 1 and involves the electronic equipment allowed to access the server. The electronic equipment may include various handheld devices, vehicle-mounted devices, wearable devices, computing devices or other processing devices connected to a wireless modem, as well as various types of user equipment (UE), mobile stations (MS), terminal devices, and the like.
201: m operation requests are received.
In the present application, M is a positive integer. The server receives M operation requests sent by the electronic equipment allowed to access it; neither the number of electronic devices nor the number of operation requests sent by each is limited, i.e., the M operation requests may be sent by one electronic device or by several.
An operation request includes attribute information such as the operation task (a training task or a test task) and the target neural network model involved in the operation. A training task trains the target neural network model, i.e., performs forward and reverse operations on the neural network model until training is completed; a test task performs one forward operation with the target neural network model.
The target neural network model may be a neural network model uploaded by the user when sending the operation request through the electronic equipment, or a neural network model already stored in the server. The present application does not limit the number of target neural network models, i.e., each operation request may correspond to at least one target neural network model.
202: Selecting at least one target computing device from the plurality of computing devices according to the attribute information of each of the M operation requests, and determining the operation instruction corresponding to each target computing device in the at least one target computing device.
The present application does not limit how the target computing devices are selected; they may be selected according to the number of operation requests and the number of target neural network models. For example: if there is one operation request corresponding to one target neural network model, the operation instructions corresponding to the operation request can be classified into parallel instructions and serial instructions; the parallel instructions are distributed to different target computing devices for operation, and each serial instruction is assigned to a target computing device that is good at processing it, raising the operation efficiency of each instruction and thereby the overall efficiency. If there are multiple operation requests corresponding to one target neural network model, a target computing device containing that model can batch-process the operation data of the multiple requests, avoiding the time wasted by repeated operations and the extra overhead of communication between different computing devices, thereby improving operation efficiency. If there are multiple operation requests corresponding to multiple target neural network models, a computing device that is good at processing each target neural network model, or on which the model has previously been deployed, can be found for each operation request, saving the time of network initialization and improving operation efficiency.
Optionally, if the operation task of a target operation request is a test task, a computing device that includes the forward operation of the target neural network model corresponding to the target operation request is selected from the plurality of computing devices as the first target computing device; if the operation task of the target operation request is a training task, a computing device that includes both the forward operation and the reverse training of the target neural network model corresponding to the target operation request is selected from the plurality of computing devices as the first target computing device; and the operation instruction corresponding to the first target computing device is determined according to the target operation request.
The target operation request is any one of the M operation requests, and the first target computing device is the target computing device corresponding to the target operation request in the at least one target computing device.
That is, if the operation task of the target operation request is a test task, the first target computing device is a computing device that can perform the forward operation of the target neural network model; if the operation task is a training task, the first target computing device is a computing device that can perform both the forward operation and the reverse training of the target neural network model. Processing the operation request with a dedicated computing device improves both the accuracy and the efficiency of the operation.
For example, the server includes a first computing device and a second computing device, where the first computing device only supports the forward operation of a specified neural network model, while the second computing device can perform both the forward operation and the reverse training of that model. When the target neural network model in a received target operation request is the specified neural network model and the operation task is a test task, the first computing device is determined to execute the target operation request.
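A hedged sketch of this selection rule follows. The request fields and device capability methods used here (model, task, deployed_models, supports_forward, supports_backward) are assumed names for illustration, not interfaces defined by the patent.

    def select_device(request, devices):
        """Pick a computing device that has the request's target model deployed
        and supports the request's task type: a test task needs the forward
        operation only; a training task needs forward plus reverse training."""
        for dev in devices:
            if request.model not in dev.deployed_models:   # assumed attribute
                continue
            if request.task == "test" and dev.supports_forward(request.model):
                return dev
            if (request.task == "train"
                    and dev.supports_forward(request.model)
                    and dev.supports_backward(request.model)):
                return dev
        return None  # caller may instead deploy the model on an idle device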
203: Calculating the operation data corresponding to the M operation requests according to the operation instruction corresponding to each target computing device in the at least one target computing device to obtain M final operation results.
The present application does not limit the operation data corresponding to each operation request; the operation data may be image data for image recognition, sound data for speech recognition, and so on. When the operation task is a test task, the operation data is the data uploaded by the user; when the operation task is a training task, the operation data may be a training set uploaded by the user or a training set stored in the server that corresponds to the target neural network model.
Multiple intermediate operation results may be generated while an operation instruction is computed, and the final operation result corresponding to each operation request is obtained from these intermediate operation results.
204: Sending each final operation result in the M final operation results to the corresponding electronic equipment.
It can be understood that, based on the attribute information of the received operation requests, the target computing devices executing the M operation requests are selected from the plurality of computing devices included in the server, and the operation instruction corresponding to each target computing device is determined. Each target computing device completes the operation requests according to its corresponding operation instruction, and the final operation result corresponding to each operation request is sent to the corresponding electronic equipment. In other words, computing resources are allocated uniformly according to the operation requests, so that the multiple computing devices in the server cooperate effectively, thereby improving the operation efficiency of the server.
Optionally, the method further comprises: after waiting a first preset duration, detecting whether each of the at least one target computing device has returned the final operation result of its corresponding operation instruction; if not, taking each target computing device that has not returned a final operation result as a delay computing device; selecting a standby computing device from the idle computing devices among the plurality of computing devices according to the operation instruction corresponding to the delay computing device; and executing, by the standby computing device, the operation instruction corresponding to the delay computing device.
That is, when the first preset duration expires, a computing device that has not completed its operation instruction is treated as a delay computing device, and a standby computing device is selected from the idle computing devices according to the operation instruction being executed by the delay computing device, which improves operation efficiency.
Optionally, after the standby computing device executes the operation instruction corresponding to the delay computing device, the method further includes: obtaining the final operation result returned first between the delay computing device and the standby computing device; and sending a suspend instruction to whichever of the delay computing device and the standby computing device has not returned a final operation result.
The suspend instruction instructs the computing device that has not returned the final operation result to suspend executing the corresponding operation instruction. That is, the standby computing device executes the operation instruction corresponding to the delay computing device; the final operation result returned first between the two is selected as the final operation result of the operation instruction; and a suspend instruction is sent to the one that has not returned a result. Suspending the computing device that has not completed the operation instruction saves power consumption.
Optionally, the method further comprises: after waiting a second preset duration, detecting whether the delay computing device has returned the final operation result of its corresponding operation instruction; if not, taking the delay computing device that has not returned the final operation result as a fault computing device and sending a fault instruction.
The fault instruction notifies operation and maintenance personnel that the fault computing device has failed; the second preset duration is longer than the first preset duration. That is, when the second preset duration expires, if the final operation result of the delay computing device has still not been received, the delay computing device is judged to be faulty and the corresponding operation and maintenance personnel are notified, which improves the fault handling capability.
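Taken together, the two-timeout policy of the preceding paragraphs might look as follows. The device methods (submit, has_result, result, suspend, can_run) and the notify callback are assumed interfaces, and the polling loop is only one possible realization.

    import time

    def run_with_fallback(instr, primary, idle_pool, t1, t2, notify):
        """Race a standby device against a delayed primary device.
        t1: first preset duration; t2: second preset duration (t2 > t1)."""
        primary.submit(instr)
        time.sleep(t1)                               # wait the first preset duration
        if primary.has_result(instr):
            return primary.result(instr)
        # primary is now a "delay computing device"; assume the idle pool can
        # supply a standby able to execute this instruction
        standby = next(d for d in idle_pool if d.can_run(instr))
        standby.submit(instr)
        deadline = time.time() + (t2 - t1)           # up to the second duration
        while time.time() < deadline:
            for dev, other in ((primary, standby), (standby, primary)):
                if dev.has_result(instr):
                    other.suspend(instr)             # pause the loser, saving power
                    return dev.result(instr)
            time.sleep(0.01)
        notify(primary)                              # delay device judged faulty
        raise RuntimeError("computing device fault")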
Optionally, the method further comprises: updating the hash tables of the plurality of computing devices at every target time threshold.
A hash table is a data structure that is accessed directly according to a key value. In the present application, the IP addresses of the plurality of computing devices are used as key values and mapped through a hash function (mapping function) to positions in the hash table, so that once a target computing device is determined, the physical resources allocated to it can be found quickly. The specific form of the hash table is not limited: a static hash table may be set manually, or hardware resources may be allocated according to IP addresses. Updating the hash tables of the plurality of computing devices at every target time threshold improves lookup accuracy and lookup efficiency.
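A minimal sketch of such an IP-keyed table follows; note that a Python dict is itself a hash table, so the hash-function mapping is implicit. The device interface (ip, resources) is an illustrative assumption, and the refresh loop models the target-time-threshold update.

    import time

    class DeviceTable:
        def __init__(self, refresh_s):
            self.table = {}              # key: device IP, value: resource record
            self.refresh_s = refresh_s   # the "target time threshold", in seconds

        def update(self, devices):
            self.table = {d.ip: d.resources() for d in devices}

        def lookup(self, ip):
            return self.table.get(ip)    # O(1) jump to the device's resources

        def refresh_forever(self, devices):
            while True:                  # periodic update keeps lookups accurate
                self.update(devices)
                time.sleep(self.refresh_s)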
Referring to fig. 3, fig. 3 is a flowchart of another scheduling method according to an embodiment of the present application. As shown in fig. 3, the method is applied to the server shown in fig. 1 and involves the electronic equipment allowed to access the server.
301: an operation request is received.
That is, the number of operation requests received by the server in step 201 above is 1, i.e., M = 1.
302: Obtaining the instruction stream of the target neural network model corresponding to the operation request.
The instruction stream indicates the operation sequence of the target neural network model and the instruction corresponding to each step of that sequence, i.e., it is an instruction sequence through which the operation of the target neural network model can be realized. Each target neural network model corresponds to a basic operation sequence, i.e., a data structure describing the model's operations obtained by parsing the target neural network model. The present application does not limit the parsing rules between the basic operation sequence and instruction descriptors; the instruction descriptor stream corresponding to the target neural network model is obtained according to these parsing rules.
The present application likewise does not limit the predetermined format of each instruction descriptor in the instruction descriptor stream; the instructions corresponding to the instruction descriptor stream are generated according to the network structure in the preset format. The above instructions include all instructions in the Cambricon instruction set, such as matrix operation instructions, convolution operation instructions, fully-connected-layer forward operation instructions, pooling operation instructions, normalization instructions, vector operation instructions, and scalar operation instructions.
Optionally, obtaining the instruction stream of the target neural network model corresponding to the operation request includes: acquiring a first instruction descriptor stream according to the basic operation sequence corresponding to the target neural network model; simplifying the first instruction descriptor stream to obtain a second instruction descriptor stream; and acquiring the instruction stream according to the second instruction descriptor stream.
That is, by simplifying the first instruction descriptor stream, unnecessary instruction descriptors are eliminated and the instruction stream is shortened. The instruction stream executable by the computing device is then obtained from the second instruction descriptor stream, and the output data is obtained by operating on the instructions and input data. This avoids the redundant input, output, or other operations that arise when the operation is performed with a complete neural network built from fine-grained atomic operations such as convolution, pooling, and activation, and thereby further improves the operation speed of the server.
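The parse, simplify, emit pipeline of this step can be sketched as below. The descriptor format and the redundancy rule shown are illustrative assumptions, since the application does not fix the parsing rules; the to_descriptor and emit callables stand in for those unspecified rules.

    def build_instruction_stream(operation_sequence, to_descriptor, emit):
        """operation_sequence: the model's basic operation sequence;
        to_descriptor / emit: the parsing and code-generation rules."""
        # 1) first instruction descriptor stream, from the basic operation sequence
        descriptors = [to_descriptor(op) for op in operation_sequence]
        # 2) second (simplified) stream: drop descriptor pairs made redundant
        #    by adjacency
        simplified = []
        for d in descriptors:
            if simplified and is_redundant_pair(simplified[-1], d):
                simplified.pop()
            else:
                simplified.append(d)
        # 3) executable instruction stream (e.g. Cambricon-style instructions)
        return [emit(d) for d in simplified]

    def is_redundant_pair(prev, cur):
        # illustrative rule: a store immediately followed by a load of the
        # same buffer cancels out (redundant input/output)
        return (prev.get("op") == "store" and cur.get("op") == "load"
                and prev.get("buffer") == cur.get("buffer"))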
If the operation request corresponds to multiple target neural network models, the instruction streams of all these target neural network models need to be obtained and then split in order to complete the operation request.
303: the instruction stream is split into a plurality of parallel instructions and a plurality of serial instructions.
The present application does not limit how the instruction stream is split. A parallel instruction is an instruction that can be allocated to multiple computing devices to operate simultaneously, while a serial instruction is an instruction that can only be completed by a single computing device. For example, operation requests such as video recognition and understanding generally comprise a feature extraction instruction and a feature recognition instruction: the feature extraction instruction performs convolution on successive frames of images, while the feature recognition instruction typically runs a recurrent neural network over the features produced by the feature extraction instruction. The feature extraction instruction can thus be distributed to multiple computing devices, while the feature recognition instruction can only be processed by a single computing device.
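A sketch of the split step follows; the per-instruction is_parallelizable flag is an assumption standing in for whatever classification rule is used, which the application leaves open.

    def split_stream(instructions):
        """Separate an instruction stream into parallel and serial instructions."""
        parallel, serial = [], []
        for ins in instructions:
            # e.g. per-frame convolutional feature extraction can fan out across
            # devices, while a recurrent feature-recognition step cannot
            (parallel if ins.is_parallelizable else serial).append(ins)
        return parallel, serial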
304: a plurality of parallel computing devices corresponding to the plurality of parallel instructions and at least one serial computing device corresponding to the plurality of serial instructions are selected from a plurality of computing devices.
The instruction stream is split into a plurality of parallel instructions and a plurality of serial instructions. From the plurality of computing devices included in the server, a parallel computing device is selected for each parallel instruction and serial computing devices are selected for the serial instructions, yielding a plurality of parallel computing devices and at least one serial computing device. That is, in step 202 above, the at least one target computing device consists of the plurality of parallel computing devices and the at least one serial computing device; the operation instruction of each parallel computing device is its corresponding parallel instruction, and the operation instruction of each serial computing device is its corresponding serial instruction.
The present application does not limit the selection method for the serial computing devices and the parallel computing devices.
Optionally, if the operation task is a training task, grouping the operation data corresponding to the operation request based on a training method corresponding to the training task to obtain multiple groups of operation data; and selecting the plurality of parallel computing devices from the plurality of computing devices according to the plurality of groups of operation data and the plurality of parallel instructions.
The operation data corresponding to the operation request is grouped according to the specific training method to obtain multiple groups of operation data. The grouping may be by the data type of the operation data, or the operation data may simply be divided into several groups; this is not limited here. After grouping, suitable computing devices are selected for parallel operation, further reducing the operation load of each computing device and thereby improving operation efficiency.
For example, for the batch gradient descent algorithm (BGD), which operates on one whole training set (batch), the set can be divided into subsets assigned to multiple computing devices, with each computing device training one subset; each subset is one group of operation data. For the stochastic gradient descent algorithm (SGD), which considers only one piece of operation data per batch, different batches may be assigned to different computing devices. For the mini-batch gradient descent algorithm (mini-batch SGD), the different data of each batch may be assigned to different computing devices for computation, or each batch may be divided into smaller subsets that are then assigned to different computing devices.
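The three grouping rules can be sketched as follows; the even-chunking helper and the returned layouts are illustrative assumptions.

    def partition(batch, n_devices):
        """Split one training set into n_devices roughly equal subsets."""
        k, m = divmod(len(batch), n_devices)
        out, i = [], 0
        for d in range(n_devices):
            j = i + k + (1 if d < m else 0)
            out.append(batch[i:j])
            i = j
        return out

    def group_for_training(method, batches, n_devices):
        if method == "BGD":          # one whole training set: split into subsets
            return partition(batches[0], n_devices)
        if method == "SGD":          # one sample per batch: spread batches around
            return [batches[i::n_devices] for i in range(n_devices)]
        if method == "mini-batch":   # split each mini-batch across the devices
            return [partition(b, n_devices) for b in batches]
        raise ValueError(method)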
In the present application, a serial computing device may be one or more of the plurality of parallel computing devices, or another idle computing device, and so on.
For the case where the serial instructions correspond to different parts whose computation characteristics differ greatly, optionally, selecting the at least one serial computing device corresponding to the plurality of serial instructions from the plurality of computing devices includes: grouping the plurality of serial instructions to obtain at least one group of serial instruction sequences; and selecting, from the plurality of computing devices, a computing device corresponding to each group in the at least one group of serial instruction sequences to obtain the at least one serial computing device.
That is, the serial instructions are grouped into several instruction sequences, and a computing device is selected for each group; having the computing device that is good at the work execute the corresponding instructions improves the operation efficiency of each part and thus the overall operation efficiency.
Taking the faster region-based convolutional neural network (Faster R-CNN) as an example: Faster R-CNN consists of convolutional layers, a region proposal network (RPN) layer, and a region of interest pooling (ROI pooling) layer, and the computation characteristics of these layers differ greatly. The convolutional layers and the RPN can be deployed on a neural network computing device that is good at processing convolution, while ROI pooling can be deployed on a more general-purpose processor such as a CPU, improving the computation efficiency of each part and thereby the overall computation efficiency.
305: Calculating the operation data corresponding to the operation request according to the parallel instruction corresponding to each of the plurality of parallel computing devices and the serial instruction corresponding to each of the at least one serial computing device, to obtain a final operation result.
306: Sending the final operation result to the electronic equipment that sent the operation request.
That is, when the server receives a single operation request sent by electronic equipment, the instruction stream of the target neural network model corresponding to the request is split; parallel computing devices are selected for the parallel instructions, and serial computing devices good at processing the serial instructions are selected to execute them independently; the final operation result corresponding to the operation request is then sent to the electronic equipment. Executing the parallel instructions in parallel across the set of parallel computing devices saves their execution time, while the serial computing devices improve the operation efficiency of each serial instruction. Computing resources are thus allocated uniformly according to the operation request, so that the multiple computing devices in the server cooperate effectively, improving the overall operation efficiency of the server.
Referring to fig. 4, fig. 4 is a flowchart of another scheduling method according to an embodiment of the present application. As shown in fig. 4, the method is applied to the server shown in fig. 1 and involves the electronic equipment allowed to access the server.
401: a plurality of operation requests are received.
That is, the number of operation requests received by the server in step 201 above is greater than 1, i.e., M > 1.
402: If the plurality of operation requests correspond to one target neural network model, selecting a target parallel computing device corresponding to the target neural network model from the plurality of computing devices, and determining the operation instruction of the target parallel computing device to be the parallel operation of the plurality of operation requests.
That is, if there are multiple operation requests all directed at the same target neural network model, the target parallel computing device corresponding to that model can be selected from the plurality of computing devices, making it convenient for that device to operate in parallel on the operation data of the multiple requests and avoiding the time wasted by repeatedly running the target neural network model.
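A sketch of this batch-then-scatter idea follows; stacking the inputs with NumPy is an assumption about the data layout, and device.run is an assumed interface.

    import numpy as np

    def run_batched(requests, device, model):
        """Batch the operation data of several requests against one model,
        run a single parallel pass, then scatter the per-request results."""
        inputs = np.stack([r.data for r in requests])   # merge into one batch
        outputs = device.run(model, inputs)             # one pass, not N passes
        # separate the batched result into the final result of each request
        return {r.request_id: outputs[i] for i, r in enumerate(requests)}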
403: Calculating the operation data corresponding to the plurality of operation requests according to the operation instruction of the target parallel computing device to obtain a plurality of final operation results.
404: Sending each of the plurality of final operation results to the corresponding electronic equipment.
It can be understood that if the server holds multiple operation requests that are all directed at the same target neural network model, one target parallel computing device corresponding to that model may be selected from the multiple computing devices, the operation data of all the requests may be computed as a single batch on that device, and the batch result may then be differentiated to recover the final operation result of each request and send it to the corresponding electronic device. Repeatedly invoking the target neural network model is thereby avoided, improving the overall operation efficiency of the server.
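The batching described here can be pictured with the short sketch below; the model call and request format are invented stand-ins, and the point is only the gather-compute-scatter pattern:

```python
# Gather the data of M requests for the same model into one batch,
# run the model once, then scatter the results back per request.
def run_model_batched(model: str, batch: list) -> list:
    return [f"{model}({x})" for x in batch]  # stand-in for one forward pass

requests = [("client_a", "x1"), ("client_b", "x2"), ("client_c", "x3")]
batch = [data for _, data in requests]
batch_results = run_model_batched("target_model", batch)

# Differentiate the batch output into per-request final results.
for (client, _), result in zip(requests, batch_results):
    print(f"send {result} to {client}")
```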
Referring to fig. 5, fig. 5 is a flowchart of another scheduling method according to an embodiment of the present application. As shown in fig. 5, the method is applied to the server shown in fig. 1 and involves an electronic device that is allowed to access the server.
501: a plurality of operation requests are received.
That is, the number of operation requests received by the server in step 201 above is greater than 1, i.e., M > 1.
502: if the plurality of operation requests correspond to a plurality of target neural network models, selecting from the plurality of computing devices a target serial computing device corresponding to each of the target neural network models, and determining the operation instruction of each of these target serial computing devices as executing the operation request that corresponds to that device's target neural network model.
That is, if there are multiple operation requests directed at multiple target neural network models, a target serial computing device corresponding to each target neural network model can be selected from the multiple computing devices, which improves the operation efficiency of each request. Moreover, each target serial computing device already has its target neural network model deployed, which saves network initialization time and further improves operation efficiency.
503: and calculating the operation data corresponding to the operation requests according to the operation instruction corresponding to each target serial computing device in the target serial computing devices to obtain a plurality of final operation results.
504: and sending each final operation result in the plurality of final operation results to corresponding electronic equipment.
It can be understood that if the server holds multiple operation requests directed at multiple target neural network models, the target serial computing device corresponding to each target neural network model can be selected from the multiple computing devices and made to execute the corresponding operation requests, improving the operation efficiency of each request. Moreover, each target serial computing device already has its target neural network model deployed, which saves network initialization time and further improves operation efficiency.
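One plausible reading of this per-model routing, with invented model and device names, is a simple lookup table built when the models were deployed:

```python
# Hypothetical deployment table: each target serial computing device
# already hosts one target neural network model, so no network
# initialization is needed at request time.
MODEL_TO_DEVICE = {"model_a": "device_0", "model_b": "device_1"}

requests = [("req_1", "model_b"), ("req_2", "model_a"), ("req_3", "model_b")]

for req_id, model in requests:
    device = MODEL_TO_DEVICE[model]  # the model is already resident here
    print(f"{req_id}: run {model} on {device}")
```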
Optionally, selecting an auxiliary scheduling algorithm from the auxiliary scheduling algorithm set according to the attribute information of each operation request in the M operation requests; and selecting the at least one target computing device from the plurality of computing devices according to the auxiliary scheduling algorithm, and determining an operation instruction corresponding to each of the at least one target computing device.
The auxiliary scheduling algorithm set includes, but is not limited to, at least one of the following: a round-robin scheduling algorithm, a weighted round-robin (Weighted Round Robin) algorithm, a least connections (Least Connections) algorithm, a weighted least connections (Weighted Least Connections) algorithm, a locality-based least connections (Locality-Based Least Connections) algorithm, a locality-based least connections with replication (Locality-Based Least Connections with Replication) algorithm, a destination address hashing (Destination Hashing) algorithm, and a source address hashing (Source Hashing) algorithm.
The application does not limit how the auxiliary scheduling algorithm is selected from the attribute information. For example, if multiple target computing devices can process the same operation request, the auxiliary scheduling algorithm may be the round-robin scheduling algorithm. If the processing capacities of different target computing devices differ, more operation requests should be allocated to high-configuration, low-load target computing devices, and the auxiliary scheduling algorithm may be the weighted round-robin algorithm. If the workload assigned to each target computing device differs, the auxiliary scheduling algorithm may be the least connections scheduling algorithm, which dynamically selects the target computing device with the fewest backlogged connections to process the current request, raising the utilization of the target computing devices as far as possible; alternatively, the weighted least connections scheduling algorithm may be used.
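The three policies named above can each be sketched in a few lines; the weights and connection counts below are illustrative values only, not part of the patent:

```python
import itertools

devices = ["dev_a", "dev_b", "dev_c"]

# Round-robin: hand requests to the devices in turn.
round_robin = itertools.cycle(devices)

# Weighted round-robin: higher-capacity devices take proportionally
# more requests (dev_a is assumed three times as capable here).
weights = {"dev_a": 3, "dev_b": 1, "dev_c": 1}
weighted_rr = itertools.cycle([d for d in devices for _ in range(weights[d])])

# Least connections: pick the device with the fewest backlogged requests.
backlog = {"dev_a": 4, "dev_b": 1, "dev_c": 2}
least_conn = min(backlog, key=backlog.get)

print(next(round_robin), next(weighted_rr), least_conn)  # dev_a dev_a dev_b
```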
That is, on the basis of the scheduling method described in the embodiments of fig. 2, 3, 4 or 5, the computing device that finally executes the operation request is selected in combination with the auxiliary scheduling algorithm, further improving the operation efficiency of the server.
In accordance with the embodiments shown in fig. 2, 3, 4 or 5, please refer to fig. 6, fig. 6 is a schematic structural diagram of another server according to the present application, wherein the server includes a plurality of computing devices. As shown in fig. 6, the server 600 includes:
a receiving unit 601, configured to receive M operation requests, where M is a positive integer;
a scheduling unit 602, configured to select at least one target computing device from a plurality of computing devices according to attribute information of each of the M operation requests, and determine an operation instruction corresponding to each of the at least one target computing device;
an operation unit 603, configured to calculate operation data corresponding to the M operation requests according to an operation instruction corresponding to each target computing device in the at least one target computing device, so as to obtain M final operation results;
and the sending unit 604 is configured to send each final operation result in the M final operation results to a corresponding electronic device.
Optionally, the attribute information includes a target neural network model, the at least one target computing device is a plurality of parallel computing devices and at least one serial computing device, and the server 600 further includes:
an obtaining unit 605, configured to obtain an instruction stream of the target neural network model corresponding to the operation request if M is 1;
a grouping unit 606 for splitting the instruction stream into a plurality of parallel instructions and a plurality of serial instructions; the scheduling unit 602 selects the plurality of parallel computing devices corresponding to the plurality of parallel instructions and the at least one serial computing device corresponding to the plurality of serial instructions from the plurality of computing devices, determines an operation instruction of each of the plurality of parallel computing devices as a corresponding parallel instruction, and determines an operation instruction of each of the at least one serial computing device as a corresponding serial instruction.
Optionally, the obtaining unit 605 is specifically configured to acquire a first instruction descriptor stream according to a basic operation sequence corresponding to the target neural network model; simplify the first instruction descriptor stream to obtain a second instruction descriptor stream; and acquire the instruction stream according to the second instruction descriptor stream.
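The patent does not spell out what the simplification step removes, so the sketch below is only one guess at the idea: dropping a store/load pair between adjacent operations whose intermediate result could stay on-device. The descriptor names are invented.

```python
# First instruction descriptor stream (hypothetical descriptors).
first_stream = ["load", "conv", "store", "load", "relu", "store"]

second_stream = []
for desc in first_stream:
    # A store immediately followed by a load is assumed redundant here:
    # the intermediate result can remain on the device.
    if desc == "load" and second_stream and second_stream[-1] == "store":
        second_stream.pop()
        continue
    second_stream.append(desc)

print(second_stream)  # ['load', 'conv', 'relu', 'store']
```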
Optionally, the grouping unit 606 is specifically configured to group the plurality of serial instructions to obtain at least one group of serial instruction sequences; the scheduling unit 602 is specifically configured to select, from the plurality of computing devices, a computing device corresponding to each of the at least one set of serial instruction sequences, and obtain the at least one serial computing device.
Optionally, the grouping unit 606 is specifically configured to group the operation data corresponding to the operation request based on a training method corresponding to the operation request if the operation request is a training task, so as to obtain multiple groups of operation data; the scheduling unit 602 is specifically configured to select the plurality of parallel computing devices from the plurality of computing devices according to the plurality of sets of operation data and the plurality of parallel instructions.
Optionally, the attribute information includes a target neural network model, the at least one target computing device is a target parallel computing device, and the scheduling unit 602 is specifically configured to: if M is greater than 1 and the M operation requests all correspond to one target neural network model, select from the multiple computing devices the target parallel computing device corresponding to that target neural network model, and determine the operation instruction of the target parallel computing device as performing parallel operations on the M operation requests.
Optionally, the attribute information includes a target neural network model, the at least one target computing device is a plurality of target serial computing devices, and the scheduling unit 602 is specifically configured to select from the plurality of computing devices the target serial computing device corresponding to each of the plurality of target neural network models, and to determine the operation instruction corresponding to each of the plurality of target serial computing devices as executing the corresponding operation request.
Optionally, the attribute information includes an operation task, and the scheduling unit 602 is specifically configured to: if the operation task of a target operation request is a test task, select from the plurality of computing devices a computing device that includes the forward operation of the target neural network model corresponding to the target operation task, obtaining a first target computing device, where the target operation request is any one of the M operation requests and the first target computing device is the target computing device in the at least one target computing device that corresponds to the target operation request; if the operation task of the target operation request is a training task, select from the plurality of computing devices a computing device that includes both the forward operation and the backward training of the target neural network model corresponding to the target operation task, obtaining the first target computing device; and determine the operation instruction corresponding to the first target computing device as executing the target operation request.
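As a sketch of this task-based selection, with invented capability flags: a test task needs only forward operation, while a training task additionally needs backward training.

```python
# Hypothetical capability table for the available computing devices.
DEVICES = {
    "dev_forward_only": {"forward": True, "backward": False},
    "dev_full":         {"forward": True, "backward": True},
}

def select_device(task: str) -> str:
    need_backward = (task == "training")
    for name, caps in DEVICES.items():
        if caps["forward"] and (caps["backward"] or not need_backward):
            return name
    raise RuntimeError("no suitable computing device")

print(select_device("test"))      # dev_forward_only
print(select_device("training"))  # dev_full
```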
Optionally, the scheduling unit 602 is specifically configured to select an auxiliary scheduling algorithm from an auxiliary scheduling algorithm set according to the attribute information of each of the M operation requests, where the auxiliary scheduling algorithm set includes at least one of the following: a round-robin scheduling algorithm, a weighted round-robin algorithm, a least connections algorithm, a weighted least connections algorithm, a locality-based least connections with replication algorithm, a destination address hashing algorithm, a source address hashing algorithm; and to select the at least one target computing device from the plurality of computing devices according to the auxiliary scheduling algorithm and determine the operation instruction corresponding to each of the at least one target computing device.
Optionally, the server further includes a detecting unit 607, configured to wait for a first preset duration and detect whether each target computing device in the at least one target computing device has returned the final operation result of its corresponding operation instruction; if not, the target computing device that has not returned a final operation result is taken as a delay computing device; the scheduling unit 602 selects a standby computing device from the idle computing devices among the plurality of computing devices according to the operation instruction corresponding to the delay computing device; and the operation unit 603 executes the operation instruction corresponding to the delay computing device through the standby computing device.
Optionally, the obtaining unit 605 is further configured to obtain whichever final operation result is returned first by the delay computing device or the standby computing device; and the sending unit 604 sends a pause instruction to whichever of the delay computing device and the standby computing device has not returned a final operation result.
Optionally, the detecting unit 607 is further configured to wait for a second preset duration and detect whether the delay computing device has returned the final operation result of its corresponding operation instruction; if not, the delay computing device that has not returned a final operation result is taken as a fault computing device, and the sending unit 604 sends a fault instruction, where the fault instruction is used to inform operation and maintenance personnel that the fault computing device has broken down, and the second preset duration is longer than the first preset duration.
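The two-stage timeout described by the detecting unit can be illustrated as below; the concrete durations and the result bookkeeping are assumptions, since the patent fixes only that the second duration exceeds the first.

```python
import time

FIRST_TIMEOUT = 1.0   # first preset duration (seconds, assumed)
SECOND_TIMEOUT = 5.0  # second preset duration; must exceed the first

def check_device(device: str, started: float, results: dict, now: float) -> str:
    if device in results:
        return "done"
    if now - started > SECOND_TIMEOUT:
        return "fault"    # send a fault instruction to operations staff
    if now - started > FIRST_TIMEOUT:
        return "delayed"  # hand the instruction to a standby device
    return "running"

started = time.monotonic()
print(check_device("dev_a", started, {}, started + 2.0))  # delayed
print(check_device("dev_a", started, {}, started + 6.0))  # fault
```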
Optionally, the server further includes an updating unit 608, configured to update the hash table of the server every target time threshold.
Optionally, the obtaining unit 605 is further configured to obtain the operation requirement of each specified neural network model in the specified neural network model set and the hardware attribute of each computing device in the plurality of computing devices, to obtain a plurality of operation requirements and a plurality of hardware attributes;
The server further includes a deployment unit 609, configured to deploy, according to the plurality of operation requirements and the plurality of hardware attributes, the corresponding specified neural network model on the specified computing device corresponding to each specified neural network model in the specified neural network model set.
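A minimal matching sketch for this deployment step, assuming a single scalar "operation requirement" per model and "hardware attribute" per device (the patent leaves both open):

```python
# Required compute per specified model and available compute per device,
# in arbitrary illustrative units.
models = {"model_a": 8, "model_b": 2}
devices = {"dev_big": 10, "dev_small": 4}

# Deploy the most demanding model first, on the least-capable device
# that still meets its requirement.
for model, need in sorted(models.items(), key=lambda kv: -kv[1]):
    candidates = [d for d, cap in devices.items() if cap >= need]
    chosen = min(candidates, key=devices.get)
    devices[chosen] -= need
    print(f"deploy {model} on {chosen}")
```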
Optionally, the computing device comprises at least one computing carrier comprising at least one computing unit.
It can be understood that, based on the attribute information of the M received operation requests, the target computing devices that will execute the M operation requests are selected from the multiple computing devices in the server and the operation instruction corresponding to each target computing device is determined; each target computing device completes its operation request according to its operation instruction, and the final operation result of each request is sent to the corresponding electronic device. Computing resources are thus allocated uniformly according to the operation requests, so that the multiple computing devices in the server cooperate effectively, improving the operation efficiency of the server.
In one embodiment, as shown in fig. 7, another server 700 is disclosed that includes a processor 701, a memory 702, a communication interface 703, and one or more programs 704, where the one or more programs 704 are stored in the memory 702 and configured to be executed by the processor 701, the programs 704 including instructions for performing some or all of the steps of the scheduling method described above.
In another embodiment of the invention, a computer readable storage medium is provided. The computer readable storage medium stores a computer program comprising program instructions that, when executed by a processor, cause the processor to perform the scheduling method described above.
Those of ordinary skill in the art will appreciate that the units and algorithm steps described in connection with the embodiments disclosed herein may be implemented in electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of the examples have been described above generally in terms of function. Whether such functionality is implemented as hardware or software depends on the particular application and the design constraints of the solution. Skilled artisans may implement the described functionality differently for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
It will be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working procedures of the terminal and the unit described above may refer to the corresponding procedures in the foregoing method embodiments, which are not repeated herein.
In several embodiments provided by the present application, it should be understood that the disclosed terminal and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, the above-described division of units is merely a logical function division, and there may be another division manner in actual implementation, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not performed. In addition, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices, or elements, or may be an electrical, mechanical, or other form of connection.
The units described above as separate components may or may not be physically separate, and components shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the embodiment of the present application.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units described above, if implemented as software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention, in essence, or the part of it that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods of the various embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
It should be noted that implementations not shown or described in the drawings or in the text of the specification are all forms known to those of ordinary skill in the art and are not described in detail. Furthermore, the above definitions of the elements and methods are not limited to the specific structures, shapes, or modes mentioned in the embodiments, which those of ordinary skill in the art may simply modify or replace.
The foregoing embodiments serve only to illustrate the general principles of the present application and are not intended to limit its scope.

Claims (11)

1. A scheduling method, the method being based on a server comprising a plurality of computing devices, the method comprising:
receiving M operation requests, wherein M is a positive integer;
selecting at least one target computing device from the plurality of computing devices according to the attribute information of each operation request in the M operation requests, and determining an operation instruction corresponding to each target computing device in the at least one target computing device;
calculating the operation data corresponding to the M operation requests according to the operation instruction corresponding to each target computing device in the at least one target computing device to obtain M final operation results;
transmitting each final operation result in the M final operation results to corresponding electronic equipment; the attribute information includes a target neural network model, the at least one target computing device is a plurality of parallel computing devices and at least one serial computing device, and the selecting at least one target computing device from the plurality of computing devices according to the attribute information of each operation request in the M operation requests and determining an operation instruction corresponding to each target computing device in the at least one target computing device includes:
If M is 1, obtaining an instruction stream of a target neural network model corresponding to the operation request;
splitting the instruction stream into a plurality of parallel instructions and a plurality of serial instructions;
selecting the plurality of parallel computing devices corresponding to the plurality of parallel instructions and the at least one serial computing device corresponding to the plurality of serial instructions from the plurality of computing devices, determining that an operation instruction of each of the plurality of parallel computing devices is a corresponding parallel instruction, and determining that an operation instruction of each of the at least one serial computing device is a corresponding serial instruction;
wherein selecting, from the plurality of computing devices, the at least one serial computing device corresponding to the plurality of serial instructions comprises: grouping the plurality of serial instructions to obtain at least one group of serial instruction sequences; and selecting, from the plurality of computing devices, a computing device corresponding to each of the at least one group of serial instruction sequences to obtain the at least one serial computing device.
2. The method according to claim 1, wherein the obtaining the instruction stream of the target neural network model corresponding to the operation request includes:
Acquiring a first instruction descriptor stream according to a basic operation sequence corresponding to the target neural network model;
simplifying the first instruction descriptor stream to obtain a second instruction descriptor stream;
and acquiring the instruction stream according to the second instruction descriptor stream.
3. The method of claim 1 or 2, wherein selecting, from the plurality of computing devices, the at least one serial computing device corresponding to the plurality of serial instructions comprises:
grouping the plurality of serial instructions to obtain at least one group of serial instruction sequences;
selecting a computing device corresponding to each of the at least one set of serial instruction sequences from the plurality of computing devices, and obtaining the at least one serial computing device; the selecting the plurality of parallel computing devices corresponding to the plurality of parallel instructions from the plurality of computing devices includes:
if the operation request is a training task, grouping operation data corresponding to the operation request based on a training method corresponding to the operation request to obtain a plurality of groups of operation data;
and selecting the plurality of parallel computing devices from the plurality of computing devices according to the plurality of groups of operation data and the plurality of parallel instructions.
4. The method of claim 1, wherein the attribute information includes an operation task, wherein the selecting at least one target computing device from the plurality of computing devices according to the attribute information of each of the M operation requests, and determining an operation instruction corresponding to each of the at least one target computing device, comprises:
if the operation task of the target operation request is a test task, selecting from the plurality of computing devices a computing device comprising the forward operation of the target neural network model corresponding to the target operation task, to obtain a first target computing device, wherein the target operation request is any one of the M operation requests, and the first target computing device is the target computing device corresponding to the target operation request in the at least one target computing device;
if the operation task of the target operation request is a training task, selecting a computing device comprising forward operation and backward training of a target neural network model corresponding to the target operation task from the plurality of computing devices, and obtaining the first target computing device;
and determining an operation instruction corresponding to the first target computing device as executing the target operation request.
5. The method of claim 1, wherein selecting at least one target computing device from the plurality of computing devices according to the attribute information of each operation request in the M operation requests, and determining the operation instruction corresponding to each target computing device in the at least one target computing device, comprises:
selecting an auxiliary scheduling algorithm from an auxiliary scheduling algorithm set according to the attribute information of each operation request in the M operation requests, wherein the auxiliary scheduling algorithm set comprises at least one of the following: a round-robin scheduling algorithm, a weighted round-robin algorithm, a least connections algorithm, a weighted least connections algorithm, a locality-based least connections with replication algorithm, a destination address hashing algorithm, a source address hashing algorithm;
and selecting the at least one target computing device from the plurality of computing devices according to the auxiliary scheduling algorithm, and determining an operation instruction corresponding to each of the at least one target computing device.
6. The method according to any one of claims 1-5, further comprising:
waiting for a first preset duration, detecting whether each target computing device in the at least one target computing device returns a final operation result of a corresponding operation instruction, and if not, taking the target computing device which does not return the final operation result as a delay computing device;
selecting a standby computing device from the idle computing devices among the plurality of computing devices according to the operation instruction corresponding to the delay computing device;
executing the operation instruction corresponding to the delay computing device through the standby computing device; after the executing of the operation instruction corresponding to the delay computing device through the standby computing device, the method further comprises:
acquiring a final operation result returned firstly between the delay computing device and the standby computing device;
and sending a pause instruction to a computing device which does not return a final operation result between the delay computing device and the standby computing device.
7. The method of claim 6, wherein the method further comprises:
and waiting for a second preset duration, detecting whether the delay computing device returns the final operation result of the corresponding operation instruction, and if not, taking the delay computing device that does not return the final operation result as a fault computing device and sending a fault instruction, wherein the fault instruction is used to inform operation and maintenance personnel that the fault computing device has broken down, and the second preset duration is longer than the first preset duration.
8. The method according to any one of claims 1-5, further comprising:
And updating the hash table of the server every other target time threshold.
9. The method according to any one of claims 1-5, further comprising:
acquiring the operation requirement of each specified neural network model in the specified neural network model set and the hardware attribute of each computing device in the plurality of computing devices, to obtain a plurality of operation requirements and a plurality of hardware attributes;
and deploying corresponding specified neural network models on a specified computing device corresponding to each specified neural network model in the specified neural network model set according to the operation requirements and the hardware attributes.
10. A server comprising a processor, a memory, a communication interface, and one or more programs, wherein the one or more programs are stored in the memory and configured for execution by the processor, the programs comprising instructions for performing the steps of the method of any of claims 1-9.
11. A computer readable storage medium storing a computer program comprising program instructions which, when executed by a processor, cause the processor to perform the method of any of claims 1-9.
CN202010118354.1A 2017-12-28 2017-12-28 Scheduling method and related device Active CN111338776B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010118354.1A CN111338776B (en) 2017-12-28 2017-12-28 Scheduling method and related device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010118354.1A CN111338776B (en) 2017-12-28 2017-12-28 Scheduling method and related device
CN201711467741.0A CN109976887B (en) 2017-12-28 2017-12-28 Scheduling method and related device

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN201711467741.0A Division CN109976887B (en) 2017-12-28 2017-12-28 Scheduling method and related device

Publications (2)

Publication Number Publication Date
CN111338776A CN111338776A (en) 2020-06-26
CN111338776B true CN111338776B (en) 2023-11-28

Family

ID=67075506

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202010118354.1A Active CN111338776B (en) 2017-12-28 2017-12-28 Scheduling method and related device
CN201711467741.0A Active CN109976887B (en) 2017-12-28 2017-12-28 Scheduling method and related device

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN201711467741.0A Active CN109976887B (en) 2017-12-28 2017-12-28 Scheduling method and related device

Country Status (1)

Country Link
CN (2) CN111338776B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111738404B (en) * 2020-05-08 2024-01-12 深圳市万普拉斯科技有限公司 Model training task processing method and device, electronic equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102360309A (en) * 2011-09-29 2012-02-22 中国科学技术大学苏州研究院 Scheduling system and scheduling execution method of multi-core heterogeneous system on chip
CN105791381A (en) * 2015-12-30 2016-07-20 东莞市青麦田数码科技有限公司 Access control method and apparatus
CN107018184A (en) * 2017-03-28 2017-08-04 华中科技大学 Distributed deep neural network cluster packet synchronization optimization method and system
CN107169485A (en) * 2017-03-28 2017-09-15 北京捷通华声科技股份有限公司 A kind of method for identifying mathematical formula and device
CN107491811A (en) * 2017-09-01 2017-12-19 中国科学院计算技术研究所 Method and system and neural network processor for accelerans network processing unit

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101071412A (en) * 2006-05-10 2007-11-14 何千军 Neural network analysis system and method based on self-definition model
US10223652B2 (en) * 2014-04-08 2019-03-05 Capital One Services, Llc Systems and methods for an attribute generator tool workflow
CN109375951B (en) * 2016-04-27 2020-10-09 中科寒武纪科技股份有限公司 Device and method for executing forward operation of full-connection layer neural network
CN109086877B (en) * 2016-04-29 2020-05-08 中科寒武纪科技股份有限公司 Apparatus and method for performing convolutional neural network forward operation
CN106022468B (en) * 2016-05-17 2018-06-01 成都启英泰伦科技有限公司 the design method of artificial neural network processor integrated circuit and the integrated circuit
CN107169560B (en) * 2017-04-19 2020-10-16 清华大学 Self-adaptive reconfigurable deep convolutional neural network computing method and device

Also Published As

Publication number Publication date
CN109976887B (en) 2020-03-24
CN111338776A (en) 2020-06-26
CN109976887A (en) 2019-07-05

Similar Documents

Publication Publication Date Title
TWI547817B (en) Method, system and apparatus of planning resources for cluster computing architecture
CN106302780B (en) Method, device and system for batch data transmission of cluster equipment and server
CN106503791A (en) System and method for the deployment of effective neutral net
US11568269B2 (en) Scheduling method and related apparatus
CN112148468B (en) Resource scheduling method and device, electronic equipment and storage medium
CN111047045B (en) Distribution system and method for machine learning operation
CN114675964A (en) Distributed scheduling method, system and medium based on Federal decision tree model training
CN113590199A (en) Instruction scheduling method, artificial intelligence chip, computer device and storage medium
CN116263701A (en) Computing power network task scheduling method and device, computer equipment and storage medium
CN116450355A (en) Multi-cluster model training method, device, equipment and medium
CN109978129B (en) Scheduling method and related device
CN111338776B (en) Scheduling method and related device
US20210097396A1 (en) Neural network training in a distributed system
CN109976809B (en) Scheduling method and related device
CN107918676B (en) Resource optimization method for structured query and database query system
CN109978149B (en) Scheduling method and related device
CN107634978B (en) Resource scheduling method and device
CN113761017A (en) Similarity searching method and device
CN114461407B (en) Data processing method, data processing device, distribution server, data processing system, and storage medium
Coviello et al. Content-aware auto-scaling of stream processing applications on container orchestration platforms
CN114138484A (en) Resource allocation method, device and medium
CN114035906A (en) Virtual machine migration method and device, electronic equipment and storage medium
CN109688177B (en) Data synchronization method and device, equipment and storage medium
CN111767996B (en) Integrated circuit chip device and related products
CN111767133A (en) System and method for reconfigurable systolic array

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant