CN109976809A - Dispatching method and relevant apparatus - Google Patents
- Publication number
- CN109976809A (application number CN201711467705.4A)
- Authority
- CN
- China
- Prior art keywords
- computing device
- instruction
- serial
- computing
- network model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3818—Decoding for concurrent execution
- G06F9/3822—Parallel decoding, e.g. parallel decode units
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
Abstract
Embodiments of the present application disclose a dispatching method and a related apparatus. The method is applied to a server comprising multiple computing devices and includes: receiving an operation request; obtaining the instruction stream of the target neural network model corresponding to the operation request; splitting the instruction stream into multiple parallel instructions and multiple serial instructions; selecting, from the multiple computing devices, multiple parallel computing devices corresponding to the multiple parallel instructions and at least one serial computing device corresponding to the multiple serial instructions; computing the operational data corresponding to the operation request according to the parallel instruction of each parallel computing device and the serial instruction of each serial computing device, to obtain a final operation result; and sending the final operation result to the electronic device that sent the operation request. The embodiments of the present application can improve the efficiency with which the server processes a single operation request.
Description
Technical field
This application relates to the field of computer technology, and in particular to a dispatching method and a related apparatus.
Background
Neural networks underlie many current artificial-intelligence applications. As the range of neural network applications keeps expanding, various neural network models are stored on servers or cloud computing services, which perform operations on the operation requests submitted by users. How to improve the operational efficiency of such a server is a technical problem to be solved by those skilled in the art.
Summary of the invention
Embodiments of the present application propose a dispatching method and a related apparatus, which can select computing devices in a server to execute a single operation request, improving the operational efficiency of the server.
In a first aspect, an embodiment of the present application provides a dispatching method applied to a server comprising multiple computing devices. The method includes:
receiving an operation request;
obtaining the instruction stream of the target neural network model corresponding to the operation request;
splitting the instruction stream into multiple parallel instructions and multiple serial instructions;
selecting, from the multiple computing devices, multiple parallel computing devices corresponding to the multiple parallel instructions and at least one serial computing device corresponding to the multiple serial instructions;
computing the operational data corresponding to the operation request according to the parallel instruction corresponding to each of the multiple parallel computing devices and the serial instruction corresponding to each of the at least one serial computing device, to obtain a final operation result; and
sending the final operation result to the electronic device that sent the operation request.
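The six steps of the first aspect can be sketched as a minimal simulation. This is an illustration only, not the patented implementation: the instruction-stream format, the device-selection policy, the way partial results are combined, and all names (`Device`, `dispatch`, etc.) are assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor

class Device:
    """Stand-in for one computing device inside the server."""
    def __init__(self, name):
        self.name = name
    def run(self, instr, data):
        op, arg = instr              # an instruction is (callable, operand)
        return op(data, arg)

def dispatch(request, devices):
    # Steps 1-2: receive the request and obtain the model's instruction stream.
    stream = request["instruction_stream"]
    data = request["operational_data"]
    # Step 3: split the stream into parallel and serial instructions.
    parallel = [i for i in stream if i["parallel"]]
    serial = [i for i in stream if not i["parallel"]]
    # Step 4: choose one device per parallel instruction, plus a serial device.
    par_devs = devices[:len(parallel)]
    ser_dev = devices[-1]
    # Step 5a: run the parallel instructions concurrently on their devices.
    with ThreadPoolExecutor(max_workers=max(len(par_devs), 1)) as pool:
        partials = list(pool.map(
            lambda pair: pair[0].run(pair[1]["instr"], data),
            zip(par_devs, parallel)))
    result = sum(partials)           # combine partial results (illustrative)
    # Step 5b: run the serial instructions in order on the serial device.
    for ins in serial:
        result = ser_dev.run(ins["instr"], result)
    # Step 6: the final result goes back to the requesting electronic device.
    return result

devices = [Device(f"d{i}") for i in range(4)]
request = {
    "operational_data": 10,
    "instruction_stream": [
        {"parallel": True,  "instr": (lambda x, a: x * a, 2)},
        {"parallel": True,  "instr": (lambda x, a: x * a, 3)},
        {"parallel": False, "instr": (lambda x, a: x + a, 1)},
    ],
}
print(dispatch(request, devices))    # 10*2 + 10*3 = 50, then +1 -> 51
```

The point of the sketch is the division of labor: the two parallel instructions run concurrently on separate devices, while the serial instruction runs afterwards on a device dedicated to sequential work.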
In a second aspect, an embodiment of the present application provides a server comprising multiple computing devices, and further comprising:
a receiving unit, configured to receive an operation request;
an acquiring unit, configured to obtain the instruction stream of the target neural network model corresponding to the operation request;
a splitting unit, configured to split the instruction stream into multiple parallel instructions and multiple serial instructions;
a selection unit, configured to select, from the multiple computing devices, multiple parallel computing devices corresponding to the multiple parallel instructions and at least one serial computing device corresponding to the multiple serial instructions;
an arithmetic unit, configured to compute the operational data corresponding to the operation request according to the parallel instruction corresponding to each parallel computing device and the serial instruction corresponding to each serial computing device, to obtain a final operation result; and
a transmission unit, configured to send the final operation result to the electronic device that sent the operation request.
In a third aspect, an embodiment of the present application provides another server, comprising a processor, a memory, a communication interface, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the processor, and the programs include instructions for some or all of the steps described in the first aspect.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium storing a computer program, the computer program comprising program instructions which, when executed by a processor, cause the processor to perform the method of the first aspect.
With the above dispatching method and related apparatus, when the server receives a single operation request sent by an electronic device, it splits the instruction stream of the target neural network model corresponding to the operation request, selects parallel computing devices to process the corresponding parallel instructions in parallel, and independently selects serial computing devices that are good at handling serial instructions to execute the corresponding serial instructions, then sends the final operation result corresponding to the operation request to the electronic device. That is, the computing devices in the set of parallel computing devices execute their parallel instructions concurrently, saving execution time for the parallel instructions, while the serial computing devices improve the computational efficiency across the serial instructions. Computing resources are allocated uniformly according to the operation request, so that the multiple computing devices in the server cooperate effectively, improving the overall operational efficiency of the server.
Brief description of the drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application or in the prior art, the accompanying drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
In the drawings:
Fig. 1 is a structural schematic diagram of a server provided by an embodiment of the present application;
Fig. 1a is a structural schematic diagram of a computing unit provided by an embodiment of the present application;
Fig. 1b is a structural schematic diagram of a main processing circuit provided by an embodiment of the present application;
Fig. 1c is a data distribution schematic diagram of a computing unit provided by an embodiment of the present application;
Fig. 1d is a data return schematic diagram of a computing unit provided by an embodiment of the present application;
Fig. 1e is an operation schematic diagram of a neural network structure provided by an embodiment of the present application;
Fig. 2 is a flow diagram of a dispatching method provided by an embodiment of the present application;
Fig. 3 is a structural schematic diagram of another server provided by an embodiment of the present application;
Fig. 4 is a structural schematic diagram of another server provided by an embodiment of the present application.
Detailed description of embodiments
The technical solutions in the embodiments of the present application are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present application without creative effort shall fall within the protection scope of the present application.
It should be understood that when used in this specification and the appended claims, the terms "include" and "comprise" indicate the presence of the described features, integers, steps, operations, elements and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or sets thereof.
It should also be understood that the terminology used in this specification is for the purpose of describing particular embodiments only and is not intended to limit the application. As used in this specification and the appended claims, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" used in this specification and the appended claims refers to and encompasses any and all possible combinations of one or more of the associated listed items.
As used in this specification and the appended claims, the term "if" may be interpreted, depending on the context, as "when", "once", "in response to determining" or "in response to detecting". Similarly, the phrase "if it is determined" or "if [the described condition or event] is detected" may be interpreted, depending on the context, as "once it is determined", "in response to determining", "once [the described condition or event] is detected" or "in response to detecting [the described condition or event]".
Embodiments of the present application propose a dispatching method and a related apparatus, which can select computing devices in a server to execute operation requests, improving the operational efficiency of the server. The application is further described in detail below with reference to specific embodiments and the accompanying drawings.
Please refer to Fig. 1, which is a structural schematic diagram of a server provided by an embodiment of the present application. As shown in Fig. 1, the server includes multiple computing devices. A computing device includes, but is not limited to, a server computer, and may also be a personal computer (PC), a network PC, a minicomputer, a mainframe computer, or the like.
In this application, the computing devices included in the server establish connections and transmit data over wired or wireless links, and each computing device includes at least one computing carrier, such as a central processing unit (CPU), a graphics processing unit (GPU), or a processor board. The server involved in this application may also be a cloud server that provides cloud computing services for electronic devices.
Each computing carrier may include at least one computing unit for neural network operations, such as a processing chip. The specific structure of the computing unit is not limited; please refer to Fig. 1a, which is a structural schematic diagram of a computing unit. As shown in Fig. 1a, the computing unit includes a main processing circuit, basic processing circuits, and branch processing circuits. Specifically, the main processing circuit is connected to the branch processing circuits, and each branch processing circuit is connected to at least one basic processing circuit.
The branch processing circuits are used to transmit and receive data from the main processing circuit or the basic processing circuits.
Referring to Fig. 1b, a structural schematic diagram of the main processing circuit: as shown in Fig. 1b, the main processing circuit may include a register and/or an on-chip buffer circuit, and may also include a control circuit, a vector operator circuit, an ALU (arithmetic and logic unit) circuit, an accumulator circuit, a DMA (direct memory access) circuit, and the like. In practical applications, the main processing circuit may further include a conversion circuit (such as a matrix transposition circuit), a data rearrangement circuit, an activation circuit, or other circuits.
The main processing circuit also includes a data transmitting circuit and a data receiving circuit or interface. A data distribution circuit and a data broadcasting circuit can be integrated into the data transmitting circuit, although in practical applications they can also be provided separately; likewise, the data transmitting circuit and the data receiving circuit can be integrated to form a data transceiver circuit. Broadcast data is data that needs to be sent to every basic processing circuit. Distribution data is data that needs to be selectively sent to some of the basic processing circuits, where the specific selection can be determined by the main processing circuit according to its load and the calculation method. In the broadcast transmission mode, the broadcast data is sent to each basic processing circuit in broadcast form (in practical applications, the broadcast data may be sent to each basic processing circuit in a single broadcast or in multiple broadcasts; the specific embodiments of this application do not limit the number of broadcasts). In the distribution transmission mode, the distribution data is selectively sent to some of the basic processing circuits.
When distributing data, the control circuit of the main processing circuit transmits data to some or all of the basic processing circuits, and the data received by each receiving basic processing circuit may be different; naturally, some of the basic processing circuits may also receive identical data. When broadcasting data, the control circuit of the main processing circuit transmits data to some or all of the basic processing circuits, and every receiving basic processing circuit receives the same data. That is, broadcast data may include data that all basic processing circuits need to receive, while distribution data may include data that only some basic processing circuits need to receive. The main processing circuit can send the broadcast data to all branch processing circuits through one or more broadcasts, and the branch processing circuits forward the broadcast data to all basic processing circuits.
Optionally, the vector operator circuit of the main processing circuit can perform vector operations, including but not limited to: addition, subtraction, multiplication and division of two vectors; addition, subtraction, multiplication and division of a vector and a constant; or any operation applied to each element of a vector. Continuous operations may specifically be addition, subtraction, multiplication or division of a vector and a constant, activation operations, accumulation operations, and the like.
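The operation set just listed can be illustrated concretely, for example with NumPy; this only demonstrates the operations themselves, not the vector-operator circuit, and the specific functions chosen (e.g. `tanh` as the activation) are assumptions for the example.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])
c = 2.0

add = a + b                        # addition of two vectors
scaled = a * c                     # multiplication of a vector and a constant
clipped = np.maximum(a - 2.0, 0.0) # an arbitrary op applied to each element
acc = float(a.sum())               # accumulation
act = np.tanh(a)                   # an elementwise activation operation
```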
Each basic processing circuit may include a basic register and/or a basic on-chip buffer circuit, and may further include one of, or any combination of, an inner-product operator circuit, a vector operator circuit, an accumulator circuit, and the like. The inner-product operator circuit, vector operator circuit and accumulator circuit can be integrated circuits, or can be circuits provided separately.
The connection structure between the branch processing circuits and the basic circuits can be arbitrary and is not limited to the H-type structure of Fig. 1b. Optionally, the path from the main processing circuit to the basic circuits is a broadcast or distribution structure, and the path from the basic circuits back to the main processing circuit is a gather structure. Broadcast, distribution and gather are defined as follows:
The data transfer modes from the main processing circuit to the basic circuits may include the following:
the main processing circuit is connected to multiple branch processing circuits, and each branch processing circuit is in turn connected to multiple basic circuits;
the main processing circuit is connected to one branch processing circuit, which is connected to the next branch processing circuit, and so on, with multiple branch processing circuits connected in series, each of which is then connected to multiple basic circuits;
the main processing circuit is connected to multiple branch processing circuits, and each branch processing circuit is in turn connected in series to multiple basic circuits;
the main processing circuit is connected to one branch processing circuit, which is connected to the next branch processing circuit, and so on, with multiple branch processing circuits connected in series, each of which is then connected in series to multiple basic circuits.
When distributing data, the main processing circuit transmits data to some or all of the basic circuits, and the data received by each receiving basic circuit may be different.
When broadcasting data, the main processing circuit transmits data to some or all of the basic circuits, and each receiving basic circuit receives the same data.
When gathering data, some or all of the basic circuits transmit data back to the main processing circuit. It should be noted that the computing unit shown in Fig. 1a may be an individual physical chip; in practical applications, it may also be integrated into another chip (such as a CPU or GPU), and the specific embodiments of this application do not limit the physical form of the chip or device.
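The three transfer modes — broadcast (identical data to every basic circuit), distribution (possibly different data to a subset), and gather (partial results collected back) — can be illustrated with a toy main-circuit/basic-circuit model. All class and method names here are assumptions for illustration; the patent does not prescribe an implementation.

```python
class BasicCircuit:
    """Stand-in for one basic processing circuit."""
    def __init__(self):
        self.inbox = []
    def receive(self, data):
        self.inbox.append(data)
    def compute(self):
        return sum(self.inbox)       # toy partial result

class MainCircuit:
    """Stand-in for the main processing circuit."""
    def __init__(self, basics):
        self.basics = basics
    def broadcast(self, data):
        # broadcast: every basic circuit receives the identical data
        for b in self.basics:
            b.receive(data)
    def distribute(self, chunks):
        # distribute: selected circuits receive possibly different chunks
        for b, chunk in zip(self.basics, chunks):
            b.receive(chunk)
    def gather(self):
        # gather: partial results flow back up to the main circuit
        return [b.compute() for b in self.basics]

basics = [BasicCircuit() for _ in range(3)]
main = MainCircuit(basics)
main.broadcast(10)           # all three circuits get 10
main.distribute([1, 2, 3])   # each circuit gets its own chunk
results = main.gather()      # [11, 12, 13]
```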
Referring to Fig. 1c, a data distribution schematic diagram of the computing unit: the arrows in Fig. 1c indicate the distribution direction of the data. As shown in Fig. 1c, after the main processing circuit receives external data, it splits the data and sends it to the multiple branch processing circuits, which forward the split data to the basic processing circuits.
Referring to Fig. 1d, a data return schematic diagram of the computing unit: the arrows in Fig. 1d indicate the return (upstream) direction of the data. As shown in Fig. 1d, the basic processing circuits return data (such as inner-product results) to the branch processing circuits, which in turn return it to the main processing circuit.
The input data may specifically be a vector, a matrix, or multidimensional (three-dimensional, four-dimensional or higher) data; a specific value in the input data may be referred to as an element of the input data.
An embodiment of the present disclosure also provides a calculation method for the computing unit shown in Fig. 1a. The calculation method is applied to neural network computation; specifically, the computing unit can be used to perform operations on the input data and weight data of one or more layers of a multilayer neural network.
Specifically, the computing unit described above is used to perform operations on the input data and weight data of one or more layers of a trained multilayer neural network, or to perform operations on the input data and weight data of one or more layers of a multilayer neural network in forward operation.
The above operations include, but are not limited to, one of, or any combination of: a convolution operation, a matrix-matrix multiplication, a matrix-vector multiplication, a bias operation, a fully connected operation, a GEMM operation, a GEMV operation, and an activation operation.
GEMM refers to the matrix-matrix multiplication operation in the BLAS library, usually expressed as C = alpha*op(S)*op(P) + beta*C, where S and P are the two input matrices, C is the output matrix, alpha and beta are scalars, and op denotes some operation on matrix S or P; in addition, some auxiliary integers serve as parameters describing the width and height of matrices S and P.
GEMV refers to the matrix-vector multiplication operation in the BLAS library, usually expressed as C = alpha*op(S)*P + beta*C, where S is the input matrix, P is the input vector, C is the output vector, alpha and beta are scalars, and op denotes some operation on matrix S.
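The two BLAS-style formulas can be checked directly, e.g. with NumPy. Here `op` is taken to be transposition for S and the identity for P — one of the choices the text allows; the function names `gemm`/`gemv` below are illustrative, not the actual BLAS API.

```python
import numpy as np

def gemm(alpha, S, P, beta, C, op_s=lambda m: m, op_p=lambda m: m):
    """C = alpha * op(S) @ op(P) + beta * C  (BLAS-style matrix-matrix multiply)."""
    return alpha * op_s(S) @ op_p(P) + beta * C

def gemv(alpha, S, P, beta, C, op_s=lambda m: m):
    """C = alpha * op(S) @ P + beta * C  (BLAS-style matrix-vector multiply)."""
    return alpha * op_s(S) @ P + beta * C

S = np.array([[1., 2.], [3., 4.]])
P = np.array([[5., 6.], [7., 8.]])
out = gemm(2.0, S, P, 1.0, np.zeros((2, 2)), op_s=np.transpose)  # op(S) = S^T
w = gemv(1.0, S, np.array([1., 1.]), 0.0, np.zeros(2))           # S @ [1, 1]
```

For `out`, S^T @ P is [[26, 30], [38, 44]], so doubling it gives [[52, 60], [76, 88]]; for `w`, each row of S is summed, giving [3, 7].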
This application does not limit the connection relationship between the computing carriers in a computing device, which may be homogeneous or heterogeneous computing carriers, nor does it limit the connection relationship between the computing units in a computing carrier. Executing parallel tasks through the above heterogeneous computing carriers or computing units can improve operational efficiency.
The computing device described in Fig. 1 includes at least one computing carrier, and each computing carrier in turn includes at least one computing unit. That is, in this application the selected target computing devices depend on the connection relationships between the computing devices, on the specific physical hardware support in each computing device — such as the deployed neural network models and network resources — and on the attribute information of the operation request. Computing carriers of the same type can be deployed in the same computing device; for example, the computing carriers used for forward propagation can be deployed on the same computing device rather than on different computing devices, which effectively reduces the communication overhead between computing devices and helps improve operational efficiency. A specific neural network model can also be deployed on a specific computing carrier: when the server receives an operation request directed at a specified neural network, it calls the computing carrier corresponding to that specified neural network to execute the operation request, saving the time of determining the processing task and improving operational efficiency.
In this application, publicly available and widely used neural network models serve as the specified neural network models (for example, among convolutional neural networks (CNNs): LeNet, AlexNet, ZFNet, GoogleNet, VGG and ResNet).
Optionally, the operation demand of each specified neural network model in a specified neural network model set and the hardware attributes of each of the multiple computing devices are obtained, yielding multiple operation demands and multiple hardware attributes; according to the multiple operation demands and the multiple hardware attributes, each specified neural network model in the set is deployed on its corresponding specified computing device.
The specified neural network model set includes multiple specified neural network models. The hardware attributes of a computing device include the network bandwidth, storage capacity and processor clock frequency of the computing device itself, as well as the hardware attributes of the computing carriers or computing units within the computing device. That is, selecting the computing device corresponding to the operation demand of a specified neural network model according to the hardware attributes of each computing device can avoid server failures caused by untimely processing and improve the operation support capability of the server.
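The matching of each specified model's operation demand against a device's hardware attributes can be sketched as a simple feasibility-plus-scoring step. The attribute names (`memory`, `bandwidth`) and the greedy most-spare-memory policy are assumptions for the example; the patent leaves the selection rule open.

```python
def deploy(models, devices):
    """Assign each specified model to a device whose attributes satisfy its demand.

    models:  {name: {"memory": units needed, "bandwidth": units needed}}
    devices: {name: {"memory": units free,  "bandwidth": units available}}
    """
    placement = {}
    for model, need in models.items():
        # keep only devices whose hardware attributes satisfy the operation demand
        fit = {d: attr for d, attr in devices.items()
               if attr["memory"] >= need["memory"]
               and attr["bandwidth"] >= need["bandwidth"]}
        if not fit:
            continue                       # no device can host this model now
        # greedy choice: the device with the most spare memory
        best = max(fit, key=lambda d: fit[d]["memory"])
        placement[model] = best
        devices[best]["memory"] -= need["memory"]
    return placement

devices = {"dev0": {"memory": 16, "bandwidth": 10},
           "dev1": {"memory": 8,  "bandwidth": 40}}
models = {"AlexNet": {"memory": 4,  "bandwidth": 5},
          "ResNet":  {"memory": 12, "bandwidth": 5}}
placement = deploy(models, devices)
print(placement)   # {'AlexNet': 'dev0', 'ResNet': 'dev0'}
```

Tracking the remaining capacity after each placement is what lets the scheme avoid overloading a device, which is the failure mode the paragraph above warns about.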
The input neurons and output neurons mentioned in this application do not mean the neurons in the input layer and output layer of the entire neural network. Rather, for any two adjacent layers in the network, the neurons in the lower layer of the network feed-forward operation are the input neurons, and the neurons in the upper layer of the network feed-forward operation are the output neurons. Taking a convolutional neural network as an example, suppose a convolutional neural network has L layers, with K = 1, 2, ..., L-1; for layer K and layer K+1, layer K is called the input layer and its neurons are the input neurons, while layer K+1 is called the output layer and its neurons are the output neurons. That is, except for the top layer, each layer can serve as an input layer, and the next layer is the corresponding output layer.
The operations mentioned above are all operations of one layer of the neural network. For a multilayer neural network, the implementation process is shown in Fig. 1e, where the dashed arrows indicate the backward operation and the solid arrows indicate the forward operation. In the forward operation, after the execution of the previous layer of the artificial neural network is completed, the output neurons obtained by that layer are used as the input neurons of the next layer for computation (or some operations are performed on those output neurons before they are used as the input neurons of the next layer), and at the same time the weights are replaced with the weights of the next layer. In the backward operation, after the backward operation of the previous layer of the artificial neural network is completed, the input-neuron gradients obtained by that layer are used as the output-neuron gradients of the next layer for computation (or some operations are performed on those input-neuron gradients before they are used as the output-neuron gradients of the next layer), and at the same time the weights are replaced with the weights of the next layer.
The forward operation of a neural network is the calculation process from the input data to the final output data. The propagation direction of the backward operation is opposite to that of the forward operation: it is the calculation process that propagates, in the reverse of the forward direction, the loss between the final output data and the expected output data, or the loss function corresponding to that loss. By performing forward and backward operations in alternating cycles and correcting the weights of each layer by gradient descent on the loss or loss function, the weights of each layer are adjusted; this is the learning and training process of the neural network, and it reduces the loss of the network output.
Referring to Fig. 2, Fig. 2 is a flow diagram of a dispatching method provided by an embodiment of this application. As shown in Fig. 2, the method is applied to the server shown in Fig. 1 and involves the above-mentioned electronic equipment allowed to access the server. The electronic equipment may include various handheld devices with wireless communication functions, vehicle-mounted devices, wearable devices, computing devices or other processing devices connected to a radio modem, as well as various forms of user equipment (UE), mobile stations (MS), terminal devices, and the like.
201: Receive an operation request.
In this application, the server receives the operation request sent by the electronic equipment allowed to access it.
The operation request includes attribute information such as the processing task (a training task or a test task) and the target neural network model involved in the operation. A training task is used to train the target neural network model, that is, to perform forward and backward operations on the neural network model until training is complete; a test task is used to perform a single forward operation according to the target neural network model.
The target neural network model may be a neural network model uploaded by the user through the electronic equipment when sending the operation request, or a neural network model stored in the server, and so on. This application also places no limit on the number of target neural network models; that is, each operation request may correspond to at least one target neural network model.
202: Obtain the instruction stream of the target neural network model corresponding to the operation request.
The instruction stream specifies the order of operations of the target neural network model and the instruction corresponding to each position in that order, i.e., an instruction sequence; the operation of the target neural network model can be realized through the instruction stream. Each target neural network model corresponds to one basic operation sequence, obtained by parsing the target neural network model into a data structure describing its operation. This application places no limit on the resolution rules between the basic operation sequence and the instruction descriptors; the instruction descriptor stream corresponding to the target neural network model is obtained according to the resolution rules between the basic operation sequence and the instruction descriptors.
This application likewise places no limit on the preset format of each instruction descriptor in the instruction descriptor stream. The instructions corresponding to the instruction descriptor stream are generated according to the network structure in the preset format. The above instructions include all instructions in the Cambricon instruction set, such as matrix operation instructions, convolution operation instructions, fully connected layer forward operation instructions, pooling operation instructions, normalization instructions, vector operation instructions, and scalar operation instructions.
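The pipeline of step 202 can be sketched as two table lookups: the parsed basic operation sequence maps to instruction descriptors under the resolution rules, and each descriptor is then lowered to an executable instruction. Both lookup tables below are invented for illustration; the patent deliberately fixes neither the resolution rules nor the descriptor formats.

```python
# Hypothetical mappings: basic operation -> descriptor -> instruction.
DESCRIPTOR_OF = {"conv": "CONV_DESC", "pool": "POOL_DESC", "fc": "FC_DESC"}
INSTRUCTION_OF = {"CONV_DESC": "CONV_INSTR",
                  "POOL_DESC": "POOL_INSTR",
                  "FC_DESC": "FC_FORWARD_INSTR"}

def instruction_stream(basic_operation_sequence):
    """basic operation sequence -> descriptor stream -> instruction stream."""
    descriptors = [DESCRIPTOR_OF[op] for op in basic_operation_sequence]
    return [INSTRUCTION_OF[d] for d in descriptors]

stream = instruction_stream(["conv", "pool", "fc"])
```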
Optionally, obtaining the instruction stream of the target neural network model corresponding to the operation request includes: obtaining a first instruction descriptor stream according to the basic operation sequence of the target neural network model; simplifying the first instruction descriptor stream to obtain a second instruction descriptor stream; and obtaining the instruction stream according to the second instruction descriptor stream.
That is, by simplifying the first instruction descriptor stream, redundant instruction descriptors in it are eliminated, thereby shortening the instruction stream. The instruction stream executable by a computing device is then obtained from the second instruction descriptor stream, and output data is obtained by operating on the instructions and the input data. This overcomes the redundant input, output, or other operations generated when a complete neural network composed of fine-grained atomic operations such as convolution, pooling, and activation performs its operation, thereby further improving the operation speed of the server.
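One plausible form of the simplification above is eliding a store/load pair between adjacent fine-grained operations, since the intermediate value never needs to leave the device. The descriptor names and the store/load convention below are invented for illustration; the patent does not fix the simplification rules.

```python
def simplify(first_stream):
    """Drop a LOAD that immediately follows a STORE of the same buffer."""
    second_stream = []
    for desc in first_stream:
        if (desc[0] == "LOAD" and second_stream
                and second_stream[-1] == ("STORE", desc[1])):
            second_stream.pop()          # the value never leaves the device:
            continue                     # elide the redundant store/load pair
        second_stream.append(desc)
    return second_stream

first = [("CONV", "x"), ("STORE", "t0"), ("LOAD", "t0"),
         ("POOL", "t0"), ("STORE", "t1"), ("LOAD", "t1"),
         ("ACT", "t1"), ("STORE", "y")]
second = simplify(first)                 # 8 descriptors reduced to 4
```

The second stream keeps only the convolution, pooling, activation, and the final store, which is the shortening effect the paragraph above describes.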
It should be noted that if an operation request corresponds to multiple target neural network models, the instruction streams of the multiple target neural network models need to be obtained and then split in order to complete the operation request.
203: Split the instruction stream into multiple parallel instructions and multiple serial instructions.
This application places no limit on how the instruction stream is split. A parallel instruction is an instruction that can be distributed to multiple computing devices for simultaneous operation, while a serial instruction is an instruction whose operation can only be completed by a single computing device. For example, operation requests such as video recognition and understanding generally comprise feature extraction instructions and feature recognition instructions, where the feature extraction instructions need to perform convolution processing on several consecutive frames of images, and the feature recognition instructions generally need to perform recurrent neural network calculation on the features produced by the feature extraction instructions. The feature extraction instructions can be distributed to multiple computing devices, while the feature recognition instructions can only be processed by a single computing device.
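The video example above can be sketched with a simple classification pass over the stream. The tagging scheme below (a fixed set of parallelizable operation types) is an assumption for illustration; the patent leaves the splitting method open.

```python
# Assumed classification: per-frame operations are parallelizable, stateful
# recurrent operations are serial. The set contents are invented.
PARALLELIZABLE = {"CONV", "POOL", "FC_FORWARD"}

def split_stream(stream):
    """Split an instruction stream into parallel and serial instructions."""
    parallel, serial = [], []
    for op, operand in stream:
        (parallel if op in PARALLELIZABLE else serial).append((op, operand))
    return parallel, serial

stream = [("CONV", "frame0"), ("CONV", "frame1"), ("CONV", "frame2"),
          ("RNN", "features")]
par, ser = split_stream(stream)   # 3 parallelizable convolutions, 1 serial RNN
```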
204: Choose, from the multiple computing devices, multiple parallel computing devices corresponding to the multiple parallel instructions and at least one serial computing device corresponding to the multiple serial instructions.
After the instruction stream is split into multiple parallel instructions and multiple serial instructions, the parallel computing device corresponding to each parallel instruction and the serial computing devices corresponding to the multiple serial instructions are chosen from the multiple computing devices in the server, yielding multiple parallel computing devices and at least one serial computing device. The operational instruction of each parallel computing device is the parallel instruction corresponding to that device, and the operational instruction of each serial computing device is its corresponding serial instruction.
This application places no limit on the method of choosing the serial computing devices and parallel computing devices.
Optionally, if the processing task of the operation request is a training task, the operational data corresponding to the operation request is grouped based on the training method corresponding to the training task to obtain multiple groups of operational data, and the multiple parallel computing devices are chosen from the multiple computing devices according to the multiple groups of operational data and the multiple parallel instructions.
The operational data corresponding to the operation request is grouped according to the specific training method: it may be grouped by the data type of the operational data, or the operational data may simply be divided into multiple groups; no limit is placed here. After grouping, suitable computing devices are chosen to perform the operations concurrently, which further reduces the operation load of each computing device and thereby improves operation efficiency.
For example, for the batch gradient descent algorithm (Batch Gradient Descent, BGD), which operates on a training set (batch), one batch can be divided into multiple subsets and distributed to multiple computing devices, where each computing device trains on one subset and each subset is one group of operational data. For the stochastic gradient descent algorithm (Stochastic Gradient Descent, SGD), where each training step uses only one piece of operational data, different training sets can be distributed to different computing devices. For the mini-batch gradient descent algorithm (mini-batch SGD), the different data of each batch can be distributed to different computing devices for calculation, or each batch can be divided into smaller subsets which are then distributed to different computing devices for calculation.
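The BGD case above, splitting one batch into per-device subsets, can be sketched directly. Contiguous slicing with ceiling division is an assumed grouping policy; the patent allows any grouping.

```python
def group_for_bgd(batch, n_devices):
    """Split one batch into up to n_devices contiguous subsets,
    one group of operational data per parallel computing device."""
    size = -(-len(batch) // n_devices)   # ceiling division
    return [batch[i:i + size] for i in range(0, len(batch), size)]

# Ten samples over three devices: groups of 4, 4, and 2.
groups = group_for_bgd(list(range(10)), 3)
```

The SGD and mini-batch cases differ only in what counts as the unit being distributed (whole batches, or smaller sub-batches) rather than in the splitting mechanics.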
In this application, a serial computing device may be one computing device or multiple computing devices among the multiple parallel computing devices, or it may be another idle computing device, and so on.
For the case where the parts corresponding to the serial instructions differ greatly in expected performance from one another, optionally, choosing from the multiple computing devices the at least one serial computing device corresponding to the multiple serial instructions comprises: grouping the multiple serial instructions to obtain at least one group of serial instruction sequences; and choosing, from the multiple computing devices, the computing device corresponding to each group of serial instruction sequences in the at least one group of serial instruction sequences, to obtain the at least one serial computing device.
That is, the multiple serial instructions are grouped into multiple instruction sequences, and then a computing device corresponding to each instruction sequence is chosen to perform the operation. Having each instruction executed by the computing device that is good at processing it improves the operation efficiency of each part and thereby the overall operation efficiency.
Taking the faster region-based convolutional neural network (faster region-based convolution neural network, Faster R-CNN) as an example: Faster R-CNN is composed of convolutional layers, a region proposal network (region proposal network, RPN) layer, and a region of interest pooling (region of interest pooling, ROI pooling) layer, and the expected performance of these layers differs greatly. The convolutional layers and the RPN can therefore be deployed on a neural network computing device that is good at handling convolution, while ROI pooling can be deployed on a more general-purpose processor such as a CPU. The operation efficiency of each part is thus improved, and so is the overall operation efficiency.
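The Faster R-CNN placement above amounts to matching each serial instruction group to the device best suited to its workload. The capability table and device names below are invented for illustration.

```python
# Hypothetical specialty table: one convolution accelerator, one general CPU.
DEVICE_SPECIALTY = {"npu0": "convolution", "cpu0": "general"}

def assign_groups(groups):
    """groups: list of (group_name, workload_kind). Match each group to a
    device whose specialty fits, falling back to the general device."""
    assignment = {}
    for name, kind in groups:
        device = next((d for d, s in DEVICE_SPECIALTY.items() if s == kind),
                      "cpu0")
        assignment[name] = device
    return assignment

plan = assign_groups([("conv+rpn", "convolution"), ("roi_pooling", "general")])
```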
Optionally, if the processing task of the operation request is a test task, computing devices that include the forward operation of the target neural network model are chosen from the multiple computing devices to obtain multiple target computing devices; if the processing task is a training task, computing devices that include both the forward operation and backward training of the target neural network model are chosen from the multiple computing devices to obtain the multiple target computing devices. The multiple parallel computing devices corresponding to the multiple parallel instructions and the at least one serial computing device corresponding to the multiple serial instructions are then chosen from the multiple target computing devices.
That is, if the processing task of the operation request is a test task, the multiple target computing devices are the computing devices that can be used to execute the forward operation of the target neural network model; and when the processing task is a training task, the multiple target computing devices are the computing devices that can be used to execute both the forward operation and the backward training of the target neural network model. Processing the operation request with dedicated computing devices improves the accuracy and efficiency of the operation.
For example, suppose the server includes a first computing device and a second computing device, where the first computing device only supports the forward operation of a specified neural network model, while the second computing device can execute both the forward operation and the backward training of that specified neural network model. When the target neural network model in a received target operation request is the specified neural network model and the processing task is a test task, the first computing device is determined to execute the target operation request.
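The candidate filter in the example above can be sketched as follows: for a test task any device supporting the model's forward pass qualifies, while a training task also requires backward training support. The capability flags and device names are invented for illustration.

```python
# Hypothetical capability table for the two devices in the example.
DEVICES = {
    "first_device":  {"forward": True, "backward": False},
    "second_device": {"forward": True, "backward": True},
}

def candidates(task):
    """Return the target computing devices eligible for the given task."""
    need_backward = (task == "training")
    return [d for d, caps in DEVICES.items()
            if caps["forward"] and (caps["backward"] or not need_backward)]

test_devs = candidates("test")        # both devices support the forward pass
train_devs = candidates("training")   # only the forward+backward device
```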
Optionally, an auxiliary scheduling algorithm is chosen from an auxiliary scheduling algorithm set according to the attribute information of the operation request, and the multiple parallel computing devices and the at least one serial computing device are chosen from the multiple computing devices according to the auxiliary scheduling algorithm.
The auxiliary scheduling algorithm set includes, but is not limited to, the following: the round-robin scheduling (Round-Robin Scheduling) algorithm, the weighted round-robin (Weighted Round Robin) algorithm, the least-connections (Least Connections) algorithm, the weighted least-connections (Weighted Least Connections) algorithm, the locality-based least-connections (Locality-Based Least Connections) algorithm, the locality-based least-connections with replication (Locality-Based Least Connections with Replication) algorithm, the destination hashing (Destination Hashing) algorithm, and the source hashing (Source Hashing) algorithm.
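Two of the listed algorithms can be sketched compactly: weighted round-robin cycles through devices in proportion to their weights, and least-connections picks the device with the fewest backlogged connections. The device names, weights, and connection counts below are invented inputs.

```python
import itertools

def weighted_round_robin(devices_with_weights):
    """Yield device names cyclically, in proportion to their weights."""
    expanded = [d for d, w in devices_with_weights for _ in range(w)]
    return itertools.cycle(expanded)

def least_connections(connections):
    """Pick the device currently holding the fewest open connections."""
    return min(connections, key=connections.get)

rr = weighted_round_robin([("dev_a", 2), ("dev_b", 1)])
first_three = [next(rr) for _ in range(3)]     # dev_a twice, then dev_b
pick = least_connections({"dev_a": 5, "dev_b": 2, "dev_c": 7})
```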
This application places no limit on how the auxiliary scheduling algorithm is chosen according to the attribute information. For example, if the multiple target computing devices handle operation requests of the same kind, the auxiliary scheduling algorithm may be the round-robin scheduling algorithm. If different target computing devices differ in load tolerance, so that more operation requests should be distributed to highly configured, lightly loaded target computing devices, the auxiliary scheduling algorithm may be the weighted round-robin algorithm. If the workloads assigned to the target computing devices differ, the auxiliary scheduling algorithm may be the least-connections scheduling algorithm, which dynamically chooses the target computing device with the fewest currently backlogged connections to handle the current request, improving the utilization of the target computing devices as much as possible; the weighted least-connections scheduling algorithm may also be used.
That is, on the basis of the dispatching method involved in the above embodiments, the computing devices that finally execute the operation request are chosen with the help of an auxiliary scheduling algorithm, further increasing the operation efficiency of the server.
205: Calculate the final operation result by performing, on the operational data corresponding to the operation request, the parallel instruction corresponding to each of the multiple parallel computing devices and the serial instruction corresponding to each of the at least one serial computing device.
This application places no limit on the operational data corresponding to each operation request: it may be image data for image recognition, voice data for speech recognition, and so on. When the processing task is a test task, the operational data is data uploaded by the user; when the processing task is a training task, the operational data may be a training set uploaded by the user, or a training set corresponding to the target neural network model stored in the server.
Multiple intermediate calculation results may be produced in the calculation process of the operational instructions, and the final operation result corresponding to the operation request can be obtained from the multiple intermediate calculation results.
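The assembly of intermediate results into the final operation result can be sketched as a map-then-reduce over the data groups. Summation as the combining step and the toy data are assumptions for illustration; the patent does not fix how intermediates are combined.

```python
def run_and_combine(groups, device_op):
    """Run device_op on each group (one parallel device per group in the
    server), then reduce the intermediates into the final operation result."""
    intermediates = [device_op(g) for g in groups]
    return sum(intermediates)

final_result = run_and_combine([[1, 2], [3, 4], [5]], device_op=sum)
```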
206: Send the final operation result to the electronic equipment that sent the operation request.
It can be understood that when the server receives a single operation request sent by electronic equipment, the instruction stream of the target neural network model corresponding to the operation request is split; parallel computing devices corresponding to the parallel instructions are chosen to process those parallel instructions in parallel, and serial computing devices good at handling the serial instructions are chosen to execute the corresponding serial instructions individually; the final operation result corresponding to the operation request is then sent to the electronic equipment. That is, the computing devices in the set of multiple parallel computing devices execute their corresponding parallel instructions in parallel, saving the execution time of the parallel instructions, and the serial computing devices improve the calculation efficiency of each serial instruction. Computing resources are uniformly allocated according to the operation request, so that the multiple computing devices in the server cooperate effectively, improving the overall operation efficiency of the server.
Optionally, the method further includes: waiting for a first preset duration, and detecting whether each computing device among the multiple parallel computing devices and the at least one serial computing device has obtained its corresponding final operation result; if not, treating the computing device that has not obtained its final operation result as a delayed computing device, and choosing a backup computing device from the idle computing devices of the multiple computing devices according to the instruction corresponding to the delayed computing device.
That is, when the first preset duration is reached, the computing device that has not obtained its final operation result is treated as a delayed computing device, and a backup computing device is chosen from the idle computing devices according to the instruction being executed by the delayed computing device, improving operation efficiency.
Optionally, after the backup computing device executes the operational instruction corresponding to the delayed computing device, the method further includes: obtaining the final operation result produced first between the delayed computing device and the backup computing device, and sending a pause instruction to whichever of the delayed computing device and the backup computing device has not obtained the final operation result.
The pause instruction instructs whichever of the delayed computing device and the backup computing device has not obtained the final operation result to pause execution of its corresponding operational instruction. That is, the backup computing device executes the instruction corresponding to the delayed computing device; whichever final operation result is produced first, between the backup computing device and the delayed computing device, is taken as the final operation result corresponding to the operational instruction; and a pause instruction is sent to whichever of the delayed computing device and the backup computing device has not obtained the final operation result, pausing the operation of the computing device that has not completed the operational instruction and thereby saving power.
Optionally, the method further includes: waiting for a second preset duration, and detecting whether the delayed computing device has obtained its corresponding final operation result; if not, treating the delayed computing device that has not obtained its final operation result as a faulty computing device and sending a fault instruction.
The fault instruction is used to inform operation and maintenance personnel that the faulty computing device has broken down, and the second preset duration is greater than the first preset duration. That is, when the second preset duration is reached, if the final operation result of the delayed computing device has not been received, the delayed computing device is judged to have broken down and the corresponding operation and maintenance personnel are informed, improving the ability to handle failures.
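The two-timeout supervision described in the last two paragraphs can be sketched in one pass: after the first preset duration an unfinished device becomes delayed (and a backup would be scheduled), and after the longer second preset duration a still-unfinished delayed device is reported as faulty. The durations and completion times below are simulated values, not the patent's.

```python
FIRST_TIMEOUT, SECOND_TIMEOUT = 5.0, 12.0   # assumed preset durations (s)

def supervise(finish_times, now):
    """finish_times: device -> completion time, or None if never finished.
    Returns (delayed devices, faulty devices) as of time `now`."""
    delayed, faulty = [], []
    for dev, t in finish_times.items():
        if (t is None or t > FIRST_TIMEOUT) and now >= FIRST_TIMEOUT:
            delayed.append(dev)              # a backup device is chosen here
        if (t is None or t > SECOND_TIMEOUT) and now >= SECOND_TIMEOUT:
            faulty.append(dev)               # operation staff are informed
    return delayed, faulty

# dev_a finished in time, dev_b was slow but recovered, dev_c never finished.
delayed, faulty = supervise({"dev_a": 3.0, "dev_b": 8.0, "dev_c": None},
                            now=12.0)
```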
Optionally, the method further includes: updating the hash table of the multiple computing devices every target time threshold.
A hash table (Hash table, also called a hash map) is a data structure accessed directly according to key values (Key values). In this application, the IP addresses of the multiple computing devices serve as the key values, and a hash function (mapping function) maps them to positions in the hash table; that is, after a target computing device is determined, the physical resources allocated to the target computing device can be found quickly. No limit is placed on the concrete form of the hash table: it may be a manually configured static hash table, or it may record the hardware resources allocated according to IP address. Updating the hash table of the multiple computing devices every target time threshold improves the accuracy and efficiency of lookups.
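The device hash table above maps IP-address keys to allocated physical resources. In Python a dict is itself a hash table, so the mapping is direct; the refresh interval and resource records below are invented for illustration.

```python
def build_device_table(devices):
    """devices: list of (ip, resources). In the scheme above this table
    would be rebuilt every target time threshold."""
    return {ip: resources for ip, resources in devices}

table = build_device_table([("10.0.0.1", {"cores": 8}),
                            ("10.0.0.2", {"cores": 16})])
lookup = table["10.0.0.2"]   # constant-time lookup by device IP
```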
Consistent with the embodiment of Fig. 2 above, please refer to Fig. 3, which is a structural diagram of another server provided by this application; the server includes multiple computing devices. As shown in Fig. 3, the server 300 includes:
a receiving unit 301, configured to receive an operation request;
an acquiring unit 302, configured to obtain the instruction stream of the target neural network model corresponding to the operation request;
a splitting unit 303, configured to split the instruction stream into multiple parallel instructions and multiple serial instructions;
a selection unit 304, configured to choose, from the multiple computing devices, multiple parallel computing devices corresponding to the multiple parallel instructions and at least one serial computing device corresponding to the multiple serial instructions;
an arithmetic unit 305, configured to calculate the operational data corresponding to the operation request according to the parallel instruction corresponding to each of the multiple parallel computing devices and the serial instruction corresponding to each of the at least one serial computing device, to obtain the final operation result;
a transmission unit 306, configured to send the final operation result to the electronic equipment that sent the operation request.
Optionally, the acquiring unit 302 is specifically configured to obtain a first instruction descriptor stream according to the basic operation sequence corresponding to the target neural network model; simplify the first instruction descriptor stream to obtain a second instruction descriptor stream; and obtain the instruction stream according to the second instruction descriptor stream.
Optionally, the splitting unit 303 is specifically configured to group the multiple serial instructions to obtain at least one group of serial instruction sequences; and the selection unit 304 is specifically configured to choose, from the multiple computing devices, the computing device corresponding to each group of serial instruction sequences in the at least one group of serial instruction sequences, to obtain the at least one serial computing device.
Optionally, the splitting unit 303 is specifically configured, if the processing task of the operation request is a training task, to group the operational data corresponding to the operation request based on the training method corresponding to the operation request, to obtain multiple groups of operational data; and the selection unit 304 is specifically configured to choose the multiple parallel computing devices from the multiple computing devices according to the multiple groups of operational data and the multiple parallel instructions.
Optionally, the selection unit 304 is specifically configured, if the processing task of the operation request is a test task, to choose from the multiple computing devices the computing devices that include the forward operation of the target neural network model, to obtain multiple target computing devices; if the processing task is a training task, to choose from the multiple computing devices the computing devices that include both the forward operation and backward training of the target neural network model, to obtain the multiple target computing devices; and to choose the multiple parallel computing devices and the at least one serial computing device from the multiple target computing devices.
Optionally, the selection unit 304 is specifically configured to choose an auxiliary scheduling algorithm from an auxiliary scheduling algorithm set according to the attribute information of the operation request, the auxiliary scheduling algorithm set including at least one of the following: the round-robin scheduling algorithm, the weighted round-robin algorithm, the least-connections algorithm, the weighted least-connections algorithm, the locality-based least-connections algorithm, the locality-based least-connections with replication algorithm, the destination hashing algorithm, and the source hashing algorithm; and to choose the multiple parallel computing devices and the at least one serial computing device from the multiple computing devices according to the auxiliary scheduling algorithm.
Optionally, the server further includes a detection unit 307, configured to wait for a first preset duration and detect whether the multiple parallel computing devices and the at least one serial computing device have obtained their corresponding final operation results; if not, the computing device that has not obtained its final operation result is treated as a delayed computing device; the selection unit 304 chooses a backup computing device from the idle computing devices of the multiple computing devices according to the instruction corresponding to the delayed computing device; and the arithmetic unit 305 executes the instruction corresponding to the delayed computing device through the backup computing device.
Optionally, the acquiring unit 302 is further configured to obtain the final operation result produced first between the delayed computing device and the backup computing device; and the transmission unit 306 sends a pause instruction to whichever of the delayed computing device and the backup computing device has not obtained the final operation result.
Optionally, the detection unit 307 is further configured to wait for a second preset duration and detect whether the delayed computing device has obtained its corresponding final operation result; if not, the delayed computing device that has not obtained its final operation result is treated as a faulty computing device, and a fault instruction is sent through the transmission unit 306; the fault instruction is used to inform operation and maintenance personnel that the faulty computing device has broken down, and the second preset duration is greater than the first preset duration.
Optionally, the server further includes an updating unit 308, configured to update the hash table of the server every target time threshold.
Optionally, the acquiring unit 302 is further configured to obtain the operation demand of each specified neural network model in a specified neural network model set and the hardware attributes of each of the multiple computing devices, to obtain multiple operation demands and multiple hardware attributes;
and the server further includes a deployment unit 309, configured to deploy, according to the multiple operation demands and the multiple hardware attributes, each specified neural network model in the specified neural network model set on the specified computing device corresponding to that specified neural network model.
Optionally, the computing device includes at least one computing carrier, and the computing carrier includes at least one computing unit.
It can be understood that when the server receives a single operation request sent by electronic equipment, the instruction stream of the target neural network model corresponding to the operation request is split; parallel computing devices corresponding to the parallel instructions are chosen to process those parallel instructions in parallel, and serial computing devices good at handling the serial instructions are chosen to execute the corresponding serial instructions individually; the final operation result corresponding to the operation request is then sent to the electronic equipment. That is, the computing devices in the set of multiple parallel computing devices execute their corresponding parallel instructions in parallel, saving the execution time of the parallel instructions, and the serial computing devices improve the calculation efficiency of each serial instruction. Computing resources are uniformly allocated according to the operation request, so that the multiple computing devices in the server cooperate effectively, improving the overall operation efficiency of the server.
In one embodiment, as shown in Fig. 4, this application discloses another server 400 including a processor 401, a memory 402, a communication interface 403, and one or more programs 404, where the one or more programs 404 are stored in the memory 402 and configured to be executed by the processor, and the programs 404 include instructions for executing some or all of the steps described in the above dispatching method.
Another embodiment of the present invention provides a computer-readable storage medium storing a computer program; the computer program includes program instructions which, when executed by a processor, cause the processor to execute the implementation described in the dispatching method.
Those of ordinary skill in the art may be aware that the units and algorithm steps described in conjunction with the embodiments disclosed herein can be realized with electronic hardware, computer software, or a combination of the two. In order to clearly demonstrate the interchangeability of hardware and software, the composition and steps of each example have been described generally by function in the above description. Whether these functions are implemented in hardware or software depends on the specific application and design constraints of the technical solution. Skilled persons may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of this invention.
It is apparent to those skilled in the art that, for convenience and brevity of description, the specific working processes of the terminal and units described above may refer to the corresponding processes in the foregoing method embodiments, and details are not described here again.
In the several embodiments provided in this application, it should be understood that the disclosed terminal and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; the division of the units is only a division by logical function, and other divisions are possible in actual implementation. For example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the embodiments of the present invention.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods of the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
It should be noted that implementations not shown or described in the drawings or in the text of the specification take forms known to those of ordinary skill in the art and are not described in detail. In addition, the above definitions of the elements and methods are not limited to the specific structures, shapes, or manners mentioned in the embodiments; those of ordinary skill in the art may simply modify or replace them.
The specific embodiments above further describe the purpose, technical solutions, and beneficial effects of the present application in detail. It should be understood that the above are only specific embodiments of the present application and are not intended to limit the present application. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present application shall be included within the protection scope of the present application.
Claims (15)
1. A scheduling method, wherein the method is applied to a server comprising multiple computing devices, and the method comprises:
receiving an operation request;
obtaining an instruction stream of a target neural network model corresponding to the operation request;
splitting the instruction stream into multiple parallel instructions and multiple serial instructions;
selecting, from the multiple computing devices, multiple parallel computing devices corresponding to the multiple parallel instructions and at least one serial computing device corresponding to the multiple serial instructions;
computing operational data corresponding to the operation request according to the parallel instruction corresponding to each of the multiple parallel computing devices and the serial instruction corresponding to each of the at least one serial computing device, to obtain a final operation result;
sending the final operation result to an electronic device that sent the operation request.
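As an illustration, the dispatch flow recited in claim 1 can be sketched in Python. This is a minimal sketch only: the instruction format, the `Device` class, and the one-device-per-instruction assignment are hypothetical; the claim does not prescribe any concrete data structures.

```python
class Device:
    """Stand-in for a computing device in the server."""
    def __init__(self, name):
        self.name = name

    def run(self, instruction, data):
        # Stand-in for real execution: apply the instruction to the data.
        return instruction(data)

def split_stream(stream):
    """Split an instruction stream into its parallel and serial instructions."""
    parallel = [op for kind, op in stream if kind == "parallel"]
    serial = [op for kind, op in stream if kind == "serial"]
    return parallel, serial

def schedule(stream, data, devices):
    """Dispatch parallel instructions across devices, then run serial ones."""
    parallel, serial = split_stream(stream)
    par_devs = devices[:len(parallel)]       # one device per parallel instruction
    ser_dev = devices[len(parallel)]         # a serial device for the rest
    partials = [d.run(op, data) for d, op in zip(par_devs, parallel)]
    result = partials
    for op in serial:                        # serial instructions run in order
        result = ser_dev.run(op, result)
    return result                            # the "final operation result"

# Toy model: two parallel halves summed, then a serial scaling step.
stream = [("parallel", lambda xs: sum(xs[:2])),
          ("parallel", lambda xs: sum(xs[2:])),
          ("serial", lambda parts: 2 * sum(parts))]
devices = [Device("dev0"), Device("dev1"), Device("dev2")]
print(schedule(stream, [1, 2, 3, 4], devices))  # -> 20
```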
2. The method according to claim 1, wherein the obtaining an instruction stream of a target neural network model corresponding to the operation request comprises:
obtaining a first instruction descriptor stream according to the basic operations of the target neural network model;
simplifying the first instruction descriptor stream to obtain a second instruction descriptor stream;
obtaining the instruction stream according to the second instruction descriptor stream.
3. The method according to claim 2, wherein the selecting, from the multiple computing devices, at least one serial computing device corresponding to the multiple serial instructions comprises:
grouping the multiple serial instructions to obtain at least one serial instruction sequence;
selecting, from the multiple computing devices, a computing device corresponding to each serial instruction sequence of the at least one serial instruction sequence, to obtain the at least one serial computing device.
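The grouping step of claim 3 can be sketched as follows. The grouping criterion shown (consecutive instructions sharing a dependency tag) and the round-robin assignment of devices to sequences are assumptions for illustration only; the claim does not fix either rule.

```python
def group_serial(instructions):
    """Group serial instructions into sequences; here, consecutive
    instructions with the same (hypothetical) dependency tag form one."""
    groups = []
    for tag, op in instructions:
        if groups and groups[-1][0] == tag:
            groups[-1][1].append(op)        # extend the current sequence
        else:
            groups.append((tag, [op]))      # start a new sequence
    return [ops for _, ops in groups]

def assign_devices(sequences, idle_devices):
    """One computing device per serial instruction sequence."""
    return {i: idle_devices[i % len(idle_devices)]
            for i in range(len(sequences))}

seqs = group_serial([("a", "op1"), ("a", "op2"), ("b", "op3")])
print(seqs)  # -> [['op1', 'op2'], ['op3']]
```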
4. The method according to any one of claims 1-3, wherein the selecting, from the multiple computing devices, multiple parallel computing devices corresponding to the multiple parallel instructions comprises:
if the processing task of the operation request is a training task, grouping the operational data corresponding to the operation request based on a training method corresponding to the operation request, to obtain multiple groups of operational data;
selecting the multiple parallel computing devices from the multiple computing devices according to the multiple groups of operational data and the multiple parallel instructions.
5. The method according to claim 1, wherein the selecting, from the multiple computing devices, multiple parallel computing devices corresponding to the multiple parallel instructions and at least one serial computing device corresponding to the multiple serial instructions comprises:
if the processing task of the operation request is a test task, selecting, from the multiple computing devices, computing devices that include the forward operation of the target neural network model, to obtain multiple target computing devices;
if the processing task is a training task, selecting, from the multiple computing devices, computing devices that include the forward operation and backward training of the target neural network model, to obtain the multiple target computing devices;
selecting the multiple parallel computing devices and the at least one serial computing device from the multiple target computing devices.
6. The method according to claim 1, wherein the selecting, from the multiple computing devices, multiple parallel computing devices corresponding to the multiple parallel instructions and at least one serial computing device corresponding to the multiple serial instructions comprises:
selecting an auxiliary scheduling algorithm from an auxiliary scheduling algorithm set according to attribute information of the operation request, the auxiliary scheduling algorithm set comprising at least one of the following: a round-robin scheduling algorithm, a weighted round-robin algorithm, a least-connections algorithm, a weighted least-connections algorithm, a locality-based least-connections scheduling algorithm, a locality-based least-connections with replication algorithm, a destination address hashing algorithm, and a source address hashing algorithm;
selecting the multiple parallel computing devices and the at least one serial computing device from the multiple computing devices according to the auxiliary scheduling algorithm.
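Two members of the auxiliary scheduling set named in claim 6 can be sketched compactly. The device names and weights below are hypothetical, and the claim does not specify how weights or active-task counts would be derived (they could reflect throughput or current load, for example).

```python
import itertools

def weighted_round_robin(devices_with_weights):
    """Weighted round-robin: yield device names in proportion to their
    weights, cycling forever."""
    expanded = [name for name, w in devices_with_weights for _ in range(w)]
    return itertools.cycle(expanded)

def least_connections(active):
    """Least-connections: pick the device with the fewest active tasks."""
    return min(active, key=active.get)

picker = weighted_round_robin([("dev0", 3), ("dev1", 1)])
print([next(picker) for _ in range(8)])
# -> ['dev0', 'dev0', 'dev0', 'dev1', 'dev0', 'dev0', 'dev0', 'dev1']
print(least_connections({"dev0": 5, "dev1": 2}))  # -> dev1
```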
7. The method according to any one of claims 1-6, wherein the method further comprises:
waiting for a first preset duration, and detecting whether each computing device among the multiple parallel computing devices and the at least one serial computing device has obtained its corresponding final operation result; if not, taking each computing device that has not obtained a final operation result as a delayed computing device;
selecting a standby computing device from the idle computing devices among the multiple computing devices according to the instruction corresponding to the delayed computing device;
executing the instruction corresponding to the delayed computing device by the standby computing device.
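The two-stage timeout handling of claims 7-9 can be sketched as one check function. The result-table and device-name formats are hypothetical; the claims only fix the ordering that the second preset duration exceeds the first.

```python
def check_delays(results, idle_devices, elapsed, t1, t2):
    """Return (standby assignments, faulty devices) for unfinished devices.

    results: device name -> final operation result, or None if unfinished.
    t1, t2: first and second preset durations; claim 9 requires t2 > t1.
    """
    assert t2 > t1, "claim 9: second preset duration exceeds the first"
    standby, faulty = {}, []
    for dev, result in results.items():
        if result is not None:
            continue                      # device finished in time
        if elapsed >= t2:
            faulty.append(dev)            # claim 9: notify maintenance staff
        elif elapsed >= t1 and idle_devices:
            # claim 7: a standby device re-executes the delayed instruction
            standby[dev] = idle_devices.pop(0)
    return standby, faulty

# After the first timeout, dev1 gets a standby; nothing is faulty yet.
print(check_delays({"dev0": 42, "dev1": None}, ["spare0"],
                   elapsed=1.5, t1=1.0, t2=5.0))
# -> ({'dev1': 'spare0'}, [])
```

Claim 8's follow-up (taking whichever result arrives first and pausing the other device) would sit on top of this loop; it is omitted here to keep the sketch short.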
8. The method according to claim 7, wherein after the executing the instruction corresponding to the delayed computing device by the standby computing device, the method further comprises:
obtaining the final operation result that is obtained first by either the delayed computing device or the standby computing device;
sending a pause instruction to whichever of the delayed computing device and the standby computing device has not obtained the final operation result.
9. The method according to claim 7 or 8, wherein the method further comprises:
waiting for a second preset duration, and detecting whether the delayed computing device has obtained its corresponding final operation result; if not, taking the delayed computing device that has not obtained a final operation result as a faulty computing device and sending a fault instruction, wherein the fault instruction is used to inform operation and maintenance personnel that the faulty computing device has failed, and the second preset duration is greater than the first preset duration.
10. The method according to any one of claims 1-9, wherein the method further comprises:
updating a hash table of the server every target time threshold.
11. The method according to any one of claims 1-10, wherein the method further comprises:
obtaining an operation demand of each specified neural network model in a specified neural network model set and hardware attributes of each computing device of the multiple computing devices, to obtain multiple operation demands and multiple hardware attributes;
deploying each specified neural network model in the specified neural network model set onto the corresponding specified computing device according to the multiple operation demands and the multiple hardware attributes.
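The deployment step of claim 11 amounts to matching each model's operation demand against device hardware attributes. The greedy largest-demand-first matching below is an assumption for illustration; the claim does not fix the matching rule, and the single scalar "capacity" stands in for the multiple hardware attributes.

```python
def deploy(demands, capacities):
    """Map each model to the first device with enough remaining capacity,
    placing the most demanding models first (a hypothetical greedy rule)."""
    placement = {}
    remaining = dict(capacities)
    for model, need in sorted(demands.items(), key=lambda kv: -kv[1]):
        for dev, cap in remaining.items():
            if cap >= need:
                placement[model] = dev        # deploy the model on this device
                remaining[dev] = cap - need   # consume the device's capacity
                break
    return placement

print(deploy({"net_a": 8, "net_b": 4}, {"dev0": 10, "dev1": 6}))
# -> {'net_a': 'dev0', 'net_b': 'dev1'}
```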
12. The method according to any one of claims 1-11, wherein each computing device comprises at least one computing carrier, and each computing carrier comprises at least one computing unit.
13. A server, wherein the server comprises multiple computing devices, and the server further comprises units for performing the method according to any one of claims 1-12.
14. A server, comprising a processor, a memory, a communication interface, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the processor, and the programs include instructions for performing the steps of the method according to any one of claims 1-12.
15. A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program, the computer program comprises program instructions, and the program instructions, when executed by a processor, cause the processor to perform the method according to any one of claims 1-12.
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711467705.4A CN109976809B (en) | 2017-12-28 | 2017-12-28 | Scheduling method and related device |
PCT/CN2018/098324 WO2019128230A1 (en) | 2017-12-28 | 2018-08-02 | Scheduling method and related apparatus |
US16/767,415 US11568269B2 (en) | 2017-12-28 | 2018-08-02 | Scheduling method and related apparatus |
EP18895350.9A EP3731089B1 (en) | 2017-12-28 | 2018-08-02 | Scheduling method and related apparatus |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711467705.4A CN109976809B (en) | 2017-12-28 | 2017-12-28 | Scheduling method and related device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109976809A true CN109976809A (en) | 2019-07-05 |
CN109976809B CN109976809B (en) | 2020-08-25 |
Family
ID=67075497
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711467705.4A Active CN109976809B (en) | 2017-12-28 | 2017-12-28 | Scheduling method and related device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109976809B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112988229A (en) * | 2019-12-12 | 2021-06-18 | 上海大学 | Convolutional neural network resource optimization configuration method based on heterogeneous computation |
Patent Citations (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH04216163A (en) * | 1990-02-20 | 1992-08-06 | Internatl Business Mach Corp <Ibm> | Data processing method and system, method and system for forming neural network data structure, method and system for training neural network, neural network moving method and defining method of neural network model |
US6192465B1 (en) * | 1998-09-21 | 2001-02-20 | Advanced Micro Devices, Inc. | Using multiple decoders and a reorder queue to decode instructions out of order |
CN101441557A (en) * | 2008-11-08 | 2009-05-27 | 腾讯科技(深圳)有限公司 | Distributed parallel calculating system and method based on dynamic data division |
CN102171650A (en) * | 2008-11-24 | 2011-08-31 | 英特尔公司 | Systems, methods, and apparatuses to decompose a sequential program into multiple threads, execute said threads, and reconstruct the sequential execution |
CN102360309A (en) * | 2011-09-29 | 2012-02-22 | 中国科学技术大学苏州研究院 | Scheduling system and scheduling execution method of multi-core heterogeneous system on chip |
US8812564B2 (en) * | 2011-12-20 | 2014-08-19 | Sap Ag | Parallel uniqueness checks for partitioned tables |
JP6083687B2 (en) * | 2012-01-06 | 2017-02-22 | インターナショナル・ビジネス・マシーンズ・コーポレーションInternational Business Machines Corporation | Distributed calculation method, program, host computer, and distributed calculation system (distributed parallel calculation using accelerator device) |
US20130179485A1 (en) * | 2012-01-06 | 2013-07-11 | International Business Machines Corporation | Distributed parallel computation with acceleration devices |
CN103049330A (en) * | 2012-12-05 | 2013-04-17 | 大连理工大学 | Method and system for scheduling trusteeship distribution task |
US20170090991A1 (en) * | 2013-09-10 | 2017-03-30 | Sviral, Inc. | Method, apparatus, and computer-readable medium for parallelization of a computer program on a plurality of computing cores |
CN104035751A (en) * | 2014-06-20 | 2014-09-10 | 深圳市腾讯计算机系统有限公司 | Graphics processing unit based parallel data processing method and device |
CN106056529A (en) * | 2015-04-03 | 2016-10-26 | 阿里巴巴集团控股有限公司 | Method and equipment for training convolutional neural network used for image recognition |
CN107315571A (en) * | 2016-04-27 | 2017-11-03 | 北京中科寒武纪科技有限公司 | A kind of apparatus and method for performing full articulamentum neutral net forward operation |
CN106779057A (en) * | 2016-11-11 | 2017-05-31 | 北京旷视科技有限公司 | The method and device of the calculating binary neural network convolution based on GPU |
CN106909971A (en) * | 2017-02-10 | 2017-06-30 | 华南理工大学 | A kind of BP neural network parallel method towards multinuclear computing environment |
CN107018184A (en) * | 2017-03-28 | 2017-08-04 | 华中科技大学 | Distributed deep neural network cluster packet synchronization optimization method and system |
CN107239826A (en) * | 2017-06-06 | 2017-10-10 | 上海兆芯集成电路有限公司 | Computational methods and device in convolutional neural networks |
CN107506173A (en) * | 2017-08-30 | 2017-12-22 | 郑州云海信息技术有限公司 | A kind of accelerated method, the apparatus and system of singular value decomposition computing |
Also Published As
Publication number | Publication date |
---|---|
CN109976809B (en) | 2020-08-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107196869B (en) | Adaptive load balancing method, apparatus and system based on actual host load | |
US11568269B2 (en) | Scheduling method and related apparatus | |
US20200104167A1 (en) | Data processing apparatus and method | |
CN105393240B (en) | Method and apparatus with the asynchronous processor for aiding in asynchronous vector processor | |
US20190042915A1 (en) | Procedural neural network synaptic connection modes | |
CN106951926A (en) | The deep learning systems approach and device of a kind of mixed architecture | |
WO2022063247A1 (en) | Neural architecture search method and apparatus | |
CN103701900B (en) | Data distribution method on basis of heterogeneous cluster | |
CN110210610A (en) | Convolutional calculation accelerator, convolutional calculation method and convolutional calculation equipment | |
CN108416433A (en) | A kind of neural network isomery acceleration method and system based on asynchronous event | |
CN105518625A (en) | Computation hardware with high-bandwidth memory interface | |
TWI832000B (en) | Method and system for neural networks | |
CN112866059A (en) | Nondestructive network performance testing method and device based on artificial intelligence application | |
KR20140007004A (en) | Parallel generation of topics from documents | |
CN109978129A (en) | Dispatching method and relevant apparatus | |
CN108694089A (en) | Use the parallel computation framework of non-greedy dispatching algorithm | |
Gao et al. | Deep neural network task partitioning and offloading for mobile edge computing | |
CN110502544A (en) | Data integration method, distributed computational nodes and distributed deep learning training system | |
CN110147249A (en) | A kind of calculation method and device of network model | |
CN109978149A (en) | Dispatching method and relevant apparatus | |
CN109976809A (en) | Dispatching method and relevant apparatus | |
CN109976887A (en) | Dispatching method and relevant apparatus | |
CN114691372A (en) | Group intelligent control method of multimedia end edge cloud system | |
CN110019243A (en) | Transmission method and device, equipment, the storage medium of data in Internet of Things | |
TW201931216A | Integrated circuit chip device and related products, comprising a compression mapping circuit for performing compression processing on data, a main processing circuit for performing each successive operation in the neural network operation, and the like |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
CB02 | Change of applicant information |
Address after: 100000 room 644, No. 6, No. 6, South Road, Beijing Academy of Sciences Applicant after: Zhongke Cambrian Technology Co., Ltd Address before: 100000 room 644, No. 6, No. 6, South Road, Beijing Academy of Sciences Applicant before: Beijing Zhongke Cambrian Technology Co., Ltd. |
|
GR01 | Patent grant | ||