CN109976809A - Dispatching method and relevant apparatus - Google Patents
- Publication number
- CN109976809A (application number CN201711467705.4A)
- Authority
- CN
- China
- Prior art keywords
- computing device
- instruction
- serial
- computing
- network model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3818—Decoding for concurrent execution
- G06F9/3822—Parallel decoding, e.g. parallel decode units
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
Abstract
Embodiments of the present application disclose a dispatching method and a related apparatus. The method is applied to a server comprising multiple computing devices and includes: receiving an operation request; obtaining the instruction stream of the target neural network model corresponding to the operation request; splitting the instruction stream into multiple parallel instructions and multiple serial instructions; selecting, from the multiple computing devices, multiple parallel computing devices corresponding to the multiple parallel instructions and at least one serial computing device corresponding to the multiple serial instructions; computing the operational data corresponding to the operation request according to the parallel instruction of each parallel computing device and the serial instruction of each serial computing device, to obtain a final operation result; and sending the final operation result to the electronic device that sent the operation request. The embodiments of the present application can improve the efficiency with which the server processes a single operation request.
Description
Technical field
This application relates to the field of computer technology, and in particular to a dispatching method and a related apparatus.
Background
Neural networks underlie many current artificial-intelligence applications. As the range of neural network applications keeps expanding, various neural network models are stored on servers or cloud computing services, which perform operations on the operation requests submitted by users. How to improve the operational efficiency of such a server is a technical problem to be solved by those skilled in the art.
Summary of the invention
Embodiments of the present application propose a dispatching method and a related apparatus, which can select computing devices in a server to execute a single operation request, improving the operational efficiency of the server.
In a first aspect, an embodiment of the present application provides a dispatching method applied to a server comprising multiple computing devices. The method includes:
receiving an operation request;
obtaining the instruction stream of the target neural network model corresponding to the operation request;
splitting the instruction stream into multiple parallel instructions and multiple serial instructions;
selecting, from the multiple computing devices, multiple parallel computing devices corresponding to the multiple parallel instructions and at least one serial computing device corresponding to the multiple serial instructions;
computing the operational data corresponding to the operation request according to the parallel instruction corresponding to each of the multiple parallel computing devices and the serial instruction corresponding to each of the at least one serial computing device, to obtain a final operation result; and
sending the final operation result to the electronic device that sent the operation request.
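The six steps of the first aspect can be sketched as a minimal simulation. This is an illustration only, not the patented implementation: the instruction-stream format, the device-selection policy, the way partial results are combined, and all names (`Device`, `dispatch`, etc.) are assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor

class Device:
    """Stand-in for one computing device inside the server."""
    def __init__(self, name):
        self.name = name
    def run(self, instr, data):
        op, arg = instr              # an instruction is (callable, operand)
        return op(data, arg)

def dispatch(request, devices):
    # Steps 1-2: receive the request and obtain the model's instruction stream.
    stream = request["instruction_stream"]
    data = request["operational_data"]
    # Step 3: split the stream into parallel and serial instructions.
    parallel = [i for i in stream if i["parallel"]]
    serial = [i for i in stream if not i["parallel"]]
    # Step 4: choose one device per parallel instruction, plus a serial device.
    par_devs = devices[:len(parallel)]
    ser_dev = devices[-1]
    # Step 5a: run the parallel instructions concurrently on their devices.
    with ThreadPoolExecutor(max_workers=max(len(par_devs), 1)) as pool:
        partials = list(pool.map(
            lambda pair: pair[0].run(pair[1]["instr"], data),
            zip(par_devs, parallel)))
    result = sum(partials)           # combine partial results (illustrative)
    # Step 5b: run the serial instructions in order on the serial device.
    for ins in serial:
        result = ser_dev.run(ins["instr"], result)
    # Step 6: the final result goes back to the requesting electronic device.
    return result

devices = [Device(f"d{i}") for i in range(4)]
request = {
    "operational_data": 10,
    "instruction_stream": [
        {"parallel": True,  "instr": (lambda x, a: x * a, 2)},
        {"parallel": True,  "instr": (lambda x, a: x * a, 3)},
        {"parallel": False, "instr": (lambda x, a: x + a, 1)},
    ],
}
print(dispatch(request, devices))    # 10*2 + 10*3 = 50, then +1 -> 51
```

The point of the sketch is the division of labor: the two parallel instructions run concurrently on separate devices, while the serial instruction runs afterwards on a device dedicated to sequential work.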
In a second aspect, an embodiment of the present application provides a server comprising multiple computing devices, and further comprising:
a receiving unit, configured to receive an operation request;
an acquiring unit, configured to obtain the instruction stream of the target neural network model corresponding to the operation request;
a splitting unit, configured to split the instruction stream into multiple parallel instructions and multiple serial instructions;
a selection unit, configured to select, from the multiple computing devices, multiple parallel computing devices corresponding to the multiple parallel instructions and at least one serial computing device corresponding to the multiple serial instructions;
an arithmetic unit, configured to compute the operational data corresponding to the operation request according to the parallel instruction corresponding to each parallel computing device and the serial instruction corresponding to each serial computing device, to obtain a final operation result; and
a transmission unit, configured to send the final operation result to the electronic device that sent the operation request.
In a third aspect, an embodiment of the present application provides another server, comprising a processor, a memory, a communication interface, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the processor, and the programs include instructions for some or all of the steps described in the first aspect.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium storing a computer program, the computer program comprising program instructions which, when executed by a processor, cause the processor to perform the method of the first aspect.
With the above dispatching method and related apparatus, when the server receives a single operation request sent by an electronic device, it splits the instruction stream of the target neural network model corresponding to the operation request, selects parallel computing devices to process the corresponding parallel instructions in parallel, and independently selects serial computing devices that are good at handling serial instructions to execute the corresponding serial instructions, then sends the final operation result corresponding to the operation request to the electronic device. That is, the computing devices in the set of parallel computing devices execute their parallel instructions concurrently, saving execution time for the parallel instructions, while the serial computing devices improve the computational efficiency across the serial instructions. Computing resources are allocated uniformly according to the operation request, so that the multiple computing devices in the server cooperate effectively, improving the overall operational efficiency of the server.
Brief description of the drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application or in the prior art, the accompanying drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
In the drawings:
Fig. 1 is a structural schematic diagram of a server provided by an embodiment of the present application;
Fig. 1a is a structural schematic diagram of a computing unit provided by an embodiment of the present application;
Fig. 1b is a structural schematic diagram of a main processing circuit provided by an embodiment of the present application;
Fig. 1c is a data distribution schematic diagram of a computing unit provided by an embodiment of the present application;
Fig. 1d is a data return schematic diagram of a computing unit provided by an embodiment of the present application;
Fig. 1e is an operation schematic diagram of a neural network structure provided by an embodiment of the present application;
Fig. 2 is a flow diagram of a dispatching method provided by an embodiment of the present application;
Fig. 3 is a structural schematic diagram of another server provided by an embodiment of the present application;
Fig. 4 is a structural schematic diagram of another server provided by an embodiment of the present application.
Detailed description of embodiments
The technical solutions in the embodiments of the present application are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present application without creative effort shall fall within the protection scope of the present application.
It should be understood that when used in this specification and the appended claims, the terms "include" and "comprise" indicate the presence of the described features, integers, steps, operations, elements and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or sets thereof.
It should also be understood that the terminology used in this specification is for the purpose of describing particular embodiments only and is not intended to limit the application. As used in this specification and the appended claims, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" used in this specification and the appended claims refers to and encompasses any and all possible combinations of one or more of the associated listed items.
As used in this specification and the appended claims, the term "if" may be interpreted, depending on the context, as "when", "once", "in response to determining" or "in response to detecting". Similarly, the phrase "if it is determined" or "if [the described condition or event] is detected" may be interpreted, depending on the context, as "once it is determined", "in response to determining", "once [the described condition or event] is detected" or "in response to detecting [the described condition or event]".
Embodiments of the present application propose a dispatching method and a related apparatus, which can select computing devices in a server to execute operation requests, improving the operational efficiency of the server. The application is further described in detail below with reference to specific embodiments and the accompanying drawings.
Please refer to Fig. 1, which is a structural schematic diagram of a server provided by an embodiment of the present application. As shown in Fig. 1, the server includes multiple computing devices. A computing device includes, but is not limited to, a server computer, and may also be a personal computer (PC), a network PC, a minicomputer, a mainframe computer, or the like.
In this application, the computing devices included in the server establish connections and transmit data over wired or wireless links, and each computing device includes at least one computing carrier, such as a central processing unit (CPU), a graphics processing unit (GPU), or a processor board. The server involved in this application may also be a cloud server that provides cloud computing services for electronic devices.
Each computing carrier may include at least one computing unit for neural network operations, such as a processing chip. The specific structure of the computing unit is not limited; please refer to Fig. 1a, which is a structural schematic diagram of a computing unit. As shown in Fig. 1a, the computing unit includes a main processing circuit, basic processing circuits, and branch processing circuits. Specifically, the main processing circuit is connected to the branch processing circuits, and each branch processing circuit is connected to at least one basic processing circuit.
The branch processing circuits are used to transmit and receive data from the main processing circuit or the basic processing circuits.
Referring to Fig. 1b, a structural schematic diagram of the main processing circuit: as shown in Fig. 1b, the main processing circuit may include a register and/or an on-chip buffer circuit, and may also include a control circuit, a vector operator circuit, an ALU (arithmetic and logic unit) circuit, an accumulator circuit, a DMA (direct memory access) circuit, and the like. In practical applications, the main processing circuit may further include a conversion circuit (such as a matrix transposition circuit), a data rearrangement circuit, an activation circuit, or other circuits.
The main processing circuit also includes a data transmitting circuit and a data receiving circuit or interface. A data distribution circuit and a data broadcasting circuit can be integrated into the data transmitting circuit, although in practical applications they can also be provided separately; likewise, the data transmitting circuit and the data receiving circuit can be integrated to form a data transceiver circuit. Broadcast data is data that needs to be sent to every basic processing circuit. Distribution data is data that needs to be selectively sent to some of the basic processing circuits, where the specific selection can be determined by the main processing circuit according to its load and the calculation method. In the broadcast transmission mode, the broadcast data is sent to each basic processing circuit in broadcast form (in practical applications, the broadcast data may be sent to each basic processing circuit in a single broadcast or in multiple broadcasts; the specific embodiments of this application do not limit the number of broadcasts). In the distribution transmission mode, the distribution data is selectively sent to some of the basic processing circuits.
When distributing data, the control circuit of the main processing circuit transmits data to some or all of the basic processing circuits, and the data received by each receiving basic processing circuit may be different; naturally, some of the basic processing circuits may also receive identical data. When broadcasting data, the control circuit of the main processing circuit transmits data to some or all of the basic processing circuits, and every receiving basic processing circuit receives the same data. That is, broadcast data may include data that all basic processing circuits need to receive, while distribution data may include data that only some basic processing circuits need to receive. The main processing circuit can send the broadcast data to all branch processing circuits through one or more broadcasts, and the branch processing circuits forward the broadcast data to all basic processing circuits.
Optionally, the vector operator circuit of the main processing circuit can perform vector operations, including but not limited to: addition, subtraction, multiplication and division of two vectors; addition, subtraction, multiplication and division of a vector and a constant; or any operation applied to each element of a vector. Continuous operations may specifically be addition, subtraction, multiplication or division of a vector and a constant, activation operations, accumulation operations, and the like.
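The operation set just listed can be illustrated concretely, for example with NumPy; this only demonstrates the operations themselves, not the vector-operator circuit, and the specific functions chosen (e.g. `tanh` as the activation) are assumptions for the example.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])
c = 2.0

add = a + b                        # addition of two vectors
scaled = a * c                     # multiplication of a vector and a constant
clipped = np.maximum(a - 2.0, 0.0) # an arbitrary op applied to each element
acc = float(a.sum())               # accumulation
act = np.tanh(a)                   # an elementwise activation operation
```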
Each basic processing circuit may include a basic register and/or a basic on-chip buffer circuit, and may further include one of, or any combination of, an inner-product operator circuit, a vector operator circuit, an accumulator circuit, and the like. The inner-product operator circuit, vector operator circuit and accumulator circuit can be integrated circuits, or can be circuits provided separately.
The connection structure between the branch processing circuits and the basic circuits can be arbitrary and is not limited to the H-type structure of Fig. 1b. Optionally, the path from the main processing circuit to the basic circuits is a broadcast or distribution structure, and the path from the basic circuits back to the main processing circuit is a gather structure. Broadcast, distribution and gather are defined as follows:
The data transfer modes from the main processing circuit to the basic circuits may include the following:
the main processing circuit is connected to multiple branch processing circuits, and each branch processing circuit is in turn connected to multiple basic circuits;
the main processing circuit is connected to one branch processing circuit, which is connected to the next branch processing circuit, and so on, with multiple branch processing circuits connected in series, each of which is then connected to multiple basic circuits;
the main processing circuit is connected to multiple branch processing circuits, and each branch processing circuit is in turn connected in series to multiple basic circuits;
the main processing circuit is connected to one branch processing circuit, which is connected to the next branch processing circuit, and so on, with multiple branch processing circuits connected in series, each of which is then connected in series to multiple basic circuits.
When distributing data, the main processing circuit transmits data to some or all of the basic circuits, and the data received by each receiving basic circuit may be different.
When broadcasting data, the main processing circuit transmits data to some or all of the basic circuits, and each receiving basic circuit receives the same data.
When gathering data, some or all of the basic circuits transmit data back to the main processing circuit. It should be noted that the computing unit shown in Fig. 1a may be an individual physical chip; in practical applications, it may also be integrated into another chip (such as a CPU or GPU), and the specific embodiments of this application do not limit the physical form of the chip or device.
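The three transfer modes — broadcast (identical data to every basic circuit), distribution (possibly different data to a subset), and gather (partial results collected back) — can be illustrated with a toy main-circuit/basic-circuit model. All class and method names here are assumptions for illustration; the patent does not prescribe an implementation.

```python
class BasicCircuit:
    """Stand-in for one basic processing circuit."""
    def __init__(self):
        self.inbox = []
    def receive(self, data):
        self.inbox.append(data)
    def compute(self):
        return sum(self.inbox)       # toy partial result

class MainCircuit:
    """Stand-in for the main processing circuit."""
    def __init__(self, basics):
        self.basics = basics
    def broadcast(self, data):
        # broadcast: every basic circuit receives the identical data
        for b in self.basics:
            b.receive(data)
    def distribute(self, chunks):
        # distribute: selected circuits receive possibly different chunks
        for b, chunk in zip(self.basics, chunks):
            b.receive(chunk)
    def gather(self):
        # gather: partial results flow back up to the main circuit
        return [b.compute() for b in self.basics]

basics = [BasicCircuit() for _ in range(3)]
main = MainCircuit(basics)
main.broadcast(10)           # all three circuits get 10
main.distribute([1, 2, 3])   # each circuit gets its own chunk
results = main.gather()      # [11, 12, 13]
```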
Referring to Fig. 1c, a data distribution schematic diagram of the computing unit: the arrows in Fig. 1c indicate the distribution direction of the data. As shown in Fig. 1c, after the main processing circuit receives external data, it splits the data and sends it to the multiple branch processing circuits, which forward the split data to the basic processing circuits.
Referring to Fig. 1d, a data return schematic diagram of the computing unit: the arrows in Fig. 1d indicate the return (upstream) direction of the data. As shown in Fig. 1d, the basic processing circuits return data (such as inner-product results) to the branch processing circuits, which in turn return it to the main processing circuit.
The input data may specifically be a vector, a matrix, or multidimensional (three-dimensional, four-dimensional or higher) data; a specific value in the input data may be referred to as an element of the input data.
An embodiment of the present disclosure also provides a calculation method for the computing unit shown in Fig. 1a. The calculation method is applied to neural network computation; specifically, the computing unit can be used to perform operations on the input data and weight data of one or more layers of a multilayer neural network.
Specifically, the computing unit described above is used to perform operations on the input data and weight data of one or more layers of a trained multilayer neural network, or to perform operations on the input data and weight data of one or more layers of a multilayer neural network in forward operation.
The above operations include, but are not limited to, one of, or any combination of: a convolution operation, a matrix-matrix multiplication, a matrix-vector multiplication, a bias operation, a fully connected operation, a GEMM operation, a GEMV operation, and an activation operation.
GEMM refers to the matrix-matrix multiplication operation in the BLAS library, usually expressed as C = alpha*op(S)*op(P) + beta*C, where S and P are the two input matrices, C is the output matrix, alpha and beta are scalars, and op denotes some operation on matrix S or P; in addition, some auxiliary integers serve as parameters describing the width and height of matrices S and P.
GEMV refers to the matrix-vector multiplication operation in the BLAS library, usually expressed as C = alpha*op(S)*P + beta*C, where S is the input matrix, P is the input vector, C is the output vector, alpha and beta are scalars, and op denotes some operation on matrix S.
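The two BLAS-style formulas can be checked directly, e.g. with NumPy. Here `op` is taken to be transposition for S and the identity for P — one of the choices the text allows; the function names `gemm`/`gemv` below are illustrative, not the actual BLAS API.

```python
import numpy as np

def gemm(alpha, S, P, beta, C, op_s=lambda m: m, op_p=lambda m: m):
    """C = alpha * op(S) @ op(P) + beta * C  (BLAS-style matrix-matrix multiply)."""
    return alpha * op_s(S) @ op_p(P) + beta * C

def gemv(alpha, S, P, beta, C, op_s=lambda m: m):
    """C = alpha * op(S) @ P + beta * C  (BLAS-style matrix-vector multiply)."""
    return alpha * op_s(S) @ P + beta * C

S = np.array([[1., 2.], [3., 4.]])
P = np.array([[5., 6.], [7., 8.]])
out = gemm(2.0, S, P, 1.0, np.zeros((2, 2)), op_s=np.transpose)  # op(S) = S^T
w = gemv(1.0, S, np.array([1., 1.]), 0.0, np.zeros(2))           # S @ [1, 1]
```

For `out`, S^T @ P is [[26, 30], [38, 44]], so doubling it gives [[52, 60], [76, 88]]; for `w`, each row of S is summed, giving [3, 7].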
This application does not limit the connection relationship between the computing carriers in a computing device, which may be homogeneous or heterogeneous computing carriers, nor does it limit the connection relationship between the computing units in a computing carrier. Executing parallel tasks through the above heterogeneous computing carriers or computing units can improve operational efficiency.
The computing device described in Fig. 1 includes at least one computing carrier, and each computing carrier in turn includes at least one computing unit. That is, in this application the selected target computing devices depend on the connection relationships between the computing devices, on the specific physical hardware support in each computing device — such as the deployed neural network models and network resources — and on the attribute information of the operation request. Computing carriers of the same type can be deployed in the same computing device; for example, the computing carriers used for forward propagation can be deployed on the same computing device rather than on different computing devices, which effectively reduces the communication overhead between computing devices and helps improve operational efficiency. A specific neural network model can also be deployed on a specific computing carrier: when the server receives an operation request directed at a specified neural network, it calls the computing carrier corresponding to that specified neural network to execute the operation request, saving the time of determining the processing task and improving operational efficiency.
In this application, publicly available and widely used neural network models serve as the specified neural network models (for example, among convolutional neural networks (CNNs): LeNet, AlexNet, ZFNet, GoogleNet, VGG and ResNet).
Optionally, the operation demand of each specified neural network model in a specified neural network model set and the hardware attributes of each of the multiple computing devices are obtained, yielding multiple operation demands and multiple hardware attributes; according to the multiple operation demands and the multiple hardware attributes, each specified neural network model in the set is deployed on its corresponding specified computing device.
The specified neural network model set includes multiple specified neural network models. The hardware attributes of a computing device include the network bandwidth, storage capacity and processor clock frequency of the computing device itself, as well as the hardware attributes of the computing carriers or computing units within the computing device. That is, selecting the computing device corresponding to the operation demand of a specified neural network model according to the hardware attributes of each computing device can avoid server failures caused by untimely processing and improve the operation support capability of the server.
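The matching of each specified model's operation demand against a device's hardware attributes can be sketched as a simple feasibility-plus-scoring step. The attribute names (`memory`, `bandwidth`) and the greedy most-spare-memory policy are assumptions for the example; the patent leaves the selection rule open.

```python
def deploy(models, devices):
    """Assign each specified model to a device whose attributes satisfy its demand.

    models:  {name: {"memory": units needed, "bandwidth": units needed}}
    devices: {name: {"memory": units free,  "bandwidth": units available}}
    """
    placement = {}
    for model, need in models.items():
        # keep only devices whose hardware attributes satisfy the operation demand
        fit = {d: attr for d, attr in devices.items()
               if attr["memory"] >= need["memory"]
               and attr["bandwidth"] >= need["bandwidth"]}
        if not fit:
            continue                       # no device can host this model now
        # greedy choice: the device with the most spare memory
        best = max(fit, key=lambda d: fit[d]["memory"])
        placement[model] = best
        devices[best]["memory"] -= need["memory"]
    return placement

devices = {"dev0": {"memory": 16, "bandwidth": 10},
           "dev1": {"memory": 8,  "bandwidth": 40}}
models = {"AlexNet": {"memory": 4,  "bandwidth": 5},
          "ResNet":  {"memory": 12, "bandwidth": 5}}
placement = deploy(models, devices)
print(placement)   # {'AlexNet': 'dev0', 'ResNet': 'dev0'}
```

Tracking the remaining capacity after each placement is what lets the scheme avoid overloading a device, which is the failure mode the paragraph above warns about.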
The input neurons and output neurons mentioned in this application do not mean the neurons in the input layer and output layer of the entire neural network. Rather, for any two adjacent layers in the network, the neurons in the lower layer of the network feed-forward operation are the input neurons, and the neurons in the upper layer of the network feed-forward operation are the output neurons. Taking a convolutional neural network as an example, suppose a convolutional neural network has L layers, with K = 1, 2, ..., L-1; for layer K and layer K+1, layer K is called the input layer and its neurons are the input neurons, while layer K+1 is called the output layer and its neurons are the output neurons. That is, except for the top layer, each layer can serve as an input layer, and the next layer is the corresponding output layer.
The operations mentioned above are all operations of one layer of the neural network. For a multilayer neural network, the implementation process is shown in Fig. 1e, where the dashed arrows indicate the backward operation and the solid arrows indicate the forward operation. In the forward operation, after the execution of the previous layer of the artificial neural network is completed, the output neurons obtained by that layer are used as the input neurons of the next layer for computation (or some operations are performed on those output neurons before they are used as the input neurons of the next layer), and at the same time the weights are replaced with the weights of the next layer. In the backward operation, after the backward operation of the previous layer of the artificial neural network is completed, the input-neuron gradients obtained by that layer are used as the output-neuron gradients of the next layer for computation (or some operations are performed on those input-neuron gradients before they are used as the output-neuron gradients of the next layer), and at the same time the weights are replaced with the weights of the next layer.
The forward operation of a neural network is the calculation process from the input data to the final output data. The propagation direction of the backward operation is opposite to that of the forward operation: it is the calculation process that propagates, in the reverse of the forward direction, the loss between the final output data and the expected output data, or the loss function corresponding to that loss. By performing forward and backward operations in alternating cycles and correcting the weights of each layer by gradient descent on the loss or loss function, the weights of each layer are adjusted; this is the learning and training process of the neural network, and it reduces the loss of the network output.
Referring to Fig. 2, Fig. 2 is a flow diagram of a dispatching method provided by an embodiment of this application. As shown in Fig. 2, the method is applied to the server shown in Fig. 1 and involves the above-mentioned electronic equipment allowed to access the server. The electronic equipment may include various handheld devices with wireless communication functions, vehicle-mounted devices, wearable devices, computing devices or other processing devices connected to a radio modem, as well as various forms of user equipment (UE), mobile stations (MS), terminal devices, and the like.
201: Receive an operation request.
In this application, the server receives the operation request sent by the electronic equipment allowed to access it.
The operation request includes attribute information such as the processing task (a training task or a test task) and the target neural network model involved in the operation. A training task is used to train the target neural network model, that is, to perform forward and backward operations on the neural network model until training is complete; a test task is used to perform a single forward operation according to the target neural network model.
The target neural network model may be a neural network model uploaded by the user through the electronic equipment when sending the operation request, or a neural network model stored in the server, and so on. This application also places no limit on the number of target neural network models; that is, each operation request may correspond to at least one target neural network model.
202: Obtain the instruction stream of the target neural network model corresponding to the operation request.
The instruction stream specifies the order of operations of the target neural network model and the instruction corresponding to each position in that order, i.e., an instruction sequence; the operation of the target neural network model can be realized through the instruction stream. Each target neural network model corresponds to one basic operation sequence, obtained by parsing the target neural network model into a data structure describing its operation. This application places no limit on the resolution rules between the basic operation sequence and the instruction descriptors; the instruction descriptor stream corresponding to the target neural network model is obtained according to the resolution rules between the basic operation sequence and the instruction descriptors.
This application likewise places no limit on the preset format of each instruction descriptor in the instruction descriptor stream. The instructions corresponding to the instruction descriptor stream are generated according to the network structure in the preset format. The above instructions include all instructions in the Cambricon instruction set, such as matrix operation instructions, convolution operation instructions, fully connected layer forward operation instructions, pooling operation instructions, normalization instructions, vector operation instructions, and scalar operation instructions.
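The pipeline of step 202 can be sketched as two table lookups: the parsed basic operation sequence maps to instruction descriptors under the resolution rules, and each descriptor is then lowered to an executable instruction. Both lookup tables below are invented for illustration; the patent deliberately fixes neither the resolution rules nor the descriptor formats.

```python
# Hypothetical mappings: basic operation -> descriptor -> instruction.
DESCRIPTOR_OF = {"conv": "CONV_DESC", "pool": "POOL_DESC", "fc": "FC_DESC"}
INSTRUCTION_OF = {"CONV_DESC": "CONV_INSTR",
                  "POOL_DESC": "POOL_INSTR",
                  "FC_DESC": "FC_FORWARD_INSTR"}

def instruction_stream(basic_operation_sequence):
    """basic operation sequence -> descriptor stream -> instruction stream."""
    descriptors = [DESCRIPTOR_OF[op] for op in basic_operation_sequence]
    return [INSTRUCTION_OF[d] for d in descriptors]

stream = instruction_stream(["conv", "pool", "fc"])
```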
Optionally, obtaining the instruction stream of the target neural network model corresponding to the operation request includes: obtaining a first instruction descriptor stream according to the basic operation sequence of the target neural network model; simplifying the first instruction descriptor stream to obtain a second instruction descriptor stream; and obtaining the instruction stream according to the second instruction descriptor stream.
That is, by simplifying the first instruction descriptor stream, redundant instruction descriptors in it are eliminated, thereby shortening the instruction stream. The instruction stream executable by a computing device is then obtained from the second instruction descriptor stream, and output data is obtained by operating on the instructions and the input data. This overcomes the redundant input, output, or other operations generated when a complete neural network composed of fine-grained atomic operations such as convolution, pooling, and activation performs its operation, thereby further improving the operation speed of the server.
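One plausible form of the simplification above is eliding a store/load pair between adjacent fine-grained operations, since the intermediate value never needs to leave the device. The descriptor names and the store/load convention below are invented for illustration; the patent does not fix the simplification rules.

```python
def simplify(first_stream):
    """Drop a LOAD that immediately follows a STORE of the same buffer."""
    second_stream = []
    for desc in first_stream:
        if (desc[0] == "LOAD" and second_stream
                and second_stream[-1] == ("STORE", desc[1])):
            second_stream.pop()          # the value never leaves the device:
            continue                     # elide the redundant store/load pair
        second_stream.append(desc)
    return second_stream

first = [("CONV", "x"), ("STORE", "t0"), ("LOAD", "t0"),
         ("POOL", "t0"), ("STORE", "t1"), ("LOAD", "t1"),
         ("ACT", "t1"), ("STORE", "y")]
second = simplify(first)                 # 8 descriptors reduced to 4
```

The second stream keeps only the convolution, pooling, activation, and the final store, which is the shortening effect the paragraph above describes.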
It should be noted that if an operation request corresponds to multiple target neural network models, the instruction streams of the multiple target neural network models need to be obtained and then split in order to complete the operation request.
203: Split the instruction stream into multiple parallel instructions and multiple serial instructions.
This application places no limit on how the instruction stream is split. A parallel instruction is an instruction that can be distributed to multiple computing devices for simultaneous operation, while a serial instruction is an instruction whose operation can only be completed by a single computing device. For example, operation requests such as video recognition and understanding generally comprise feature extraction instructions and feature recognition instructions, where the feature extraction instructions need to perform convolution processing on several consecutive frames of images, and the feature recognition instructions generally need to perform recurrent neural network calculation on the features produced by the feature extraction instructions. The feature extraction instructions can be distributed to multiple computing devices, while the feature recognition instructions can only be processed by a single computing device.
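The video example above can be sketched with a simple classification pass over the stream. The tagging scheme below (a fixed set of parallelizable operation types) is an assumption for illustration; the patent leaves the splitting method open.

```python
# Assumed classification: per-frame operations are parallelizable, stateful
# recurrent operations are serial. The set contents are invented.
PARALLELIZABLE = {"CONV", "POOL", "FC_FORWARD"}

def split_stream(stream):
    """Split an instruction stream into parallel and serial instructions."""
    parallel, serial = [], []
    for op, operand in stream:
        (parallel if op in PARALLELIZABLE else serial).append((op, operand))
    return parallel, serial

stream = [("CONV", "frame0"), ("CONV", "frame1"), ("CONV", "frame2"),
          ("RNN", "features")]
par, ser = split_stream(stream)   # 3 parallelizable convolutions, 1 serial RNN
```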
204: Choose, from the multiple computing devices, multiple parallel computing devices corresponding to the multiple parallel instructions and at least one serial computing device corresponding to the multiple serial instructions.
After the instruction stream is split into multiple parallel instructions and multiple serial instructions, the parallel computing device corresponding to each parallel instruction and the serial computing devices corresponding to the multiple serial instructions are chosen from the multiple computing devices in the server, yielding multiple parallel computing devices and at least one serial computing device. The operational instruction of each parallel computing device is the parallel instruction corresponding to that device, and the operational instruction of each serial computing device is its corresponding serial instruction.
This application places no limit on the method of choosing the serial computing devices and parallel computing devices.
Optionally, if the processing task of the operation request is a training task, the operational data corresponding to the operation request is grouped based on the training method corresponding to the training task to obtain multiple groups of operational data, and the multiple parallel computing devices are chosen from the multiple computing devices according to the multiple groups of operational data and the multiple parallel instructions.
The operational data corresponding to the operation request is grouped according to the specific training method: it may be grouped by the data type of the operational data, or the operational data may simply be divided into multiple groups; no limit is placed here. After grouping, suitable computing devices are chosen to perform the operations concurrently, which further reduces the operation load of each computing device and thereby improves operation efficiency.
For example, for the batch gradient descent algorithm (Batch Gradient Descent, BGD), which operates on a training set (batch), one batch can be divided into multiple subsets and distributed to multiple computing devices, where each computing device trains on one subset and each subset is one group of operational data. For the stochastic gradient descent algorithm (Stochastic Gradient Descent, SGD), where each training step uses only one piece of operational data, different training sets can be distributed to different computing devices. For the mini-batch gradient descent algorithm (mini-batch SGD), the different data of each batch can be distributed to different computing devices for calculation, or each batch can be divided into smaller subsets which are then distributed to different computing devices for calculation.
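The BGD case above, splitting one batch into per-device subsets, can be sketched directly. Contiguous slicing with ceiling division is an assumed grouping policy; the patent allows any grouping.

```python
def group_for_bgd(batch, n_devices):
    """Split one batch into up to n_devices contiguous subsets,
    one group of operational data per parallel computing device."""
    size = -(-len(batch) // n_devices)   # ceiling division
    return [batch[i:i + size] for i in range(0, len(batch), size)]

# Ten samples over three devices: groups of 4, 4, and 2.
groups = group_for_bgd(list(range(10)), 3)
```

The SGD and mini-batch cases differ only in what counts as the unit being distributed (whole batches, or smaller sub-batches) rather than in the splitting mechanics.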
In this application, a serial computing device may be one computing device or multiple computing devices among the multiple parallel computing devices, or it may be another idle computing device, and so on.
For the case where the parts corresponding to the serial instructions differ greatly in expected performance from one another, optionally, choosing from the multiple computing devices the at least one serial computing device corresponding to the multiple serial instructions comprises: grouping the multiple serial instructions to obtain at least one group of serial instruction sequences; and choosing, from the multiple computing devices, the computing device corresponding to each group of serial instruction sequences in the at least one group of serial instruction sequences, to obtain the at least one serial computing device.
That is, the multiple serial instructions are grouped into multiple instruction sequences, and then a computing device corresponding to each instruction sequence is chosen to perform the operation. Having each instruction executed by the computing device that is good at processing it improves the operation efficiency of each part and thereby the overall operation efficiency.
Taking the faster region-based convolutional neural network (faster region-based convolution neural network, Faster R-CNN) as an example: Faster R-CNN is composed of convolutional layers, a region proposal network (region proposal network, RPN) layer, and a region of interest pooling (region of interest pooling, ROI pooling) layer, and the expected performance of these layers differs greatly. The convolutional layers and the RPN can therefore be deployed on a neural network computing device that is good at handling convolution, while ROI pooling can be deployed on a more general-purpose processor such as a CPU. The operation efficiency of each part is thus improved, and so is the overall operation efficiency.
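The Faster R-CNN placement above amounts to matching each serial instruction group to the device best suited to its workload. The capability table and device names below are invented for illustration.

```python
# Hypothetical specialty table: one convolution accelerator, one general CPU.
DEVICE_SPECIALTY = {"npu0": "convolution", "cpu0": "general"}

def assign_groups(groups):
    """groups: list of (group_name, workload_kind). Match each group to a
    device whose specialty fits, falling back to the general device."""
    assignment = {}
    for name, kind in groups:
        device = next((d for d, s in DEVICE_SPECIALTY.items() if s == kind),
                      "cpu0")
        assignment[name] = device
    return assignment

plan = assign_groups([("conv+rpn", "convolution"), ("roi_pooling", "general")])
```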
Optionally, if the processing task of the operation request is a test task, computing devices that include the forward operation of the target neural network model are chosen from the multiple computing devices to obtain multiple target computing devices; if the processing task is a training task, computing devices that include both the forward operation and backward training of the target neural network model are chosen from the multiple computing devices to obtain the multiple target computing devices. The multiple parallel computing devices corresponding to the multiple parallel instructions and the at least one serial computing device corresponding to the multiple serial instructions are then chosen from the multiple target computing devices.
That is, if the processing task of the operation request is a test task, the multiple target computing devices are the computing devices that can be used to execute the forward operation of the target neural network model; and when the processing task is a training task, the multiple target computing devices are the computing devices that can be used to execute both the forward operation and the backward training of the target neural network model. Processing the operation request with dedicated computing devices improves the accuracy and efficiency of the operation.
For example, suppose the server includes a first computing device and a second computing device, where the first computing device only supports the forward operation of a specified neural network model, while the second computing device can execute both the forward operation and the backward training of that specified neural network model. When the target neural network model in a received target operation request is the specified neural network model and the processing task is a test task, the first computing device is determined to execute the target operation request.
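The candidate filter in the example above can be sketched as follows: for a test task any device supporting the model's forward pass qualifies, while a training task also requires backward training support. The capability flags and device names are invented for illustration.

```python
# Hypothetical capability table for the two devices in the example.
DEVICES = {
    "first_device":  {"forward": True, "backward": False},
    "second_device": {"forward": True, "backward": True},
}

def candidates(task):
    """Return the target computing devices eligible for the given task."""
    need_backward = (task == "training")
    return [d for d, caps in DEVICES.items()
            if caps["forward"] and (caps["backward"] or not need_backward)]

test_devs = candidates("test")        # both devices support the forward pass
train_devs = candidates("training")   # only the forward+backward device
```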
Optionally, an auxiliary scheduling algorithm is chosen from an auxiliary scheduling algorithm set according to the attribute information of the operation request, and the multiple parallel computing devices and the at least one serial computing device are chosen from the multiple computing devices according to the auxiliary scheduling algorithm.
The auxiliary scheduling algorithm set includes, but is not limited to, the following: the round-robin scheduling (Round-Robin Scheduling) algorithm, the weighted round-robin (Weighted Round Robin) algorithm, the least-connections (Least Connections) algorithm, the weighted least-connections (Weighted Least Connections) algorithm, the locality-based least-connections (Locality-Based Least Connections) algorithm, the locality-based least-connections with replication (Locality-Based Least Connections with Replication) algorithm, the destination hashing (Destination Hashing) algorithm, and the source hashing (Source Hashing) algorithm.
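Two of the listed algorithms can be sketched compactly: weighted round-robin cycles through devices in proportion to their weights, and least-connections picks the device with the fewest backlogged connections. The device names, weights, and connection counts below are invented inputs.

```python
import itertools

def weighted_round_robin(devices_with_weights):
    """Yield device names cyclically, in proportion to their weights."""
    expanded = [d for d, w in devices_with_weights for _ in range(w)]
    return itertools.cycle(expanded)

def least_connections(connections):
    """Pick the device currently holding the fewest open connections."""
    return min(connections, key=connections.get)

rr = weighted_round_robin([("dev_a", 2), ("dev_b", 1)])
first_three = [next(rr) for _ in range(3)]     # dev_a twice, then dev_b
pick = least_connections({"dev_a": 5, "dev_b": 2, "dev_c": 7})
```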
This application places no limit on how the auxiliary scheduling algorithm is chosen according to the attribute information. For example, if the multiple target computing devices handle operation requests of the same kind, the auxiliary scheduling algorithm may be the round-robin scheduling algorithm. If different target computing devices differ in load tolerance, so that more operation requests should be distributed to highly configured, lightly loaded target computing devices, the auxiliary scheduling algorithm may be the weighted round-robin algorithm. If the workloads assigned to the target computing devices differ, the auxiliary scheduling algorithm may be the least-connections scheduling algorithm, which dynamically chooses the target computing device with the fewest currently backlogged connections to handle the current request, improving the utilization of the target computing devices as much as possible; the weighted least-connections scheduling algorithm may also be used.
That is, on the basis of the dispatching method involved in the above embodiments, the computing devices that finally execute the operation request are chosen with the help of an auxiliary scheduling algorithm, further increasing the operation efficiency of the server.
205: Calculate the final operation result by performing, on the operational data corresponding to the operation request, the parallel instruction corresponding to each of the multiple parallel computing devices and the serial instruction corresponding to each of the at least one serial computing device.
This application places no limit on the operational data corresponding to each operation request: it may be image data for image recognition, voice data for speech recognition, and so on. When the processing task is a test task, the operational data is data uploaded by the user; when the processing task is a training task, the operational data may be a training set uploaded by the user, or a training set corresponding to the target neural network model stored in the server.
Multiple intermediate calculation results may be produced in the calculation process of the operational instructions, and the final operation result corresponding to the operation request can be obtained from the multiple intermediate calculation results.
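The assembly of intermediate results into the final operation result can be sketched as a map-then-reduce over the data groups. Summation as the combining step and the toy data are assumptions for illustration; the patent does not fix how intermediates are combined.

```python
def run_and_combine(groups, device_op):
    """Run device_op on each group (one parallel device per group in the
    server), then reduce the intermediates into the final operation result."""
    intermediates = [device_op(g) for g in groups]
    return sum(intermediates)

final_result = run_and_combine([[1, 2], [3, 4], [5]], device_op=sum)
```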
206: Send the final operation result to the electronic equipment that sent the operation request.
It can be understood that when the server receives a single operation request sent by electronic equipment, the instruction stream of the target neural network model corresponding to the operation request is split; parallel computing devices corresponding to the parallel instructions are chosen to process those parallel instructions in parallel, and serial computing devices good at handling the serial instructions are chosen to execute the corresponding serial instructions individually; the final operation result corresponding to the operation request is then sent to the electronic equipment. That is, the computing devices in the set of multiple parallel computing devices execute their corresponding parallel instructions in parallel, saving the execution time of the parallel instructions, and the serial computing devices improve the calculation efficiency of each serial instruction. Computing resources are uniformly allocated according to the operation request, so that the multiple computing devices in the server cooperate effectively, improving the overall operation efficiency of the server.
Optionally, the method further includes: waiting for a first preset duration, and detecting whether each computing device among the multiple parallel computing devices and the at least one serial computing device has obtained its corresponding final operation result; if not, treating the computing device that has not obtained its final operation result as a delayed computing device, and choosing a backup computing device from the idle computing devices of the multiple computing devices according to the instruction corresponding to the delayed computing device.
That is, when the first preset duration is reached, the computing device that has not obtained its final operation result is treated as a delayed computing device, and a backup computing device is chosen from the idle computing devices according to the instruction being executed by the delayed computing device, improving operation efficiency.
Optionally, after the backup computing device executes the operational instruction corresponding to the delayed computing device, the method further includes: obtaining the final operation result produced first between the delayed computing device and the backup computing device, and sending a pause instruction to whichever of the delayed computing device and the backup computing device has not obtained the final operation result.
The pause instruction instructs whichever of the delayed computing device and the backup computing device has not obtained the final operation result to pause execution of its corresponding operational instruction. That is, the backup computing device executes the instruction corresponding to the delayed computing device; whichever final operation result is produced first, between the backup computing device and the delayed computing device, is taken as the final operation result corresponding to the operational instruction; and a pause instruction is sent to whichever of the delayed computing device and the backup computing device has not obtained the final operation result, pausing the operation of the computing device that has not completed the operational instruction and thereby saving power.
Optionally, the method further includes: waiting for a second preset duration, and detecting whether the delayed computing device has obtained its corresponding final operation result; if not, treating the delayed computing device that has not obtained its final operation result as a faulty computing device and sending a fault instruction.
The fault instruction is used to inform operation and maintenance personnel that the faulty computing device has broken down, and the second preset duration is greater than the first preset duration. That is, when the second preset duration is reached, if the final operation result of the delayed computing device has not been received, the delayed computing device is judged to have broken down and the corresponding operation and maintenance personnel are informed, improving the ability to handle failures.
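The two-timeout supervision described in the last two paragraphs can be sketched in one pass: after the first preset duration an unfinished device becomes delayed (and a backup would be scheduled), and after the longer second preset duration a still-unfinished delayed device is reported as faulty. The durations and completion times below are simulated values, not the patent's.

```python
FIRST_TIMEOUT, SECOND_TIMEOUT = 5.0, 12.0   # assumed preset durations (s)

def supervise(finish_times, now):
    """finish_times: device -> completion time, or None if never finished.
    Returns (delayed devices, faulty devices) as of time `now`."""
    delayed, faulty = [], []
    for dev, t in finish_times.items():
        if (t is None or t > FIRST_TIMEOUT) and now >= FIRST_TIMEOUT:
            delayed.append(dev)              # a backup device is chosen here
        if (t is None or t > SECOND_TIMEOUT) and now >= SECOND_TIMEOUT:
            faulty.append(dev)               # operation staff are informed
    return delayed, faulty

# dev_a finished in time, dev_b was slow but recovered, dev_c never finished.
delayed, faulty = supervise({"dev_a": 3.0, "dev_b": 8.0, "dev_c": None},
                            now=12.0)
```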
Optionally, the method further includes: updating the hash table of the multiple computing devices every target time threshold.
A hash table (Hash table, also called a hash map) is a data structure accessed directly according to key values (Key values). In this application, the IP addresses of the multiple computing devices serve as the key values, and a hash function (mapping function) maps them to positions in the hash table; that is, after a target computing device is determined, the physical resources allocated to the target computing device can be found quickly. No limit is placed on the concrete form of the hash table: it may be a manually configured static hash table, or it may record the hardware resources allocated according to IP address. Updating the hash table of the multiple computing devices every target time threshold improves the accuracy and efficiency of lookups.
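The device hash table above maps IP-address keys to allocated physical resources. In Python a dict is itself a hash table, so the mapping is direct; the refresh interval and resource records below are invented for illustration.

```python
def build_device_table(devices):
    """devices: list of (ip, resources). In the scheme above this table
    would be rebuilt every target time threshold."""
    return {ip: resources for ip, resources in devices}

table = build_device_table([("10.0.0.1", {"cores": 8}),
                            ("10.0.0.2", {"cores": 16})])
lookup = table["10.0.0.2"]   # constant-time lookup by device IP
```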
Consistent with the embodiment of Fig. 2 above, please refer to Fig. 3, which is a structural diagram of another server provided by this application; the server includes multiple computing devices. As shown in Fig. 3, the server 300 includes:
a receiving unit 301, configured to receive an operation request;
an acquiring unit 302, configured to obtain the instruction stream of the target neural network model corresponding to the operation request;
a splitting unit 303, configured to split the instruction stream into multiple parallel instructions and multiple serial instructions;
a selection unit 304, configured to choose, from the multiple computing devices, multiple parallel computing devices corresponding to the multiple parallel instructions and at least one serial computing device corresponding to the multiple serial instructions;
an arithmetic unit 305, configured to calculate the operational data corresponding to the operation request according to the parallel instruction corresponding to each of the multiple parallel computing devices and the serial instruction corresponding to each of the at least one serial computing device, to obtain the final operation result;
a transmission unit 306, configured to send the final operation result to the electronic equipment that sent the operation request.
Optionally, the acquiring unit 302 is specifically configured to obtain a first instruction descriptor stream according to the basic operation sequence corresponding to the target neural network model; simplify the first instruction descriptor stream to obtain a second instruction descriptor stream; and obtain the instruction stream according to the second instruction descriptor stream.
Optionally, the splitting unit 303 is specifically configured to group the multiple serial instructions to obtain at least one group of serial instruction sequences; and the selection unit 304 is specifically configured to choose, from the multiple computing devices, the computing device corresponding to each group of serial instruction sequences in the at least one group of serial instruction sequences, to obtain the at least one serial computing device.
Optionally, the splitting unit 303 is specifically configured, if the processing task of the operation request is a training task, to group the operational data corresponding to the operation request based on the training method corresponding to the operation request, to obtain multiple groups of operational data; and the selection unit 304 is specifically configured to choose the multiple parallel computing devices from the multiple computing devices according to the multiple groups of operational data and the multiple parallel instructions.
Optionally, the selection unit 304 is specifically configured, if the processing task of the operation request is a test task, to choose from the multiple computing devices the computing devices that include the forward operation of the target neural network model, to obtain multiple target computing devices; if the processing task is a training task, to choose from the multiple computing devices the computing devices that include both the forward operation and backward training of the target neural network model, to obtain the multiple target computing devices; and to choose the multiple parallel computing devices and the at least one serial computing device from the multiple target computing devices.
Optionally, the selection unit 304 is specifically configured to choose an auxiliary scheduling algorithm from an auxiliary scheduling algorithm set according to the attribute information of the operation request, the auxiliary scheduling algorithm set including at least one of the following: the round-robin scheduling algorithm, the weighted round-robin algorithm, the least-connections algorithm, the weighted least-connections algorithm, the locality-based least-connections algorithm, the locality-based least-connections with replication algorithm, the destination hashing algorithm, and the source hashing algorithm; and to choose the multiple parallel computing devices and the at least one serial computing device from the multiple computing devices according to the auxiliary scheduling algorithm.
Optionally, the server further includes a detection unit 307, configured to wait for a first preset duration and detect whether the multiple parallel computing devices and the at least one serial computing device have obtained their corresponding final operation results; if not, the computing device that has not obtained its final operation result is treated as a delayed computing device; the selection unit 304 chooses a backup computing device from the idle computing devices of the multiple computing devices according to the instruction corresponding to the delayed computing device; and the arithmetic unit 305 executes the instruction corresponding to the delayed computing device through the backup computing device.
Optionally, the acquiring unit 302 is further configured to obtain the final operation result produced first between the delayed computing device and the backup computing device; and the transmission unit 306 sends a pause instruction to whichever of the delayed computing device and the backup computing device has not obtained the final operation result.
Optionally, the detection unit 307 is further configured to wait for a second preset duration and detect whether the delayed computing device has obtained its corresponding final operation result; if not, the delayed computing device that has not obtained its final operation result is treated as a faulty computing device, and a fault instruction is sent through the transmission unit 306; the fault instruction is used to inform operation and maintenance personnel that the faulty computing device has broken down, and the second preset duration is greater than the first preset duration.
Optionally, the server further includes an updating unit 308, configured to update the hash table of the server every target time threshold.
Optionally, the acquiring unit 302 is further configured to obtain the operation demand of each specified neural network model in a specified neural network model set and the hardware attributes of each of the multiple computing devices, to obtain multiple operation demands and multiple hardware attributes;
and the server further includes a deployment unit 309, configured to deploy, according to the multiple operation demands and the multiple hardware attributes, each specified neural network model in the specified neural network model set on the specified computing device corresponding to that specified neural network model.
Optionally, the computing device includes at least one computing carrier, and the computing carrier includes at least one computing unit.
It can be understood that when the server receives a single operation request sent by electronic equipment, the instruction stream of the target neural network model corresponding to the operation request is split; parallel computing devices corresponding to the parallel instructions are chosen to process those parallel instructions in parallel, and serial computing devices good at handling the serial instructions are chosen to execute the corresponding serial instructions individually; the final operation result corresponding to the operation request is then sent to the electronic equipment. That is, the computing devices in the set of multiple parallel computing devices execute their corresponding parallel instructions in parallel, saving the execution time of the parallel instructions, and the serial computing devices improve the calculation efficiency of each serial instruction. Computing resources are uniformly allocated according to the operation request, so that the multiple computing devices in the server cooperate effectively, improving the overall operation efficiency of the server.
In one embodiment, as shown in Fig. 4, this application discloses another server 400 including a processor 401, a memory 402, a communication interface 403, and one or more programs 404, where the one or more programs 404 are stored in the memory 402 and configured to be executed by the processor, and the programs 404 include instructions for executing some or all of the steps described in the above dispatching method.
Another embodiment of the present invention provides a computer-readable storage medium storing a computer program; the computer program includes program instructions which, when executed by a processor, cause the processor to execute the implementation described in the dispatching method.
Those of ordinary skill in the art may be aware that the units and algorithm steps described in conjunction with the embodiments disclosed herein can be realized with electronic hardware, computer software, or a combination of the two. In order to clearly demonstrate the interchangeability of hardware and software, the composition and steps of each example have been described generally by function in the above description. Whether these functions are implemented in hardware or software depends on the specific application and design constraints of the technical solution. Skilled persons may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of this invention.
It is apparent to those skilled in the art that, for convenience and brevity of description, the specific working processes of the terminal and units described above may refer to the corresponding processes in the foregoing method embodiments, and details are not described here again.
In the several embodiments provided in this application, it should be understood that the disclosed terminal and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; the division of the units is only a division by logical function, and other divisions are possible in actual implementation. For example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the embodiments of the present invention.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods of the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
It should be noted that implementations not shown or described in the drawings or in the text of the specification take forms known to those of ordinary skill in the art and are not described in detail. In addition, the above definitions of the elements and methods are not limited to the specific structures, shapes, or manners mentioned in the embodiments; those of ordinary skill in the art may simply modify or replace them.
The specific embodiments above further describe the purpose, technical solutions, and beneficial effects of the present application in detail. It should be understood that the above are only specific embodiments of the present application and are not intended to limit the present application. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present application shall be included within the protection scope of the present application.
Claims (15)
1. A scheduling method, wherein the method is applied to a server comprising multiple computing devices, and the method comprises:
receiving an operation request;
obtaining an instruction stream of a target neural network model corresponding to the operation request;
splitting the instruction stream into multiple parallel instructions and multiple serial instructions;
selecting, from the multiple computing devices, multiple parallel computing devices corresponding to the multiple parallel instructions and at least one serial computing device corresponding to the multiple serial instructions;
computing operational data corresponding to the operation request according to the parallel instruction corresponding to each of the multiple parallel computing devices and the serial instruction corresponding to each of the at least one serial computing device, to obtain a final operation result;
sending the final operation result to an electronic device that sent the operation request.
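As an illustration, the dispatch flow recited in claim 1 can be sketched in Python. This is a minimal sketch only: the instruction format, the `Device` class, and the one-device-per-instruction assignment are hypothetical; the claim does not prescribe any concrete data structures.

```python
class Device:
    """Stand-in for a computing device in the server."""
    def __init__(self, name):
        self.name = name

    def run(self, instruction, data):
        # Stand-in for real execution: apply the instruction to the data.
        return instruction(data)

def split_stream(stream):
    """Split an instruction stream into its parallel and serial instructions."""
    parallel = [op for kind, op in stream if kind == "parallel"]
    serial = [op for kind, op in stream if kind == "serial"]
    return parallel, serial

def schedule(stream, data, devices):
    """Dispatch parallel instructions across devices, then run serial ones."""
    parallel, serial = split_stream(stream)
    par_devs = devices[:len(parallel)]       # one device per parallel instruction
    ser_dev = devices[len(parallel)]         # a serial device for the rest
    partials = [d.run(op, data) for d, op in zip(par_devs, parallel)]
    result = partials
    for op in serial:                        # serial instructions run in order
        result = ser_dev.run(op, result)
    return result                            # the "final operation result"

# Toy model: two parallel halves summed, then a serial scaling step.
stream = [("parallel", lambda xs: sum(xs[:2])),
          ("parallel", lambda xs: sum(xs[2:])),
          ("serial", lambda parts: 2 * sum(parts))]
devices = [Device("dev0"), Device("dev1"), Device("dev2")]
print(schedule(stream, [1, 2, 3, 4], devices))  # -> 20
```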
2. The method according to claim 1, wherein the obtaining an instruction stream of a target neural network model corresponding to the operation request comprises:
obtaining a first instruction descriptor stream according to the basic operations of the target neural network model;
simplifying the first instruction descriptor stream to obtain a second instruction descriptor stream;
obtaining the instruction stream according to the second instruction descriptor stream.
3. The method according to claim 2, wherein the selecting, from the multiple computing devices, at least one serial computing device corresponding to the multiple serial instructions comprises:
grouping the multiple serial instructions to obtain at least one serial instruction sequence;
selecting, from the multiple computing devices, a computing device corresponding to each serial instruction sequence of the at least one serial instruction sequence, to obtain the at least one serial computing device.
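The grouping step of claim 3 can be sketched as follows. The grouping criterion shown (consecutive instructions sharing a dependency tag) and the round-robin assignment of devices to sequences are assumptions for illustration only; the claim does not fix either rule.

```python
def group_serial(instructions):
    """Group serial instructions into sequences; here, consecutive
    instructions with the same (hypothetical) dependency tag form one."""
    groups = []
    for tag, op in instructions:
        if groups and groups[-1][0] == tag:
            groups[-1][1].append(op)        # extend the current sequence
        else:
            groups.append((tag, [op]))      # start a new sequence
    return [ops for _, ops in groups]

def assign_devices(sequences, idle_devices):
    """One computing device per serial instruction sequence."""
    return {i: idle_devices[i % len(idle_devices)]
            for i in range(len(sequences))}

seqs = group_serial([("a", "op1"), ("a", "op2"), ("b", "op3")])
print(seqs)  # -> [['op1', 'op2'], ['op3']]
```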
4. The method according to any one of claims 1-3, wherein the selecting, from the multiple computing devices, multiple parallel computing devices corresponding to the multiple parallel instructions comprises:
if the processing task of the operation request is a training task, grouping the operational data corresponding to the operation request based on a training method corresponding to the operation request, to obtain multiple groups of operational data;
selecting the multiple parallel computing devices from the multiple computing devices according to the multiple groups of operational data and the multiple parallel instructions.
5. The method according to claim 1, wherein the selecting, from the multiple computing devices, multiple parallel computing devices corresponding to the multiple parallel instructions and at least one serial computing device corresponding to the multiple serial instructions comprises:
if the processing task of the operation request is a test task, selecting, from the multiple computing devices, computing devices that include the forward operation of the target neural network model, to obtain multiple target computing devices;
if the processing task is a training task, selecting, from the multiple computing devices, computing devices that include the forward operation and backward training of the target neural network model, to obtain the multiple target computing devices;
selecting the multiple parallel computing devices and the at least one serial computing device from the multiple target computing devices.
6. The method according to claim 1, wherein the selecting, from the multiple computing devices, multiple parallel computing devices corresponding to the multiple parallel instructions and at least one serial computing device corresponding to the multiple serial instructions comprises:
selecting an auxiliary scheduling algorithm from an auxiliary scheduling algorithm set according to attribute information of the operation request, the auxiliary scheduling algorithm set comprising at least one of the following: a round-robin scheduling algorithm, a weighted round-robin algorithm, a least-connections algorithm, a weighted least-connections algorithm, a locality-based least-connections scheduling algorithm, a locality-based least-connections with replication algorithm, a destination address hashing algorithm, and a source address hashing algorithm;
selecting the multiple parallel computing devices and the at least one serial computing device from the multiple computing devices according to the auxiliary scheduling algorithm.
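Two members of the auxiliary scheduling set named in claim 6 can be sketched compactly. The device names and weights below are hypothetical, and the claim does not specify how weights or active-task counts would be derived (they could reflect throughput or current load, for example).

```python
import itertools

def weighted_round_robin(devices_with_weights):
    """Weighted round-robin: yield device names in proportion to their
    weights, cycling forever."""
    expanded = [name for name, w in devices_with_weights for _ in range(w)]
    return itertools.cycle(expanded)

def least_connections(active):
    """Least-connections: pick the device with the fewest active tasks."""
    return min(active, key=active.get)

picker = weighted_round_robin([("dev0", 3), ("dev1", 1)])
print([next(picker) for _ in range(8)])
# -> ['dev0', 'dev0', 'dev0', 'dev1', 'dev0', 'dev0', 'dev0', 'dev1']
print(least_connections({"dev0": 5, "dev1": 2}))  # -> dev1
```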
7. The method according to any one of claims 1-6, wherein the method further comprises:
waiting for a first preset duration, and detecting whether each computing device among the multiple parallel computing devices and the at least one serial computing device has obtained its corresponding final operation result; if not, taking each computing device that has not obtained a final operation result as a delayed computing device;
selecting a standby computing device from the idle computing devices among the multiple computing devices according to the instruction corresponding to the delayed computing device;
executing the instruction corresponding to the delayed computing device by the standby computing device.
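The two-stage timeout handling of claims 7-9 can be sketched as one check function. The result-table and device-name formats are hypothetical; the claims only fix the ordering that the second preset duration exceeds the first.

```python
def check_delays(results, idle_devices, elapsed, t1, t2):
    """Return (standby assignments, faulty devices) for unfinished devices.

    results: device name -> final operation result, or None if unfinished.
    t1, t2: first and second preset durations; claim 9 requires t2 > t1.
    """
    assert t2 > t1, "claim 9: second preset duration exceeds the first"
    standby, faulty = {}, []
    for dev, result in results.items():
        if result is not None:
            continue                      # device finished in time
        if elapsed >= t2:
            faulty.append(dev)            # claim 9: notify maintenance staff
        elif elapsed >= t1 and idle_devices:
            # claim 7: a standby device re-executes the delayed instruction
            standby[dev] = idle_devices.pop(0)
    return standby, faulty

# After the first timeout, dev1 gets a standby; nothing is faulty yet.
print(check_delays({"dev0": 42, "dev1": None}, ["spare0"],
                   elapsed=1.5, t1=1.0, t2=5.0))
# -> ({'dev1': 'spare0'}, [])
```

Claim 8's follow-up (taking whichever result arrives first and pausing the other device) would sit on top of this loop; it is omitted here to keep the sketch short.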
8. The method according to claim 7, wherein after the executing the instruction corresponding to the delayed computing device by the standby computing device, the method further comprises:
obtaining the final operation result that is obtained first by either the delayed computing device or the standby computing device;
sending a pause instruction to whichever of the delayed computing device and the standby computing device has not obtained the final operation result.
9. The method according to claim 7 or 8, wherein the method further comprises:
waiting for a second preset duration, and detecting whether the delayed computing device has obtained its corresponding final operation result; if not, taking the delayed computing device that has not obtained a final operation result as a faulty computing device and sending a fault instruction, wherein the fault instruction is used to inform operation and maintenance personnel that the faulty computing device has failed, and the second preset duration is greater than the first preset duration.
10. The method according to any one of claims 1-9, wherein the method further comprises:
updating a hash table of the server every target time threshold.
11. The method according to any one of claims 1-10, wherein the method further comprises:
obtaining an operation demand of each specified neural network model in a specified neural network model set and hardware attributes of each computing device of the multiple computing devices, to obtain multiple operation demands and multiple hardware attributes;
deploying each specified neural network model in the specified neural network model set onto the corresponding specified computing device according to the multiple operation demands and the multiple hardware attributes.
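The deployment step of claim 11 amounts to matching each model's operation demand against device hardware attributes. The greedy largest-demand-first matching below is an assumption for illustration; the claim does not fix the matching rule, and the single scalar "capacity" stands in for the multiple hardware attributes.

```python
def deploy(demands, capacities):
    """Map each model to the first device with enough remaining capacity,
    placing the most demanding models first (a hypothetical greedy rule)."""
    placement = {}
    remaining = dict(capacities)
    for model, need in sorted(demands.items(), key=lambda kv: -kv[1]):
        for dev, cap in remaining.items():
            if cap >= need:
                placement[model] = dev        # deploy the model on this device
                remaining[dev] = cap - need   # consume the device's capacity
                break
    return placement

print(deploy({"net_a": 8, "net_b": 4}, {"dev0": 10, "dev1": 6}))
# -> {'net_a': 'dev0', 'net_b': 'dev1'}
```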
12. The method according to any one of claims 1-11, wherein each computing device comprises at least one computing carrier, and each computing carrier comprises at least one computing unit.
13. A server, wherein the server comprises multiple computing devices, and the server further comprises units for performing the method according to any one of claims 1-12.
14. A server, comprising a processor, a memory, a communication interface, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the processor, and the programs include instructions for performing the steps of the method according to any one of claims 1-12.
15. A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program, the computer program comprises program instructions, and the program instructions, when executed by a processor, cause the processor to perform the method according to any one of claims 1-12.
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711467705.4A CN109976809B (en) | 2017-12-28 | 2017-12-28 | Scheduling method and related device |
PCT/CN2018/098324 WO2019128230A1 (en) | 2017-12-28 | 2018-08-02 | Scheduling method and related apparatus |
US16/767,415 US11568269B2 (en) | 2017-12-28 | 2018-08-02 | Scheduling method and related apparatus |
EP18895350.9A EP3731089B1 (en) | 2017-12-28 | 2018-08-02 | Scheduling method and related apparatus |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711467705.4A CN109976809B (en) | 2017-12-28 | 2017-12-28 | Scheduling method and related device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109976809A true CN109976809A (en) | 2019-07-05 |
CN109976809B CN109976809B (en) | 2020-08-25 |
Family
ID=67075497
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711467705.4A Active CN109976809B (en) | 2017-12-28 | 2017-12-28 | Scheduling method and related device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109976809B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112988229A (en) * | 2019-12-12 | 2021-06-18 | 上海大学 | Convolutional neural network resource optimization configuration method based on heterogeneous computation |
Patent Citations (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH04216163A (en) * | 1990-02-20 | 1992-08-06 | Internatl Business Mach Corp <Ibm> | Data processing method and system, method and system for forming neural network data structure, method and system for training neural network, neural network moving method and defining method of neural network model |
US6192465B1 (en) * | 1998-09-21 | 2001-02-20 | Advanced Micro Devices, Inc. | Using multiple decoders and a reorder queue to decode instructions out of order |
CN101441557A (en) * | 2008-11-08 | 2009-05-27 | 腾讯科技(深圳)有限公司 | Distributed parallel calculating system and method based on dynamic data division |
CN102171650A (en) * | 2008-11-24 | 2011-08-31 | 英特尔公司 | Systems, methods, and apparatuses to decompose a sequential program into multiple threads, execute said threads, and reconstruct the sequential execution |
CN102360309A (en) * | 2011-09-29 | 2012-02-22 | 中国科学技术大学苏州研究院 | Scheduling system and scheduling execution method of multi-core heterogeneous system on chip |
US8812564B2 (en) * | 2011-12-20 | 2014-08-19 | Sap Ag | Parallel uniqueness checks for partitioned tables |
JP6083687B2 (en) * | 2012-01-06 | 2017-02-22 | インターナショナル・ビジネス・マシーンズ・コーポレーションInternational Business Machines Corporation | Distributed calculation method, program, host computer, and distributed calculation system (distributed parallel calculation using accelerator device) |
US20130179485A1 (en) * | 2012-01-06 | 2013-07-11 | International Business Machines Corporation | Distributed parallel computation with acceleration devices |
CN103049330A (en) * | 2012-12-05 | 2013-04-17 | 大连理工大学 | Method and system for scheduling trusteeship distribution task |
US20170090991A1 (en) * | 2013-09-10 | 2017-03-30 | Sviral, Inc. | Method, apparatus, and computer-readable medium for parallelization of a computer program on a plurality of computing cores |
CN104035751A (en) * | 2014-06-20 | 2014-09-10 | 深圳市腾讯计算机系统有限公司 | Graphics processing unit based parallel data processing method and device |
CN106056529A (en) * | 2015-04-03 | 2016-10-26 | 阿里巴巴集团控股有限公司 | Method and equipment for training convolutional neural network used for image recognition |
CN107315571A (en) * | 2016-04-27 | 2017-11-03 | 北京中科寒武纪科技有限公司 | A kind of apparatus and method for performing full articulamentum neutral net forward operation |
CN106779057A (en) * | 2016-11-11 | 2017-05-31 | 北京旷视科技有限公司 | The method and device of the calculating binary neural network convolution based on GPU |
CN106909971A (en) * | 2017-02-10 | 2017-06-30 | 华南理工大学 | A kind of BP neural network parallel method towards multinuclear computing environment |
CN107018184A (en) * | 2017-03-28 | 2017-08-04 | 华中科技大学 | Distributed deep neural network cluster packet synchronization optimization method and system |
CN107239826A (en) * | 2017-06-06 | 2017-10-10 | 上海兆芯集成电路有限公司 | Computational methods and device in convolutional neural networks |
CN107506173A (en) * | 2017-08-30 | 2017-12-22 | 郑州云海信息技术有限公司 | A kind of accelerated method, the apparatus and system of singular value decomposition computing |
Also Published As
Publication number | Publication date |
---|---|
CN109976809B (en) | 2020-08-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107196869B (en) | Adaptive load balancing method, apparatus and system based on actual host load | |
US11568269B2 (en) | Scheduling method and related apparatus | |
US20200104167A1 (en) | Data processing apparatus and method | |
CN105393240B (en) | Method and apparatus with the asynchronous processor for aiding in asynchronous vector processor | |
US20190042915A1 (en) | Procedural neural network synaptic connection modes | |
CN106951926A (en) | The deep learning systems approach and device of a kind of mixed architecture | |
WO2022063247A1 (en) | Neural architecture search method and apparatus | |
CN103701900B (en) | Data distribution method on basis of heterogeneous cluster | |
CN110210610A (en) | Convolutional calculation accelerator, convolutional calculation method and convolutional calculation equipment | |
CN108416433A (en) | A kind of neural network isomery acceleration method and system based on asynchronous event | |
CN105518625A (en) | Computation hardware with high-bandwidth memory interface | |
TWI832000B (en) | Method and system for neural networks | |
CN112866059A (en) | Nondestructive network performance testing method and device based on artificial intelligence application | |
KR20140007004A (en) | Parallel generation of topics from documents | |
CN109978129A (en) | Dispatching method and relevant apparatus | |
CN108694089A (en) | Use the parallel computation framework of non-greedy dispatching algorithm | |
Gao et al. | Deep neural network task partitioning and offloading for mobile edge computing | |
CN110502544A (en) | Data integration method, distributed computational nodes and distributed deep learning training system | |
CN110147249A (en) | A kind of calculation method and device of network model | |
CN109978149A (en) | Dispatching method and relevant apparatus | |
CN109976809A (en) | Dispatching method and relevant apparatus | |
CN109976887A (en) | Dispatching method and relevant apparatus | |
CN114691372A (en) | Group intelligent control method of multimedia end edge cloud system | |
CN110019243A (en) | Transmission method and device, equipment, the storage medium of data in Internet of Things | |
TW201931216A | Integrated circuit chip device and related products, comprising a compression mapping circuit for performing compression processing on data, a main processing circuit for performing each successive operation in the neural network operation, and the like |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
CB02 | Change of applicant information |
Address after: 100000 room 644, No. 6, No. 6, South Road, Beijing Academy of Sciences Applicant after: Zhongke Cambrian Technology Co., Ltd Address before: 100000 room 644, No. 6, No. 6, South Road, Beijing Academy of Sciences Applicant before: Beijing Zhongke Cambrian Technology Co., Ltd. |
|
GR01 | Patent grant | ||