CN109272109A - Instruction scheduling method and device for neural network models - Google Patents

Instruction scheduling method and device for neural network models

Info

Publication number
CN109272109A
CN109272109A (application CN201811276880.XA)
Authority
CN
China
Prior art keywords
instruction
sequence
instruction sequence
network model
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811276880.XA
Other languages
Chinese (zh)
Other versions
CN109272109B (en)
Inventor
李军
李建军
黄畅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Horizon Robotics Technology Research and Development Co Ltd
Original Assignee
Beijing Horizon Robotics Technology Research and Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Horizon Robotics Technology Research and Development Co Ltd
Priority to CN201811276880.XA
Publication of CN109272109A
Application granted
Publication of CN109272109B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks


Abstract

Disclosed are an instruction scheduling method and device for neural network models, including: when a first instruction sequence corresponding to a first neural network model needs to be run, determining a second instruction sequence corresponding to a second neural network model to be run, the first neural network model being run before the second neural network model; selecting at least one instruction from the second instruction sequence; inserting the at least one instruction into the first instruction sequence; and running the first instruction sequence containing the at least one instruction. The present application can improve the overall execution efficiency of multiple neural network models without adding hardware resources.

Description

Instruction scheduling method and device for neural network models
Technical field
The present application relates to the technical field of artificial neural networks, and in particular to an instruction scheduling method and device for neural network models.
Background technique
In certain application scenarios (for example, autonomous driving or face recognition), multiple neural network models must be run to obtain the required result. In such cases, no effective solution has yet been proposed for establishing a pipeline between the instruction sequences of the multiple models so as to reduce the idling and waste of hardware resources and thereby improve, as far as possible, the overall execution efficiency of the multiple neural networks without adding hardware resources.
Summary of the invention
To solve the above technical problem, the present application is proposed. Embodiments of the present application provide an instruction scheduling method and device for neural network models.
According to one aspect of the present application, an instruction scheduling method for neural network models is provided, comprising:
when a first instruction sequence corresponding to a first neural network model needs to be run, determining a second instruction sequence corresponding to a second neural network model to be run, the first neural network model being run before the second neural network model;
selecting at least one instruction from the second instruction sequence;
inserting the at least one instruction into the first instruction sequence; and
running the first instruction sequence containing the at least one instruction.
According to another aspect of the present application, an electronic device is provided, comprising: one or more processors; and a memory storing computer instructions which, when run by the processors, cause the processors to execute the above instruction scheduling method for neural network models.
According to another aspect of the present application, an instruction scheduling device for neural network models is provided, comprising:
an instruction sequence determination unit, configured to determine, when a first instruction sequence corresponding to a first neural network model needs to be run, a second instruction sequence corresponding to a second neural network model to be run, the first neural network model being run before the second neural network model;
an instruction selection unit, configured to select at least one instruction from the second instruction sequence;
an instruction insertion unit, configured to insert the at least one instruction into the first instruction sequence; and
an instruction running unit, configured to run the first instruction sequence containing the at least one instruction.
In addition, the present application also provides a computer-readable storage medium on which computer program instructions are stored; when run by a processor, the computer program instructions cause the processor to execute the instruction scheduling method for neural network models described above.
With the exemplary method and device of the present application, a pipeline can be established between the instruction sequences of multiple neural network models, thereby reducing the idling and waste of hardware resources, better exploiting the pipelined execution capability of those resources, and improving the overall execution efficiency of the multiple neural networks without adding hardware resources.
Brief description of the drawings
The above and other objects, features, and advantages of the present application will become more apparent from the following detailed description of its embodiments with reference to the accompanying drawings. The drawings are provided for a further understanding of the embodiments and constitute a part of the specification; together with the embodiments, they serve to explain the present application and do not limit it. In the drawings, identical reference labels generally denote identical parts or steps.
Fig. 1 is a schematic diagram of instruction scheduling within the instruction sequence of a neural network model in the related art.
Fig. 2 is a schematic diagram of the overall running process of three neural network models in the related art.
Fig. 3 is a diagram of a system architecture to which the present application is applicable.
Fig. 4 is a diagram of an example heterogeneous architecture of the system shown in Fig. 3.
Fig. 5 is a flow diagram of the neural network model instruction scheduling method provided by an exemplary embodiment of the present application.
Fig. 6 is a schematic diagram of an exemplary process, in an exemplary embodiment of the present application, of inserting instructions from one instruction sequence into another instruction sequence.
Fig. 7 is a schematic diagram of an exemplary implementation of the instruction scheduling of a neural network model provided by an exemplary embodiment of the present application.
Fig. 8 is an illustrative diagram of the overall execution process of three neural network models after applying the instruction scheduling method provided by the embodiments of the present application.
Fig. 9 is a schematic diagram of an example process in which the system shown in Fig. 4 executes the instruction scheduling method of a neural network model, as provided by an exemplary embodiment of the present application.
Fig. 10 is an exemplary structural schematic diagram of the instruction scheduling device for neural network models provided by an exemplary embodiment of the present application.
Fig. 11 is another exemplary structural schematic diagram of the instruction scheduling device for neural network models provided by an exemplary embodiment of the present application.
Fig. 12 is a structural diagram of the electronic device provided by an exemplary embodiment of the present application.
Detailed description of embodiments
Hereinafter, example embodiments of the present application are described in detail with reference to the drawings. Obviously, the described embodiments are only some rather than all of the embodiments of the present application, and it should be understood that the present application is not limited by the example embodiments described herein.
Overview of the application
In certain application scenarios, multiple neural network models must be run to obtain the required result. For example, in face recognition, one neural network model is first called to detect whether an image contains a human face; if it does, another neural network model is dispatched to identify that face, yielding the final result. If there are multiple images, multiple neural network models need to be called to detect them at the same time, and likewise, if several images contain faces, multiple neural network models need to be called for identification. As another example, in autonomous driving, several neural network models are needed to detect and recognize the acquired images in real time before the final result is available. Scenarios that require running multiple neural network models to obtain the required result are thus very common, and improving the overall execution efficiency of these models in such scenarios is critical.
In pipelined execution, the pipelining capability of the hardware strongly depends on the instruction sequence: the more reasonably the instructions are scheduled, the better the hardware pipeline can be exploited and the less the hardware idles, while the correctness of the operation result is preserved. In the related art, the instruction sequence of each neural network model is generated before runtime, and the compiler can only compile one neural network model at a time. This means the compiler can only schedule instructions within the instruction sequence of a single model to form a pipeline inside that sequence; it cannot schedule instructions across the instruction sequences of multiple models, and therefore cannot establish a pipeline between those sequences.
A neural network model cannot start computing immediately when it starts running; it must first load data. Because no pipeline can be established between the instruction sequences of multiple models, the computing unit (for example, the BPU) can only wait during the load phase at the start of each model, which wastes hardware resources and reduces execution efficiency. Fig. 1 shows an example of instruction scheduling within the instruction sequence of a neural network model in the related art. As Fig. 1 shows, although intra-sequence scheduling can overlap part of the running time through parallelism among the instructions within the sequence, it cannot cover the load time at the start of each model, during which the BPU can only wait. This both wastes hardware resources and lowers the overall execution efficiency of the multiple models.
As a whole, the execution of each neural network model can be simplified to "load data at the start -> compute in the middle -> store the result at the end". When multiple models need to run, the related art cannot establish a pipeline between their instruction sequences, so the models must execute one by one: the first model runs to completion, then the second, and so on. Fig. 2 shows an example of the overall running process of three neural network models: first the load, compute, and store of neural network model 1; after model 1 finishes, the load, compute, and store of model 2; after model 2 finishes, those of model 3. This whole process takes 3*3 = 9 time slots, and in 3 of them (slots 1, 4, and 7) the BPU is idle, so the overall execution efficiency is very low.
In some application scenarios, numerous structurally simple neural network models (for example, models with only three or four convolution layers) must be called frequently to obtain the required result. Since no pipeline can be established between their instruction sequences, each call forces the BPU to wait for a load, so the BPU idles for long stretches and the overall running efficiency of the models is very low. According to statistics, the load time at the start of such structurally simple models accounts for 30%-50% of their total running time; that is, for each call the BPU is idle during 30%-50% of the model's running time, so over the whole process of calling these models to obtain the result, the BPU is idle for at least 30%-50% of the running time, which necessarily makes the overall execution efficiency of these models very low.
Therefore, when multiple neural network models must be run to obtain the required result, how to establish a pipeline between their instruction sequences and reduce the idling and waste of hardware resources, so as to improve the overall execution efficiency of the models as much as possible without adding hardware resources, is an urgent technical problem.
In view of the above technical problem, the basic idea of the present application is to propose an instruction scheduling method, device, electronic device, and computer-readable storage medium for neural network models: when a first instruction sequence corresponding to a first neural network model needs to be run, a second instruction sequence corresponding to a second neural network model to be run is determined, the first model being run before the second; at least one instruction is selected from the second instruction sequence; the at least one instruction is inserted into the first instruction sequence; and the first instruction sequence containing the at least one instruction is run. By scheduling at least one instruction from the second model's sequence into the first model's sequence, the embodiments of the present application allow those instructions to run in parallel with instructions of the first model, thereby establishing a pipeline between the two sequences. This better exploits the hardware's pipelined execution capability while keeping the operation result correct, reduces the idling and waste of hardware resources (for example, the BPU), and improves the overall execution efficiency of multiple models without adding hardware resources. Especially when numerous structurally simple models must be called frequently, the embodiments of the present application can greatly improve their overall execution efficiency without adding hardware resources.
It should be noted that although the above description uses the scenario of numerous structurally simple models as an example, the scope of the embodiments of the present application is not limited thereto: they are applicable to any scenario that requires running two or more neural network models. For example, they remain applicable to scenarios with multiple structurally complex models.
Exemplary system
The embodiments of the present application are applicable to any system that supports running multiple neural network models; the system may have a heterogeneous or a homogeneous network structure.
Fig. 3 shows an exemplary structure 30 of such a system, comprising a compiling device 301 and a running device 302 that are interconnected or in communication. The compiling device 301 is responsible for compiling the instruction sequence of each neural network model before runtime (i.e., offline), and the running device 302 is responsible for running the instruction sequences provided by the compiling device. Here, the compiling device 301 may be implemented by one or more processors on which a compiler runs; in practice these processors may be high-performance CPUs. The running device 302 may include one or more processors, one or more of which may perform the computations relevant to neural networks, including but not limited to convolution, activation functions, and pooling.
Fig. 4 shows a heterogeneous example 40 of the system of Fig. 3, in which a first processor 401 belongs to the compiling device 301, while a memory 403, a second processor 402, and a third processor 404 belong to the running device 302. The first processor 401 compiles the instruction sequence of each model before runtime; at runtime the second processor 402 loads each instruction sequence into the memory 403, the memory 403 stores the instruction sequences awaiting execution, and the third processor 404 reads an instruction sequence from the memory 403 and runs it. Here, the first processor 401 may be a high-performance CPU configured with a compiler; the second processor 402 may be a lower-performance ARM processor; the third processor 404 may be a processor supporting neural-network computations such as a Brain Processing Unit (BPU) or a Tensor Processing Unit (TPU); and the memory 403 may be a volatile memory (such as DDR) or a nonvolatile memory (such as a hard disk, SSD, flash, or EEPROM).
It should be noted that Fig. 3 and Fig. 4 are merely examples; the systems to which the embodiments of the present application apply are not limited thereto. The embodiments can be applied to any system that supports running two or more neural network models.
Exemplary method
Fig. 5 shows an exemplary method 500 of instruction scheduling for neural network models provided by an exemplary embodiment of the present application. As shown in Fig. 5, the exemplary method 500 includes the following steps:
Step 501: when a first instruction sequence corresponding to a first neural network model needs to be run, determining a second instruction sequence corresponding to a second neural network model to be run, the first neural network model being run before the second neural network model;
Step 502: selecting at least one instruction from the second instruction sequence;
Step 503: inserting the at least one instruction into the first instruction sequence; and
Step 504: running the first instruction sequence containing the at least one instruction.
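Read together, steps 501-504 amount to the following minimal Python sketch. The helper names and the list-based sequence representation are illustrative assumptions, not the patent's API; rough sketches of the helpers appear later in this description.

    def schedule_and_run(first_seq, second_seq, run):
        """Steps 501-504: before running the first sequence, pull startup
        instructions of the second sequence into its legal positions."""
        for instr in select_startup_loads(second_seq):   # step 502
            pos = find_legal_position(instr, first_seq)  # conflict checks
            if pos is not None:
                second_seq.remove(instr)                 # step 503
                first_seq.insert(pos, instr)
        run(first_seq)                                   # step 504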
In the embodiments of the present application, when the first instruction sequence of the first neural network model needs to be run, at least one instruction of the second model is inserted into it, establishing a pipeline between the first and second instruction sequences so that the inserted instructions can run in parallel with instructions of the first model. This greatly exploits the hardware's pipelined execution capability while keeping the operation result correct, saves overall running time, reduces the idling and waste of hardware resources (for example, the BPU), and improves the overall execution efficiency of multiple models without adding hardware resources. When numerous structurally simple models must be called frequently, the embodiments of the present application can greatly improve their overall execution efficiency without adding hardware resources.
The embodiments of the present application determine the second instruction sequence to be run in step 501, when the first instruction sequence needs to be run (i.e., while it is in a waiting state), and then perform cross-sequence instruction scheduling in steps 502-503 on the first sequence about to run and the second sequence that follows it. The pipeline between the sequences is thus established according to the actual running order of the models, so it does not affect the models' overall operation results, ensuring correctness.
In the embodiments of the present application, the determination of the second instruction sequence in step 501 depends on the operation result of the first neural network model and/or on the result finally required in the current scenario.
In one implementation of the embodiments, step 501 may dynamically determine, at the time each model is called, which model runs next and hence which second instruction sequence is to be run. Specifically, which model needs to be called next (i.e., which model is the second neural network model) can be determined from the operation result of the first model and the result finally required in the current scenario; once that model is determined, the instruction sequence to run after the first sequence is determined. Take face recognition as an example: neural network model 1 (an example of the first neural network model) detects whether an original image contains a human face, and neural network model 2 (an example of the second neural network model) identifies the face in the original image; model 1 corresponds to instruction sequence 1 (an example of the first instruction sequence) and model 2 to instruction sequence 2. If model 1 detects a face in the original image, model 2 must be called for identification, so it can be confirmed that sequence 2 needs to run after sequence 1. If model 1 finds no face, model 2 need not be called, sequence 2 need not run after sequence 1, and processing of that image can end directly. If 2000 original images are detected at the same time, 2000 instances of sequence 1 may need to run, one per image; if faces are detected in 1000 of the images, 1000 instances of model 2 may need to be called to identify those faces, and correspondingly 1000 instances of sequence 2 need to run after the 2000 instances of sequence 1.
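For this face-recognition example, the dynamic determination of step 501 might look like the following sketch; the run interface and all names are hypothetical.

    def process(images, detect_seq, recog_seq, run):
        """Sequence 2 runs only for the images in which model 1 found a face."""
        for img in images:
            if run(detect_seq, img):   # model 1: face detection
                run(recog_seq, img)    # model 2: face identification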
In another implementation of the embodiments, step 501 may use a calling order of the multiple models preset according to the demands of the application scenario; the calling order indicates the running order of the models' instruction sequences.
It should be noted that the above two implementations are merely examples. There may be many ways for step 501 to determine which instruction sequence runs after the first instruction sequence, and the embodiments of the present application impose no limitation.
In some implementations of the embodiments, the at least one instruction is a load instruction for loading data required by the second neural network model's computation. Load instructions may include feature-data load instructions and/or load instructions for parameters such as weights and offsets. In one implementation, the at least one instruction may be a load instruction of the second model's startup stage, i.e., a load instruction that precedes the first compute instruction in the second instruction sequence. In this way, the startup loads of the second model are moved forward into the running of the first model, saving the second model's startup load time, reducing the BPU's waiting at startup, and improving execution efficiency. In this implementation, step 503 may insert the at least one instruction between the compute instructions of the first instruction sequence. Of course, implementations such as "insert load instructions from the middle of the second sequence into the first sequence" or "insert compute instructions of the second sequence into the first sequence" may also be used. In a specific implementation, which instruction or instructions are selected from the second sequence for insertion depends on the specific application scenario and on the lengths of the models' instruction sequences; the embodiments of the present application impose no limitation.
In some implementations of the embodiments, selecting at least one instruction from the second instruction sequence in step 502 may include: traversing the second instruction sequence to find its first compute instruction; and selecting the instructions that precede the first compute instruction. In this way, the startup-stage instructions of the second model are specifically targeted for cross-sequence scheduling, so that the second model's startup can run in parallel with part of the first model (for example, its middle stage). This saves the execution time of the second model's startup stage, avoids long waits of hardware resources (for example, the BPU) during that stage, and improves the overall execution efficiency of multiple models without adding hardware resources.
Since an instruction's running time depends on the size of the data it operates on (the larger the data, the longer the running time; the smaller the data, the shorter), in some implementations of the embodiments, step 503 may insert the at least one instruction into the first instruction sequence one by one in descending order of operand data size. In this way, the longest-running instructions of the second model are inserted first and can run in parallel with one or more instructions of the first sequence, saving more running time (for example, the second model's startup load time) and further improving the overall execution efficiency of multiple models.
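A minimal sketch of this selection and ordering, assuming each parsed instruction is summarized as a dict with an opcode "op" and an operand byte count "size" (an illustrative representation, not the patent's encoding):

    def select_startup_loads(second_seq):
        """Collect every instruction before the first compute instruction and
        order the result by operand size, descending (longest-running first)."""
        loads = []
        for ins in second_seq:
            if ins["op"] != "load":    # first compute instruction found
                break
            loads.append(ins)
        return sorted(loads, key=lambda ins: ins["size"], reverse=True)

    seq2 = [{"op": "load", "size": 64}, {"op": "load", "size": 512},
            {"op": "conv", "size": 4096}]
    assert [i["size"] for i in select_startup_loads(seq2)] == [512, 64]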
In the embodiments of the present application, step 503 may include: checking whether a data conflict (also called a dependence conflict) and/or a hardware resource conflict (also called a structural conflict) exists between the at least one instruction and each instruction of the first instruction sequence, so as to determine a legal position of the at least one instruction in the first instruction sequence, into which it can then be inserted. Determining legal positions by checking before insertion effectively prevents conflicts, establishes a more reasonable pipeline between the models' instruction sequences, and avoids computation errors while the models run, thereby improving the overall execution efficiency of multiple models while ensuring that their overall operation results are correct.
In some implementations of the embodiments, how the checks for data conflicts and/or hardware resource conflicts determine a legal position depends on whether a conflict would cause a correctness problem during instruction execution (such problems include, but are not limited to, errors in running and/or errors in computation). In one implementation, where only data conflicts can cause such problems (for example, for asynchronous non-blocking instructions), only data conflicts need to be checked: a position is legal if there is no data conflict and illegal otherwise. In another implementation, where either a data conflict or a hardware resource conflict can cause such problems (for example, for synchronous blocking instructions), both must be checked (data conflicts may be checked before hardware resource conflicts): a position is legal only if neither conflict exists, and illegal if either exists.
In the present embodiment, an instruction may include an operation code (opcode), which indicates the operation the instruction performs (i.e., the task it completes), including but not limited to: reading data, writing data, convolution, pooling, and so on. The opcode determines whether an instruction is a load instruction or a compute instruction of the neural network model. An instruction may further include an address code and operands: the address code indicates the memory address of the data the instruction operates on, and the operands indicate the objects the instruction targets (for example, feature data, or parameters such as weights and offsets relevant to the neural network computation). In concrete applications, parsing an instruction yields its type, the size of its operand data, its memory address, and so on.
In some implementations of the embodiments, checking whether a data conflict exists between the at least one instruction and each instruction of the first instruction sequence may include: judging whether the memory addresses accessed by the at least one instruction and by an instruction of the first sequence overlap, and whether the at least one instruction and/or the instruction of the first sequence will perform a write to the data at the overlapping address; when the addresses overlap and at least one of the instructions writes to the data there, determining that a data conflict exists between the at least one instruction and the instruction of the first sequence. For example, suppose instruction A belongs to the first instruction sequence and instruction B is an instruction of the second sequence to be inserted into the first. If A reads data at address a and B writes data at address a, then A and B have a data conflict; if A reads at address a and B also only reads at address a, there is no data conflict between them. In other words, if two instructions access data at the same address but neither alters its content, there is no data conflict; if either alters the content, a data conflict exists. The memory address an instruction accesses, and the kind of operation it performs, can be obtained by parsing the instruction.
In one implementation of the embodiments, checking whether a data conflict exists between an instruction of the second sequence and an instruction of the first sequence (for example, the last instruction of the first sequence) is done bit by bit: every bit of data operated on by the two instructions is checked, and as soon as any partial conflict exists (i.e., part of the data shares a memory address and a write will be performed on it), the two instructions are considered to have a data conflict.
It should be noted that the above way of checking data conflicts is merely an example; whether a data conflict exists between the at least one instruction of the second sequence and each instruction of the first sequence may also be checked in other ways. The present application does not limit the specific implementation of data conflict checking.
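A minimal sketch of this check, under the same dict representation with an accessed byte range "addr" = (start, length) and a "writes" flag (both illustrative assumptions):

    def ranges_overlap(a, b):
        """True if two (start, length) byte ranges share any address."""
        return a[0] < b[0] + b[1] and b[0] < a[0] + a[1]

    def has_data_conflict(x, y):
        """A data conflict exists iff the accessed ranges overlap and at least
        one instruction writes the shared data (read/read never conflicts)."""
        return ranges_overlap(x["addr"], y["addr"]) and (x["writes"] or y["writes"])

    # Instruction A reads bytes [0, 16); instruction B writes [8, 12) -> conflict.
    a = {"addr": (0, 16), "writes": False}
    assert has_data_conflict(a, {"addr": (8, 4), "writes": True})
    assert not has_data_conflict(a, {"addr": (8, 4), "writes": False})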
In some implementations of the embodiments, mapping relation information between instructions and hardware resources may be preconfigured by static configuration, and whether two instructions have a hardware resource conflict is judged during checking by querying this mapping. In one example, checking whether a hardware resource conflict exists between the at least one instruction of the second sequence and each instruction of the first sequence may include: querying the preconfigured mapping between each instruction and the hardware resources it uses, and judging from this mapping whether the at least one instruction and an instruction of the first sequence need to use the same hardware resource; if they do, a hardware resource conflict exists between them; if they do not, no hardware resource conflict exists. The mapping indicates, for each instruction, which hardware resources it uses; it may be one-to-one, one-to-many, many-to-one, and so on, depending on the instruction type, its opcode, and the concrete configuration of the hardware (for example, one processor or several, one memory or several). Through the mapping between instructions and the hardware resources they use, hardware resource conflicts can be avoided. It should be noted that the embodiments of the present application are not limited to this specific way of checking hardware resource conflicts.
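A sketch of such a statically configured mapping, with hypothetical opcodes and resource names; the real table depends on the instruction set and the chip configuration:

    # Hypothetical opcode -> hardware-resource mapping, configured statically.
    RESOURCE_MAP = {
        "load":  {"dma"},      # loads occupy the DMA engine
        "store": {"dma"},
        "conv":  {"bpu"},      # compute instructions occupy the BPU
        "pool":  {"bpu"},
    }

    def has_resource_conflict(x, y):
        """A hardware resource conflict exists iff both instructions need at
        least one identical hardware resource."""
        return bool(RESOURCE_MAP[x["op"]] & RESOURCE_MAP[y["op"]])

    assert has_resource_conflict({"op": "load"}, {"op": "store"})  # both need DMA
    assert not has_resource_conflict({"op": "load"}, {"op": "conv"})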
In one embodiment, an instruction sequence is a sequence of instructions ordered by their run start times: an instruction with an earlier start time is ordered nearer the front, and one with a later start time is ordered nearer the back. In some implementations of the embodiments, a legal position of an instruction of the second sequence within the first sequence is a position such that inserting the instruction after it does not affect the first sequence's operation result, and the instruction has no data conflict with any instruction of the first sequence ordered after that position. In one example, an instruction's position in a sequence may be represented by a serial number indicating its rank; for example, an instruction ranked third in a sequence has position 3. Of course, a position may also be represented in other ways in a concrete implementation, and the embodiments of the present application impose no limitation; legal positions are represented in the same way as positions. It should be particularly noted that a legal position of an instruction of the second sequence within the first sequence is a position at which the instruction may be inserted, not necessarily the position at which it is actually inserted.
In the embodiments of the present application, an instruction of the second sequence may have no legal position in the first sequence, or one, or several. If it has several, one of them may be selected as the insertion position; if it has exactly one, that legal position may serve as the insertion position. If the at least one instruction has no legal position in the first sequence, it is split into two groups of instructions, and the group having no data conflict with the instructions of the first sequence is inserted into the first sequence.
In the embodiments of the present application, whether to split the at least one instruction of the second sequence depends on the size of the conflict-free data among its operands. In some implementations, it may be judged whether the size of the conflict-free data in the operands exceeds a preset threshold; the at least one instruction is split only when it does. When the conflict-free data is smaller than or equal to the threshold, the instruction need not be split. The threshold may be preset and its value may be empirical; the concrete value depends on the actual application scenario, and the present application imposes no limitation. In one example, when the at least one instruction is a load instruction and, statistically, the first N bytes operated on by each load instruction take up most of the running time, the threshold may be set based on N, for instance as a multiple of N (for example, 2N), where N is an integer not smaller than 1. In one implementation, if an instruction of the second sequence belongs to the second model's startup stage (for example, a load instruction before the first compute instruction) and its conflict-free operand data is large (for example, exceeds the threshold), the instruction may be split into two groups according to its data conflicts with the instructions of the first sequence: the first group has data conflicts with the first sequence, the second group does not, and the second group is inserted into the first sequence. In this way, an instruction of the second sequence that has no legal position can still be partially inserted into the first sequence by splitting, further saving the second model's running time and improving the overall execution efficiency of multiple models.
In one implementation of the embodiments, splitting an instruction of the second sequence according to its data conflicts with the first sequence may specifically be: determining (for example, directly from the result of the data conflict check) which of the instruction's data conflict with instructions of the first sequence and which do not; and splitting the at least one instruction into two groups accordingly, the first group (possibly one or more instructions) operating on the conflicting data (for example, loading it) and the second group (possibly one or more instructions) operating on the conflict-free data (for example, loading it). Optionally, before splitting, the size of the conflict-free data is compared with the above threshold, and the instruction is split into two groups only when the conflict-free data exceeds the threshold. The first group then has data conflicts with the first sequence while the second group has none, so the second group can be inserted into the first sequence; that is, part of the instruction's operation is inserted. In the embodiments of the present application, splitting instructions of the second sequence allows its longer-running instructions to be inserted into the first sequence, which helps save more running time and improves the overall execution efficiency of multiple models by a larger margin without adding hardware resources.
In one example, the operand data of an instruction A in the second sequence is "abcdef", and the data conflict check finds that A conflicts with the first sequence on the data "b" and "d". Instruction A can then be split into two groups: the first group contains instruction 1 operating on "b" and instruction 2 operating on "d"; the second group contains instruction 3 operating on "a", instruction 4 operating on "c", and instruction 5 operating on "ef". After the split, instructions 3, 4, and 5 of the second group can each be inserted into the first sequence, while instructions 1 and 2 of the first group remain in the second sequence.
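A sketch of the split on this example, treating the operand as a string and the conflicting data as a set reported by the data conflict check (both representations are illustrative):

    def split_by_conflict(data, conflicting):
        """Split operand data into maximal contiguous runs, each run either
        entirely conflicting or entirely conflict-free; each run becomes the
        operand of one instruction in the corresponding group."""
        conflict_group, free_group, run, run_conflicts = [], [], "", None
        for ch in data:
            c = ch in conflicting
            if run and c != run_conflicts:      # conflict status flips
                (conflict_group if run_conflicts else free_group).append(run)
                run = ""
            run, run_conflicts = run + ch, c
        if run:
            (conflict_group if run_conflicts else free_group).append(run)
        return conflict_group, free_group

    # Matches the example: group 1 loads "b" and "d"; group 2 loads "a", "c", "ef".
    assert split_by_conflict("abcdef", {"b", "d"}) == (["b", "d"], ["a", "c", "ef"])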
In one implementation of the embodiments, if the at least one instruction has two or more legal positions in the first instruction sequence, the legal position with the longest parallel time may be determined as the insertion position, saving as much running time as possible and improving the overall execution efficiency of multiple models to a greater extent.
In another implementation of the embodiments, if the at least one instruction has two or more legal positions in the first instruction sequence, the legal position with the longest parallel time that is ordered nearest the front of the first sequence is determined as the insertion position, again saving as much running time as possible and improving the overall execution efficiency of multiple models to a greater extent.
In the embodiments of the present application, the parallel time of an instruction of the second sequence at a legal position in the first sequence depends on the running time of the one or more instructions of the first sequence that run in parallel with it. Since an instruction's running time is directly related to the size of the data it operates on, in one implementation determining the legal position with the longest parallel time may include: determining, for each legal position, the one or more instructions of the first sequence that would run in parallel with the at least one instruction, and estimating the parallel time of that position from the size of their operand data; then, based on the estimated parallel time of each legal position, determining the legal position with the longest parallel time. In this way, the running time that an instruction of the second sequence can actually cover at each legal position is estimated; the more time covered, the more is saved, so the legal position that saves the most running time can serve as the instruction's actual insertion position, improving the overall execution efficiency of multiple models to a greater extent.
In the embodiments of the present application, the parallel time of an instruction of the second sequence at a legal position in the first sequence equals the sum of the running times of the one or more instructions of the first sequence that run in parallel with it at that position. That is, estimating the running time of those parallel instructions determines the parallel time of the instruction at that legal position. For a load instruction, the running time can be estimated from the size of the data to be read: generally, the larger the data, the longer the running time, and the relation between data size and running time can be linear; it can be obtained by collecting the actual running time of each instruction and the size of the data it actually reads, and analyzing these statistically. For a compute instruction, the running time can be estimated from the size of the data it computes on: generally, the larger the data, the longer the running time, and the relation between a compute instruction's running time and its actual operand size is usually nonlinear; it too can be obtained by collecting and statistically analyzing actual running times and actual operand sizes. In summary, an instruction's running time can be determined from the size of the data it operates on.
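A sketch of this cost model and of the choice among legal positions; the linear load coefficient, the per-instruction "profiled_cycles" field, and the fixed parallel "window" are stated assumptions, not values from the patent:

    CYCLES_PER_BYTE = 1    # illustrative linear load coefficient, fit offline

    def est_cycles(ins):
        """Load cost grows roughly linearly with bytes read; compute cost is
        assumed looked up from a nonlinear profile measured offline."""
        return ins["size"] * CYCLES_PER_BYTE if ins["op"] == "load" \
            else ins["profiled_cycles"]

    def parallel_time(first_seq, pos, window):
        """Estimated cycles covered by inserting at pos: the sum over the
        instructions of the first sequence assumed to run in parallel
        (approximated here as the `window` instructions after the position)."""
        return sum(est_cycles(ins) for ins in first_seq[pos:pos + window])

    def pick_insertion(legal_positions, first_seq, window=1):
        # Longest parallel time wins; ties go to the front-most position,
        # since max() keeps the first maximum of the ascending candidates.
        return max(sorted(legal_positions),
                   key=lambda p: parallel_time(first_seq, p, window))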
In one implementation of the embodiments, the check may be performed instruction by instruction starting from the end of the first instruction sequence. For an instruction of the second sequence, checking from the end of the first sequence helps find the legal position ordered nearest the front of the first sequence, which can then serve as the insertion position, saving as much running time as possible and improving the overall execution efficiency of multiple models.
An example is given below to illustrate a specific implementation process of the above neural network model instruction scheduling method.
Instruction sequence 1 (an example of the first instruction sequence described herein) corresponds to neural network model 1 (an example of the first neural network model), and instruction sequence 2 (an example of the second instruction sequence) corresponds to neural network model 2 (an example of the second neural network model); model 1 executes before model 2. As shown in Fig. 6, the exemplary flow 600 of inserting the startup-stage load instructions of model 2 (i.e., the load instructions before the first compute instruction in sequence 2) into sequence 1 may include the following steps:
Step 601: traverse instruction sequence 2, find its first compute instruction (for example, the first convolution instruction), and extract all the instructions before it (i.e., all the load instructions) to form a candidate instruction set;
Step 602: parse each instruction in the candidate set to determine the size of the data it operates on, and arrange the candidate instructions in descending order of that size;
Step 603: read the front-most instruction from the candidate set; if reading fails, the candidate set is empty and the current flow ends; if reading succeeds, the candidate set is not empty and the flow continues with step 604. Suppose the instruction currently taken out is instruction A;
Step 604: determine the insertion position of instruction A in instruction sequence 1; if A has no legal insertion position in sequence 1, continue with step 606; if A has a legal insertion position in sequence 1, continue with step 605;
Step 605: insert instruction A at the insertion position in sequence 1, then return to step 603 to take the next instruction from the candidate set;
Step 606: put instruction A back into instruction sequence 2, before its first compute instruction.
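Flow 600 can be sketched as follows, reusing select_startup_loads and find_legal_position from the sketches above; the list-based sequence representation remains an assumption.

    def flow_600(seq1, seq2):
        """Steps 601-606: move sequence 2's startup loads into legal positions
        of sequence 1; loads with no legal position are put back into sequence
        2 before its first compute instruction."""
        put_back = []
        for instr in select_startup_loads(seq2):    # steps 601-602
            seq2.remove(instr)                      # step 603: take a candidate
            pos = find_legal_position(instr, seq1)  # step 604
            if pos is None:
                put_back.append(instr)              # step 606
            else:
                seq1.insert(pos, instr)             # step 605
        seq2[:0] = put_back                         # back before first compute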
In one example, suppose instruction sequence 1 contains three instructions: instruction B, instruction C and instruction D. The exemplary flow of step 604, determining the insertion position of instruction A in instruction sequence 1, may then include the following steps:
Step 1: initialize the legal position record of instruction A, and start the search from the first instruction at the end of instruction sequence 1;
Here, suppose the instruction found by the current search is instruction B.
Here, initializing the legal position record of instruction A includes generating the legal position record of instruction A and setting each parameter in the record to a default value, where the default value is an invalid value. The parameters in the legal position record may include: the newest legal position of instruction A in instruction sequence 1, the number of parallel clock cycles when instruction A is inserted at that newest legal position, and a first flag indicating whether a hardware resource conflict exists at that newest legal position.
Step 2: check whether a data conflict exists between instruction B and instruction A; if so, determine that instruction A has no legal position in instruction sequence 1 and end the current flow; otherwise, continue with step 3;
Step 3: check whether a hardware resource conflict exists between instruction B and instruction A; if so, continue with step 5; otherwise continue with step 4;
Step 4: determine the position before instruction B in instruction sequence 1 as the newest legal position of instruction A, estimate the number m of parallel clock cycles when instruction A is inserted at that newest legal position, update the legal position record of instruction A, and continue with step 5;
Here, updating the legal position record of instruction A includes: updating the newest legal position of instruction A in instruction sequence 1 to the position before instruction B; updating the number of parallel clock cycles of instruction A to m; and updating the value of the first flag to indicate that no hardware resource conflict exists at the position before instruction B.
Here, the number of parallel clock cycles is one exemplary representation of the parallel time described above. Other representations are not excluded in practice; the representation of the parallel time in the embodiments of the present application is not limited thereto.
Here, the process of estimating the number m of parallel clock cycles may include: determining the one or more instructions in instruction sequence 1 that would run in parallel with instruction A if instruction A were inserted at the position before instruction B, estimating the run time of each of those instructions (for example, as a number of clock cycles) from the size of the data it operates on, and determining the sum of those run times as the number m of parallel clock cycles.
Step 5: determine whether instruction B is the front-most instruction in instruction sequence 1; if so, continue with step 14; otherwise continue the search with instruction C, the second instruction from the end of the sequence;
Step 6: check whether a data conflict exists between instruction C and instruction A; if so, continue with step 14; otherwise, continue with step 7;
Step 7: check whether a hardware resource conflict exists between instruction C and instruction A; if so, continue with step 9; otherwise continue with step 8;
Step 8: estimate the number n of parallel clock cycles when instruction A is inserted at the position before instruction C in instruction sequence 1, and compare n with the number m of parallel clock cycles in the legal position record of instruction A; if n is greater than or equal to m, determine the position before instruction C as the newest legal position of instruction A and update the legal position record of instruction A accordingly; if n is less than m, keep the legal position record of instruction A unchanged; continue with step 9;
Here, the process of estimating the number n of parallel clock cycles is similar to the above process of estimating m and is not repeated.
Here, updating the legal position record of instruction A includes: updating the newest legal position of instruction A in instruction sequence 1 to the position before instruction C; updating the number of parallel clock cycles of instruction A to n; and updating the value of the first flag to indicate that no hardware resource conflict exists at the position before instruction C.
Step 9: determine whether instruction C is the front-most instruction in instruction sequence 1; if so, continue with step 14; otherwise continue the search with instruction D, the third instruction from the end of the sequence;
Step 10: check whether a data conflict exists between instruction D and instruction A; if so, continue with step 14; otherwise, continue with step 11;
Step 11: check whether a hardware resource conflict exists between instruction D and instruction A; if so, continue with step 13; otherwise continue with step 12;
Step 12: estimate the number k of parallel clock cycles when instruction A is inserted at the position before instruction D in instruction sequence 1, and compare k with the number of parallel clock cycles in the legal position record of instruction A (which may be m or n); if k is greater than or equal to the recorded number of parallel clock cycles, determine the position before instruction D as the newest legal position of instruction A and update the legal position record of instruction A accordingly; if k is less than the recorded number, keep the legal position record of instruction A unchanged; continue with step 13;
Here, the process of estimating the number k of parallel clock cycles is similar to the above process of estimating m and is not repeated.
Here, updating the legal position record of instruction A includes: updating the newest legal position of instruction A in instruction sequence 1 to the position before instruction D; updating the number of parallel clock cycles of instruction A to k; and updating the value of the first flag to indicate that no hardware resource conflict exists at the position before instruction D.
Step 13: since instruction D is the front-most instruction in instruction sequence 1, continue with step 14;
Step 14: read the legal position record of instruction A and determine whether the values of its parameters are invalid; if not, determine the newest legal position in the legal position record of instruction A as the insertion position of instruction A in instruction sequence 1; if they are invalid, determine that instruction A has no legal insertion position in instruction sequence 1, and end the current flow.
It should be noted that the above flow is only one exemplary implementation of determining the insertion position of instruction A in instruction sequence 1; the embodiments of the present application are not limited thereto.
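Generalized beyond three instructions, steps 1 to 14 amount to a backward scan that keeps the front-most legal position with the longest parallel time. The following sketch makes that explicit; it is an illustration under assumptions, reusing `parallel_time` from the estimation sketch above, with the conflict predicates `has_data_conflict` and `has_resource_conflict` sketched further below in the discussion of the check modules.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class LegalPositionRecord:
    """Legal position record of step 1 (field names are assumptions)."""
    position: Optional[int] = None   # newest legal position; None = invalid value
    parallel_cycles: int = -1        # parallel clock cycles at that position
    resource_conflict: bool = True   # first flag; invalid until a position is found

def find_insert_position(instr_a, seq1: list) -> Optional[int]:
    """Steps 1-14, generalized: scan sequence 1 from its end toward its front."""
    record = LegalPositionRecord()                    # step 1
    for idx in range(len(seq1) - 1, -1, -1):          # end toward front
        instr_b = seq1[idx]
        if has_data_conflict(instr_a, instr_b):       # steps 2 / 6 / 10
            break    # A cannot be moved ahead of this instruction; stop searching
        if has_resource_conflict(instr_a, instr_b):   # steps 3 / 7 / 11
            continue # skip this position, but positions further front may work
        # Steps 4 / 8 / 12: instructions from idx onward would run in parallel
        # with A if A were inserted before instr_b; compare parallel times.
        cycles = parallel_time(seq1[idx:])
        if cycles >= record.parallel_cycles:          # ">=" keeps the front-most tie
            record.position = idx
            record.parallel_cycles = cycles
            record.resource_conflict = False
    return record.position                            # step 14; None = no legal position
```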
Alternatively, when instruction A has no legal position in instruction sequence 1 and the data operated on by instruction A is large (for example, when the portion of instruction A's data that does not conflict with any instruction in instruction sequence 1 exceeds a preset threshold), instruction A may be split into two groups of instructions: the first group has a data conflict with an instruction in instruction sequence 1 (which may be any instruction in that sequence), and the second group has no data conflict with the instructions in instruction sequence 1. The first group of instructions is put back into instruction sequence 2; for the second group, the insertion position of each of its one or more instructions in instruction sequence 1 can be determined according to the above flow, and those one or more instructions are inserted into instruction sequence 1. In this way, even when a load instruction in instruction sequence 2 has no legal position, part of it can still be inserted into instruction sequence 1 by splitting, further reducing the load time of the start-up phase of neural network model 2 and improving the overall execution efficiency of the multiple neural network models.
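A minimal sketch of this splitting follows, assuming a load can be described as a set of address ranges and re-issued per range; `operated_ranges`, `conflicts_with_seq` and `make_load` are hypothetical helpers, and the threshold test of the preceding paragraph would gate the call.

```python
def split_instruction(instr_a, seq1: list):
    """Split a load into a conflicting group (group 1) and a clean group (group 2)."""
    conflicting, clean = [], []
    for rng in operated_ranges(instr_a):   # e.g. (address, length) chunks of the load
        if conflicts_with_seq(rng, seq1):  # data conflict with some instruction in seq1
            conflicting.append(rng)
        else:
            clean.append(rng)
    group1 = [make_load(rng) for rng in conflicting]  # put back into sequence 2
    group2 = [make_load(rng) for rng in clean]        # insert into sequence 1 as above
    return group1, group2
```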
Fig. 7 shows an example of the execution process when instruction scheduling is performed between instruction sequences using the instruction scheduling method of a neural network model provided by the embodiments of the present application. In this example, instruction sequence a (an example of the first instruction sequence in the present application) and instruction sequence b (an example of the second instruction sequence in the present application) correspond to neural network model a (an example of the first neural network model in the present application) and neural network model b (an example of the second neural network model in the present application), respectively, and neural network model a runs before neural network model b. By the above method of the embodiments of the present application, the feature data load instruction "LD-Feature" and the weight load instruction "LD Weight" of the start-up phase of neural network model b can be inserted between the convolution compute instructions "CONV" of instruction sequence a. In this way, the feature data load instruction "LD-Feature" and the weight load instruction "LD Weight" of instruction sequence b are moved forward into instruction sequence a to run in parallel with its convolution compute instructions "CONV", and a pipeline is established between instruction sequence a and instruction sequence b: before neural network model b starts running, the load instructions of its start-up phase have already completed, part of the load time of the start-up phase of neural network model b is hidden, and the BPU does not need to wait for a long time during that start-up phase, which greatly improves the overall execution efficiency of neural network model a and neural network model b without adding hardware resources.
Fig. 8 shows the overall running process of three neural network models after applying the instruction scheduling method of a neural network model provided by the embodiments of the present application, where a pipeline is established among the instruction sequences of neural network model 1, neural network model 2 and neural network model 3: the load of neural network model 2 is executed during the computation of neural network model 1, the computation of neural network model 2 is executed during the "store result" of neural network model 1, the load of neural network model 3 is executed during the computation of neural network model 2, and the computation of neural network model 3 is executed during the "store result" of neural network model 2. The whole running process is completed in 5 periods; compared with the related art shown in Fig. 2, the overall execution efficiency is significantly improved without adding hardware resources.
In the embodiments of the present application, the processing of the above exemplary method may be executed for the instruction sequences of multiple neural network models simultaneously, or one by one, depending on the hardware resource configuration and the running order of the neural network models. For example, when there is only one processor for running neural network models (for example, only one BPU), only one instruction sequence can run at a time, so the processing of the above exemplary method can only be executed one by one on the instruction sequences of the multiple neural network models according to their running order, thereby establishing a pipeline among those instruction sequences. When there are multiple processors for running neural network models (for example, multiple BPUs), multiple neural network models can run simultaneously, and the processing of the above exemplary method can accordingly be executed simultaneously on the instruction sequences of those models according to their running order, thereby establishing a pipeline among the instruction sequences of the models running at the same time.
In one implementation of the embodiments of the present application, step 501 of determining the second instruction sequence corresponding to the second neural network model to be run, step 502 of selecting at least one instruction in the second instruction sequence, and step 503 of inserting the at least one instruction into the first instruction sequence may be executed by the second processor in the system shown in Fig. 4; the first instruction sequence and the second instruction sequence may be compiled by the first processor in the system shown in Fig. 4, the first processor and the second processor being processors of different types; and step 504 of running the first instruction sequence containing the at least one instruction may be executed by the third processor in the system shown in Fig. 4.
Fig. 9 shows an example process in which the system shown in Fig. 4 executes the exemplary method of the embodiments of the present application. In this example process, the first processor compiles the instruction sequence of each neural network model (one instruction sequence per model) and provides them to the second processor; the second processor performs the processing of the exemplary method of the embodiments of the present application according to the running order of the models and loads the results into the memory; and the third processor reads the instruction sequences, or the instructions in them, from the memory and runs them. The loading process may include: writing the instruction sequence of the neural network model to be run into the memory (for example, DDR) and providing its start address (i.e., first address) in the memory and the length of the instruction sequence (i.e., how many instructions it contains) to the third processor, so that the third processor can read the instruction sequence from the memory. In one example, the second processor may execute the exemplary method of the embodiments of the present application as follows: determine the next instruction sequence to be run (assumed to be instruction sequence E) according to the running order of the neural network models, extract instruction sequence E from the instruction sequences compiled by the first processor, and put it into a cache, where instruction sequence E waits; meanwhile, the second processor performs the processing of the above exemplary method on the instruction sequence that precedes instruction sequence E (assumed to be instruction sequence D), scheduling at least one instruction of instruction sequence E into instruction sequence D, and then loads instruction sequence D into the memory, so that the third processor reads instruction sequence D from the memory and runs it; in this way, part of the instructions of instruction sequence E are executed ahead of time, during the running of instruction sequence D. After instruction sequence D has been loaded, the second processor extracts instruction sequence F, whose running order follows instruction sequence E, puts it into the cache, schedules at least one instruction of instruction sequence F into instruction sequence E, and then loads instruction sequence E into the memory; the third processor reads instruction sequence E from the memory and runs it, so that part of the instructions of instruction sequence F are executed ahead of time, during the running of instruction sequence E. This continues in order until the last neural network model finishes running.
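For illustration only, the second processor's loop described above might be sketched as follows; `write_to_ddr` and `notify_third_processor` are hypothetical stand-ins for whatever memory-loading and hand-off mechanism the system actually uses, and `schedule_startup_loads` is the flow-600 sketch given earlier.

```python
def scheduling_loop(compiled_seqs: list) -> None:
    """Sketch of the Fig. 9 flow: schedule each sequence against its successor,
    then load it into memory for the third processor to run."""
    for i, current in enumerate(compiled_seqs):       # running order of the models
        if i + 1 < len(compiled_seqs):
            nxt = compiled_seqs[i + 1]                # waits in the cache
            # Move some start-up instructions of the next sequence forward,
            # so they execute during the current sequence's run.
            schedule_startup_loads(current, nxt)
        base_addr = write_to_ddr(current)             # load the sequence into DDR
        notify_third_processor(base_addr, len(current))  # first address + length
```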
In practical applications, intra-sequence instruction scheduling can be performed on the instruction sequence of each neural network model at the compilation stage (for example, by the first processor of the system shown in Fig. 4), establishing a pipeline inside each instruction sequence; and inter-sequence instruction scheduling can be performed at the running stage of each neural network model by the above exemplary method of the embodiments of the present application (for example, by the second processor of the system shown in Fig. 4), establishing a pipeline between instruction sequences. Combining the pipeline inside an instruction sequence with the pipeline between instruction sequences exploits the pipelined execution capability of the hardware resources to the maximum extent, and thus improves the overall execution efficiency of multiple neural network models by a larger margin without adding hardware resources.
The above exemplary method of the embodiments of the present application is applicable to a variety of situations in which multiple neural network models need to be run, and can improve the overall execution efficiency of those models without adding hardware resources and while ensuring correct operation results.
According to statistics, for a structurally simple neural network model (for example, a two-layer neural network), the time required to load data and parameters accounts for 30%-50% of the total running time of the network. For multiple structurally simple neural network models, the above method of the embodiments of the present application can generally save roughly 40% of the running time in the best case and about 20% on average; correspondingly, the overall execution efficiency of the multiple neural network models can be improved by at least 20%-40%.
According to statistics, for a structurally complex neural network (for example, a 1,000-layer neural network structure), the time required to load data and parameters accounts for 3%-5% of the total running time of the network. For multiple structurally complex neural network models, the above method of the embodiments of the present application can generally save roughly 3% of the running time; correspondingly, the overall execution efficiency of the multiple neural network models can be improved by at least 3%.
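The arithmetic behind these estimates can be stated explicitly. As a simplified model (an assumption for illustration, not a formula from the specification), if loading occupies a fraction $f$ of a model's total running time $T$ and the scheduling hides that load entirely behind the preceding model's run, then:

```latex
T_{\text{new}} = (1 - f)\,T, \qquad
\text{saving} = \frac{T - T_{\text{new}}}{T} = f .
```

With $f \approx 0.4$ for a simple two-layer model this yields the roughly 40% best-case saving quoted above, and with $f \approx 0.03$ for a 1,000-layer model it yields the roughly 3% saving.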
Exemplary Apparatus
Fig. 10 shows an exemplary apparatus 10 for instruction scheduling of a neural network model provided by an exemplary embodiment of the present application. As shown in Fig. 10, the exemplary apparatus 10 for instruction scheduling of a neural network model includes:
an instruction sequence determination unit 101, configured to determine, when a first instruction sequence corresponding to a first neural network model needs to be run, a second instruction sequence corresponding to a second neural network model to be run, the first neural network model running before the second neural network model;
an instruction selection unit 102, configured to select at least one instruction in the second instruction sequence;
an instruction insertion unit 103, configured to insert the at least one instruction into the first instruction sequence; and
an instruction running unit 104, configured to run the first instruction sequence containing the at least one instruction.
Fig. 11 shows an exemplary apparatus 11 for instruction scheduling of a neural network model provided by an exemplary embodiment of the present application. In one implementation of the embodiments of the present application, the instruction selection unit 102 may include a traversal module 1021 and a selection module 1022, where the traversal module 1021 is configured to traverse the second instruction sequence to find the first compute instruction in the second instruction sequence, and the selection module 1022 is configured to select the instructions before the first compute instruction in the second instruction sequence. In this way, the instructions of the start-up phase of the second neural network model can be specifically targeted as the objects of inter-sequence scheduling, so that the start-up phase of the second neural network model runs in parallel with a partial stage (for example, an intermediate stage) of the first neural network model, saving the execution time of the start-up phase of the second neural network model, avoiding a long wait of the hardware resources (for example, the BPU) during that start-up phase, and thereby improving the overall execution efficiency of multiple neural network models without adding hardware resources.
In one implementation of the embodiments of the present application, the instruction insertion unit 103 is configured to insert the at least one instruction into the first instruction sequence one by one in descending order of the size of the data each instruction operates on. In this way, the instructions of the second neural network model with longer running times are inserted into the first instruction sequence first, so that the longer-running instructions of the second instruction sequence can run in parallel with one or more instructions of the first instruction sequence, saving more running time (for example, the data load time of the start-up phase of the second neural network model) and further improving the overall execution efficiency of the multiple neural network models.
In one implementation of the embodiments of the present application, the at least one instruction is a load instruction for loading data required by the computation of the second neural network model. Here, the data may include, but is not limited to, feature data, weight parameters and the like. In this implementation, the instruction insertion unit 103 is configured to insert the at least one instruction between compute instructions in the first instruction sequence. In this way, the load instructions of the start-up phase of the second neural network model are moved forward into the running process of the first neural network model, saving the load time of the start-up phase of the second neural network model, reducing the waiting time of the BPU in that phase, and improving execution efficiency.
As shown in Fig. 11, in one implementation of the embodiments of the present application, the instruction insertion unit 103 may include a first check module 1031 and/or a second check module 1032, where the first check module 1031 is configured to check whether a data conflict exists between the at least one instruction and each instruction of the first instruction sequence, so as to determine the legal position of the at least one instruction in the first instruction sequence and insert the at least one instruction into the first instruction sequence; and the second check module 1032 is configured to check whether a hardware resource conflict exists between the at least one instruction and each instruction of the first instruction sequence, so as to determine the legal position of the at least one instruction in the first instruction sequence and insert the at least one instruction into the first instruction sequence. In this way, before the at least one instruction is inserted into the first instruction sequence, its legal position in the first instruction sequence is determined by checking, which effectively avoids conflicts, establishes a more reasonable pipeline among the instruction sequences of multiple neural network models, and avoids errors in the running or computation of these models, thereby improving the overall execution efficiency of the multiple neural network models while ensuring that their overall operation results are correct.
Here, the first check module 1031 and/or the second check module 1032 are configured to perform the above check instruction by instruction starting from the end of the first instruction sequence.
In the embodiments of the present application, the first check module 1031 is configured to: determine whether the memory addresses accessed by the at least one instruction overlap with the memory addresses accessed by an instruction of the first instruction sequence, and whether the at least one instruction and/or the instruction of the first instruction sequence will perform a write operation on the data at those memory addresses; and, when the memory addresses overlap and the at least one instruction and/or the instruction of the first instruction sequence will perform a write operation on the data at those memory addresses, determine that a data conflict exists between the at least one instruction and that instruction of the first instruction sequence.
In the embodiments of the present application, the second check module 1032 is configured to: query preconfigured mapping relation information between each instruction and the hardware resources it uses; determine, based on the mapping relation information, whether the at least one instruction and an instruction of the first instruction sequence need to use the same hardware resource; and determine that a hardware resource conflict exists between the at least one instruction and that instruction of the first instruction sequence when they need to use the same hardware resource.
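Purely as an illustration of these two checks, the predicates assumed in the earlier search sketch might look as follows; the instruction fields (`addr`, `size`, `is_write`, `opcode`) and the resource table are assumptions about the data layout, not the patent's definitions.

```python
def has_data_conflict(instr_a, instr_b) -> bool:
    """First check module: a data conflict exists when the accessed memory
    address ranges overlap AND at least one of the two instructions writes."""
    a_lo, a_hi = instr_a.addr, instr_a.addr + instr_a.size
    b_lo, b_hi = instr_b.addr, instr_b.addr + instr_b.size
    overlap = a_lo < b_hi and b_lo < a_hi
    return overlap and (instr_a.is_write or instr_b.is_write)

# Preconfigured mapping between instructions and the hardware resource each one
# uses (an assumed example table; e.g. loads use a DMA, convolutions a MAC array).
RESOURCE_OF = {"LD-Feature": "dma", "LD-Weight": "dma", "CONV": "mac_array"}

def has_resource_conflict(instr_a, instr_b) -> bool:
    """Second check module: a hardware resource conflict exists when the mapping
    sends both instructions to the same hardware resource."""
    res_a = RESOURCE_OF.get(instr_a.opcode, instr_a.opcode)
    res_b = RESOURCE_OF.get(instr_b.opcode, instr_b.opcode)
    return res_a == res_b
```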
As shown in Fig. 11, in some implementations of the embodiments of the present application, the instruction insertion unit 103 may further include: a first determination module 1033, configured to determine, when the at least one instruction has two or more legal positions in the first instruction sequence, one of the legal positions as the insertion position of the at least one instruction; and an insertion operation module 1034, configured to insert the at least one instruction at the insertion position in the first instruction sequence. In one implementation of the embodiments of the present application, the first determination module 1033 may be configured to determine, when the at least one instruction has two or more legal positions in the first instruction sequence, the legal position with the longest parallel time as the insertion position of the at least one instruction; in another implementation of the embodiments of the present application, the first determination module 1033 may be configured to determine, when the at least one instruction has two or more legal positions in the first instruction sequence, the legal position that has the longest parallel time and is nearest the front of the first instruction sequence as the insertion position of the at least one instruction. In this way, running time can be saved as much as possible, further improving the overall execution efficiency of the multiple neural network models.
In one example, the instruction insertion unit 103 may further include: a second determination module 1035, configured to determine the one or more instructions in the first instruction sequence that run in parallel with the at least one instruction at each legal position; and an estimation module 1036, configured to estimate the parallel time of each legal position based on the size of the data operated on by the one or more instructions determined by the second determination module. The first determination module 1033 is configured to determine the insertion position of the at least one instruction using the parallel times obtained by the estimation module. The two ways in which the first determination module uses the parallel time to determine the insertion position are as described above and are not repeated.
As shown in Fig. 11, in some implementations of the embodiments of the present application, the instruction insertion unit 103 may further include: a splitting module 1037, configured to split the at least one instruction into two groups of instructions when the at least one instruction has no legal position in the first instruction sequence; the insertion operation module 1034 is further configured to insert, among the two groups of instructions obtained by the splitting module 1037, the group that has no data conflict with the instructions in the first instruction sequence into the first instruction sequence. In this way, longer-running instructions in the second instruction sequence can be inserted into the first instruction sequence by splitting, which helps save more running time and improves the overall execution efficiency of the multiple neural network models by a larger margin without adding hardware resources.
As shown in Fig. 11, in one implementation of the embodiments of the present application, the instruction insertion unit 103 may further include: a third determination module 1038, configured to determine, between the at least one instruction and the instructions in the first instruction sequence, the data that conflict and the data that do not conflict; and the splitting module 1037 is configured to split the at least one instruction into two groups of instructions according to the conflicting data and the non-conflicting data, where the first group of the two groups of instructions operates on the conflicting data and the second group of the two groups of instructions operates on the non-conflicting data.
The above exemplary apparatus of the embodiments of the present application is applicable to a variety of scenarios in which multiple neural network models need to be run, and can improve the overall execution efficiency of those models without adding hardware resources and while ensuring correct operation results. It should be noted that the above exemplary apparatus 10 and exemplary apparatus 11 are examples; the specific structure of the instruction scheduling apparatus of the neural network model in the embodiments of the present application is not limited to these two.
In practical applications, the above exemplary apparatus 10 and exemplary apparatus 11 of the embodiments of the present application may be implemented by the running device in the "exemplary system". In one implementation, in exemplary apparatus 10 and exemplary apparatus 11, the instruction sequence determination unit 101, the instruction selection unit 102 and the instruction insertion unit 103 may be implemented by the second processor in the system shown in Fig. 4, and the instruction running unit 104 may be implemented by the third processor in the system shown in Fig. 4.
Exemplary Electronic Device
In addition to the above methods, an embodiment of the present application may also be an electronic device, including: one or more processors; and a memory storing computer instructions that, when run by the processors, cause the processors to execute the steps of the instruction scheduling method of the neural network model according to the various embodiments of the present application described in the "Exemplary Methods" section of this specification.
The above electronic device of the embodiments of the present application is applicable to a variety of scenarios in which multiple neural network models need to be run, and can improve the overall execution efficiency of those models without adding hardware resources and while ensuring correct operation results. In practical applications, the above electronic device of the embodiments of the present application may be implemented by the running device in the "exemplary system". In one implementation, the electronic device may include the second processor and the third processor in the system shown in Fig. 4.
Fig. 12 illustrates a block diagram of an electronic device according to an embodiment of the present application.
As shown in Fig. 12, the electronic device 12 includes one or more processors 121 and a memory 122.
The processor 121 may be a central processing unit (CPU) or another form of processing unit having data processing capability and/or instruction execution capability, and may control other components in the electronic device 12 to perform desired functions.
The memory 122 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, random access memory (RAM) and/or cache memory. The non-volatile memory may include, for example, read-only memory (ROM), a hard disk, flash memory and the like. One or more computer program instructions may be stored on the computer-readable storage medium, and the processor 121 may run the program instructions to implement the instruction scheduling method of the neural network model of the embodiments of the present application described above and/or other desired functions.
In one example, the electronic device 12 may further include an input device 123 and an output device 124, interconnected by a bus system and/or another form of connection mechanism (not shown). For example, the input device 123 may be a microphone or a microphone array, and may also include, for example, a keyboard, a mouse and the like. The output device 124 may output various information to the outside and may include, for example, a display, a speaker, a printer, a communication network and the remote output devices connected to it, and the like.
Of course, for simplicity, only some of the components of the electronic device 12 that are related to the present application are shown in Fig. 12; components such as buses and input/output interfaces are omitted. In addition, the electronic device 12 may include any other appropriate components depending on the specific application.
Exemplary Computer Program Product and Computer-Readable Storage Medium
In addition to the above methods and devices, an embodiment of the present application may also be a computer program product, including computer program instructions that, when run by a processor, cause the processor to execute the steps of the instruction scheduling method of the neural network model according to the various embodiments of the present application described in the "Exemplary Methods" section of this specification.
The computer program product may include program code for performing the operations of the embodiments of the present application, written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on a user computing device, partly on a user device, as an independent software package, partly on a user computing device and partly on a remote computing device, or entirely on a remote computing device or server.
In addition, an embodiment of the present application may also be a computer-readable storage medium on which computer program instructions are stored; when run by a processor, the computer program instructions cause the processor to execute the steps of the instruction scheduling method of the neural network model according to the various embodiments of the present application described in the "Exemplary Methods" section of this specification.
The computer-readable storage medium may use any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium may include, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination of the above. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection with one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any appropriate combination of the above.
The basic principles of the present application have been described above in conjunction with specific embodiments. However, it should be noted that the merits, advantages, effects and the like mentioned in the present application are merely examples and not limitations; these merits, advantages and effects must not be regarded as required by each embodiment of the present application. In addition, the specific details disclosed above are only for the purpose of illustration and ease of understanding, not limitation; the above details do not restrict the present application to being implemented with those specific details.
The block diagrams of the devices, apparatuses, equipment and systems involved in the present application are only illustrative examples and are not intended to require or imply that they must be connected, arranged or configured in the manner shown in the block diagrams. As those skilled in the art will appreciate, these devices, apparatuses, equipment and systems may be connected, arranged or configured in any manner. Words such as "include", "comprise" and "have" are open-ended terms that mean "including but not limited to" and may be used interchangeably with that phrase. The words "or" and "and" as used here refer to "and/or" and may be used interchangeably with it, unless the context clearly indicates otherwise. The word "such as" as used here refers to the phrase "such as, but not limited to" and may be used interchangeably with it.
It should also be noted that, in the devices, apparatuses and methods of the present application, each component or each step may be decomposed and/or recombined. These decompositions and/or recombinations should be regarded as equivalent solutions of the present application.
The above description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present application. Various modifications to these aspects will be readily apparent to those skilled in the art, and the general principles defined herein may be applied to other aspects without departing from the scope of the present application. Therefore, the present application is not intended to be limited to the aspects shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The above description has been presented for the purposes of illustration and description. Furthermore, this description is not intended to restrict the embodiments of the present application to the forms disclosed herein. Although a number of exemplary aspects and embodiments have been discussed above, those skilled in the art will recognize certain variations, modifications, changes, additions and sub-combinations thereof.

Claims (18)

1. An instruction scheduling method of a neural network model, comprising:
when a first instruction sequence corresponding to a first neural network model needs to be run, determining a second instruction sequence corresponding to a second neural network model to be run, the first neural network model running before the second neural network model;
selecting at least one instruction in the second instruction sequence;
inserting the at least one instruction into the first instruction sequence; and
running the first instruction sequence comprising the at least one instruction.
2. The instruction scheduling method according to claim 1, wherein selecting at least one instruction in the second instruction sequence comprises:
traversing the second instruction sequence to find the first compute instruction in the second instruction sequence; and
selecting an instruction before the first compute instruction in the second instruction sequence.
3. The instruction scheduling method according to claim 1, wherein inserting the at least one instruction into the first instruction sequence comprises:
inserting the at least one instruction into the first instruction sequence one by one in descending order of the size of the data each instruction operates on.
4. The instruction scheduling method according to claim 1, wherein the at least one instruction is a load instruction for loading data required by the operation of the second neural network model.
5. The instruction scheduling method according to claim 1, wherein inserting the at least one instruction into the first instruction sequence comprises: inserting the at least one instruction between compute instructions in the first instruction sequence.
6. The instruction scheduling method according to claim 1, wherein inserting the at least one instruction into the first instruction sequence comprises:
checking whether a data conflict and/or a hardware resource conflict exists between the at least one instruction and each instruction in the first instruction sequence, to determine a legal position of the at least one instruction in the first instruction sequence, so that the at least one instruction can be inserted into the first instruction sequence.
7. The instruction scheduling method according to claim 6, wherein,
if the at least one instruction has two or more legal positions in the first instruction sequence, the legal position with the longest parallel time is determined as the insertion position of the at least one instruction.
8. The instruction scheduling method according to claim 6, wherein, if the at least one instruction has two or more legal positions in the first instruction sequence, the legal position that has the longest parallel time and is nearest the front of the first instruction sequence is determined as the insertion position of the at least one instruction.
9. The instruction scheduling method according to claim 6, wherein,
if the at least one instruction has no legal position in the first instruction sequence, the at least one instruction is split into two groups of instructions, and the group of the two groups of instructions that has no data conflict with the instructions in the first instruction sequence is inserted into the first instruction sequence.
10. The instruction scheduling method according to claim 9, wherein splitting the at least one instruction into two groups of instructions comprises:
determining, between the at least one instruction and the instructions in the first instruction sequence, the data that conflict and the data that do not conflict; and
splitting the at least one instruction into two groups of instructions according to the conflicting data and the non-conflicting data, the first group of the two groups of instructions being used to operate on the conflicting data and the second group of the two groups of instructions being used to operate on the non-conflicting data.
11. The instruction scheduling method according to claim 7 or 8, wherein determining the legal position with the longest parallel time comprises:
determining the one or more instructions in the first instruction sequence that run in parallel with the at least one instruction at each legal position;
estimating the parallel time of each legal position based on the size of the data operated on by the one or more instructions; and
determining the legal position with the longest parallel time based on the estimated parallel time of each legal position.
12. The instruction scheduling method according to claim 6, wherein the check is performed instruction by instruction starting from the end of the first instruction sequence.
13. The instruction scheduling method according to claim 6, wherein checking whether a data conflict exists between the at least one instruction and each instruction in the first instruction sequence comprises:
determining whether the memory addresses accessed by the at least one instruction overlap with the memory addresses accessed by an instruction of the first instruction sequence, and whether the at least one instruction and/or the instruction of the first instruction sequence will perform a write operation on the data at the memory addresses; and
when the memory addresses overlap and the at least one instruction and/or the instruction of the first instruction sequence will perform a write operation on the data at the memory addresses, determining that a data conflict exists between the at least one instruction and that instruction of the first instruction sequence.
14. The instruction scheduling method according to claim 6, wherein checking whether a hardware resource conflict exists between the at least one instruction and each instruction in the first instruction sequence comprises:
querying preconfigured mapping relation information between each instruction and the hardware resources it uses, and determining, based on the mapping relation information, whether the at least one instruction and an instruction of the first instruction sequence need to use the same hardware resource; and
determining that a hardware resource conflict exists between the at least one instruction and the instruction of the first instruction sequence when they need to use the same hardware resource.
15. The instruction scheduling method according to claim 1, wherein
the step of determining the second instruction sequence corresponding to the second neural network model to be run, the step of selecting at least one instruction in the second instruction sequence, and the step of inserting the at least one instruction into the first instruction sequence are executed by a second processor; and
the first instruction sequence and the second instruction sequence are compiled by a first processor, the first processor and the second processor being processors of different types.
16. An electronic device, comprising:
one or more processors; and
a memory storing computer instructions that, when run by the processors, cause the processors to execute the method according to any one of claims 1 to 15.
17. An instruction scheduling apparatus of a neural network model, comprising:
an instruction sequence determination unit, configured to determine, when a first instruction sequence corresponding to a first neural network model needs to be run, a second instruction sequence corresponding to a second neural network model to be run, the first neural network model running before the second neural network model;
an instruction selection unit, configured to select at least one instruction in the second instruction sequence;
an instruction insertion unit, configured to insert the at least one instruction into the first instruction sequence; and
an instruction running unit, configured to run the first instruction sequence comprising the at least one instruction.
18. A computer-readable storage medium on which computer program instructions are stored, the computer program instructions, when run by a processor, causing the processor to execute the method according to any one of claims 1 to 15.
CN201811276880.XA 2018-10-30 2018-10-30 Instruction scheduling method and device of neural network model Active CN109272109B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811276880.XA CN109272109B (en) 2018-10-30 2018-10-30 Instruction scheduling method and device of neural network model


Publications (2)

Publication Number Publication Date
CN109272109A true CN109272109A (en) 2019-01-25
CN109272109B CN109272109B (en) 2020-07-17

Family

ID=65195560





Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105408859A (en) * 2013-09-12 2016-03-16 马维尔国际贸易有限公司 Method and system for instruction scheduling
CN106485318A (en) * 2015-10-08 2017-03-08 上海兆芯集成电路有限公司 There is the processor of mixing coprocessor/performance element neutral net unit
CN105468335A (en) * 2015-11-24 2016-04-06 中国科学院计算技术研究所 Pipeline-level operation device, data processing method and network-on-chip chip
US20180136917A1 (en) * 2016-11-17 2018-05-17 Fujitsu Limited Compiler program, compiling method, and compiling device
CN107301455A (en) * 2017-05-05 2017-10-27 中国科学院计算技术研究所 Mixing cube storage system and speed-up computation method for convolutional neural networks
CN107491287A (en) * 2017-08-30 2017-12-19 苏州乐麟无线信息科技有限公司 The execution method and device of instruction
CN108416422A (en) * 2017-12-29 2018-08-17 国民技术股份有限公司 A kind of convolutional neural networks implementation method and device based on FPGA

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
余奇 (Yu Qi): "Design and Implementation of an FPGA-Based Deep Learning Accelerator" (《基于FPGA的深度学习加速器设计与实现》) *

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111767078A (en) * 2019-04-02 2020-10-13 上海寒武纪信息科技有限公司 Data operation method and device and related product
WO2020259020A1 (en) * 2019-06-26 2020-12-30 中兴通讯股份有限公司 Instruction block processing method and apparatus, storage medium, and electronic device
CN112766470B (en) * 2019-10-21 2024-05-07 地平线(上海)人工智能技术有限公司 Feature data processing method, instruction sequence generating method, device and equipment
WO2021098509A1 (en) * 2019-11-18 2021-05-27 北京迈格威科技有限公司 Neural network joint compilation method, apparatus and electronic device
CN110908667A (en) * 2019-11-18 2020-03-24 北京迈格威科技有限公司 Method and device for joint compilation of neural network and electronic equipment
CN111190401A (en) * 2019-12-30 2020-05-22 讯飞智元信息科技有限公司 Instruction scheduling method, hydraulic engineering control method, equipment and medium
CN113496275A (en) * 2020-04-08 2021-10-12 北京地平线机器人技术研发有限公司 Instruction execution method and device and electronic equipment
US11989560B2 (en) 2020-04-08 2024-05-21 Beijing Horizon Robotics Technology Research And Development Co., Ltd. Method and device for executing instructions to perform artificial intelligence
CN112348179A (en) * 2020-11-26 2021-02-09 湃方科技(天津)有限责任公司 Efficient convolutional neural network operation instruction set architecture, device and server
CN112348179B (en) * 2020-11-26 2023-04-07 湃方科技(天津)有限责任公司 Efficient convolutional neural network operation instruction set architecture construction method and device, and server
CN112540835A (en) * 2020-12-10 2021-03-23 北京奇艺世纪科技有限公司 Operation method and device of hybrid machine learning model and related equipment
CN112540835B (en) * 2020-12-10 2023-09-08 北京奇艺世纪科技有限公司 Method and device for operating hybrid machine learning model and related equipment
CN112766478A (en) * 2021-01-21 2021-05-07 中国电子科技集团公司信息科学研究院 FPGA pipeline structure for convolutional neural network
CN112766478B (en) * 2021-01-21 2024-04-12 中国电子科技集团公司信息科学研究院 FPGA (field programmable Gate array) pipeline structure oriented to convolutional neural network
CN113221642A (en) * 2021-04-02 2021-08-06 哈尔滨鹏博普华科技发展有限责任公司 AI recognition system for violation snapshot image
CN113221642B (en) * 2021-04-02 2024-04-05 哈尔滨鹏博普华科技发展有限责任公司 Violation snapshot image AI recognition system

Also Published As

Publication number Publication date
CN109272109B (en) 2020-07-17

Similar Documents

Publication Publication Date Title
CN109272109A (en) The instruction dispatching method and device of neural network model
CN107193865B (en) Natural language intention understanding method and device in man-machine interaction
CN109614102A (en) Code automatic generation method, device, electronic equipment and storage medium
US8032873B2 (en) Computer program code size partitioning system for multiple memory multi-processing systems
CN109543010A (en) The interactive method and system of fused data library inquiry
CN109240670A (en) Modular software development methodology, system, equipment and medium
CN105204917B (en) The method and device of loading configuration file in application program launching
CN111651207A (en) Neural network model operation chip, method, device, equipment and medium
US8037463B2 (en) Computer program functional partitioning system for heterogeneous multi-processing systems
CN109313547A (en) Query optimizer for cpu busy percentage and code refactoring
CN109858610A (en) A kind of accelerated method of convolutional neural networks, device, equipment and storage medium
CN112540767A (en) Program code generation method, program code generation device, electronic device and storage medium
US20150227350A1 (en) Multi-dimensional, multi-configuration compilation phase output visualization technique
CN115860066A (en) Neural network reasoning pipeline multiplexing method based on batch processing
CN110533180A (en) Network structure searching method and device, readable storage medium storing program for executing, electronic equipment
CN109783074A (en) The data type conversion method of programming, electronic equipment
CN109783755A (en) Browser operation analogy method, device, readable storage medium storing program for executing and terminal device
CN117235527A (en) End-to-end containerized big data model construction method, device, equipment and medium
CN113051173B (en) Method, device, computer equipment and storage medium for arranging and executing test flow
CN106775906A (en) Business flow processing method and device
CN112766470A (en) Feature data processing method, instruction sequence generation method, device and equipment
CN117215661A (en) Event processing method, device and storage medium
CN115983378A (en) Automatic compiling method for kernel of machine learning operating system
US8954307B1 (en) Chained programming language preprocessors for circuit simulation
US20140013312A1 (en) Source level debugging apparatus and method for a reconfigurable processor

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant