WO2022037490A1 - Neural network computing method and apparatus, computer device, and storage medium - Google Patents

Neural network computing method and apparatus, computer device, and storage medium

Info

Publication number
WO2022037490A1
Authority
WO
WIPO (PCT)
Prior art keywords
target layer
layer
group
computing
core
Prior art date
Application number
PCT/CN2021/112471
Other languages
English (en)
French (fr)
Inventor
何伟
沈杨书
祝夭龙
Original Assignee
北京灵汐科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京灵汐科技有限公司
Publication of WO2022037490A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks

Definitions

  • Embodiments of the present invention relate to computer technology, specifically to the fields of neural networks and AI, and in particular to a neural network computing method and apparatus, a computer device, and a storage medium.
  • At present, in order to improve the operation speed of a neural network, the neural network can be loaded onto a physical chip, and the operation functions of each layer of the neural network are implemented by computing cores on the physical chip.
  • The weight data of each layer of the neural network can be loaded into the corresponding computing cores of the physical chip at one time for computation.
  • However, when the data volume of the weight data of the neural network exceeds the storage capacity of the physical chip (the storage capacity of each computing core), such one-time loading of the weight data cannot be achieved.
  • The inventor found that the related-art approach makes the whole operation process time-consuming and the operation efficiency low.
  • Embodiments of the present invention provide a neural network computing method and apparatus, a computer device, and a storage medium, so as to improve computing efficiency in a folding-group operation scenario.
  • An embodiment of the present invention provides a neural network computing method, where the neural network includes multiple folding groups, each folding group includes one or more consecutive layers, each layer corresponds to at least one computing core, the computing cores corresponding to different layers in the same folding group are different, and the computing cores corresponding to different folding groups are at least partially the same. The method includes: when it is determined that the target layer of the (N+1)th folding group satisfies a ready condition, processing the target layer of the (N+1)th folding group and some layers of the Nth folding group in parallel in their respective corresponding computing cores.
  • An embodiment of the present invention further provides a neural network computing apparatus, where the neural network includes multiple folding groups, each folding group includes one or more consecutive layers, each layer corresponds to at least one computing core, the computing cores corresponding to different layers in the same folding group are different, and the computing cores corresponding to different folding groups are at least partially the same. The apparatus includes a ready-condition determination module, configured to determine whether the target layer in the (N+1)th folding group satisfies the ready condition, and a parallel processing module, configured to, when it is determined that the target layer of the (N+1)th folding group satisfies the ready condition, process the target layer of the (N+1)th folding group and some layers of the Nth folding group in parallel in their respective corresponding computing cores.
  • An embodiment of the present invention further provides a computer device, including: one or more processors; and a storage device configured to store one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the neural network computing method according to any embodiment of the present invention.
  • An embodiment of the present invention further provides a computer-readable storage medium on which a computer program is stored, and the computer program, when executed by a processor, implements the neural network computing method according to any embodiment of the present invention.
  • In the folding-group operation scenario of a neural network, the technical solutions of the embodiments of the present invention introduce a new parallel mechanism between folding groups: the operation of the next folding group is not started only upon the end of the entire operation of the previous folding group; instead, during the operation of the previous folding group, once a set layer (the target layer) in the next folding group is detected to satisfy the ready condition, the operation of that layer of the next folding group can be started.
  • The technical solutions of the embodiments of the present invention thus provide a new operating mechanism between folding groups, which reduces the time consumed by the operation of a folding-group neural network and improves its computing efficiency.
  • FIG. 1a is a schematic diagram of dividing a neural network into multiple folding groups in the related art;
  • FIG. 1b is a schematic structural diagram of computing multiple folding groups in the form of a pipeline in the related art;
  • FIG. 1c is a sequence diagram of computing multiple folding groups in the form of a pipeline in the related art;
  • FIG. 1d is an implementation flowchart of a neural network computing method in an embodiment of the present invention;
  • FIG. 1e is a sequence diagram of parallel operation of multiple folding groups to which an embodiment of the present invention is applicable;
  • FIG. 2a is an implementation flowchart of another neural network computing method in an embodiment of the present invention;
  • FIG. 2b is a sequence diagram of parallel operation of multiple folding groups to which an embodiment of the present invention is applicable;
  • FIG. 3 is an implementation flowchart of another neural network computing method in an embodiment of the present invention;
  • FIG. 4 is a structural diagram of a neural network computing apparatus in an embodiment of the present invention;
  • FIG. 5 is a structural diagram of a computer device in an embodiment of the present invention.
  • To facilitate the description of the technical solutions of the embodiments of the present invention, the operation sequence of the computing cores corresponding to the neural network shown in FIG. 1a and FIG. 1b in some related technologies is first briefly described with reference to FIG. 1c.
  • In some related technologies, the neural network can be folded. As shown in FIG. 1a, a neural network including 7 layers can be divided into two folding groups: folding group 1 includes the first to fourth layers, and folding group 2 includes the fifth to seventh layers.
  • Referring to FIG. 1b, in order to run this neural network, the computing cores in the physical chip first load the weight data required by folding group 1 and perform the operations of folding group 1; after all of those operations are completed, the intermediate data is temporarily stored in memory, the weight data required by folding group 2 is then loaded, and the operations of folding group 2 are performed using the intermediate data in memory.
  • With reference to FIG. 1a and FIG. 1b, the neural network includes seven layers in total, of which the first to fourth layers are assigned to folding group 1 and the fifth to seventh layers to folding group 2.
  • Different layers are run by different computing cores (abbreviated as "core" in the blocks of FIG. 1b).
  • In FIG. 1c, the abscissa corresponds to different time periods (also called time slices), each representing the time required for a single operation of a computing core (obtaining operation-result data from input data), and the ordinate corresponds to the different computing cores.
  • FIG. 1c shows the operation process for 5 items of input data (data 1, data 2, data 3, data 4, and data 5); the different data items (and the processing results of the data at each layer) are represented by rectangular blocks with different numbers.
  • In the T1 time period, data 1 is input to computing cores 1 to 3, which perform the first-layer operation. In the T2 time period, data 2 is input to computing cores 1 to 3 for the first-layer operation, while the data produced by the first-layer operation on data 1 completed in T1 (the operation-result data) is transmitted to computing core 4 for the second-layer operation.
  • In the T3 time period, data 3 is input to computing cores 1 to 3 for the first-layer operation, the data obtained after the first-layer operation on data 2 in T2 is transmitted to computing core 4 for the second-layer operation, and the data obtained after the second-layer operation on data 1 in T2 is transmitted to computing cores 5 to 8 for the third-layer operation, and so on. Only in the T8 time period do computing cores 9 and 10 finally complete the fourth-layer operation on the last data item (data 5); only then has the first folding group been processed.
  • Accordingly, the operation of the second folding group can only start in the T9 time period (that is, after the processing of the first folding group is completed): computing cores 1 to 4 start performing the fifth-layer operation on data 1 using the data (operation-result data) produced by computing cores 9 and 10 in the fourth-layer operation on data 1 (that is, the operation of the second folding group starts), and so on, until the processing of data 1 to data 5 in folding group 2 is completed (not all of which is shown in FIG. 1c).
  • The inventor found that with this pipelined operation between multiple folding groups, the operation of the next folding group can only be performed after the operation of the previous folding group is completed, so that the whole operation process is time-consuming and the operation efficiency is low. Specifically: for the fifth-layer operation on data 1 in the second folding group, the required operation data is ready after the T4 time period (i.e., in T5) (the data computed by computing cores 9 and 10 in T4), and the required computing cores (computing cores 1 to 4) are ready after the T6 time period (i.e., in T7). Therefore, provided there is no conflict of data or resources, this fifth layer (the fifth layer for data 1) should be able to start executing as early as the T7 time period, without waiting for the entire processing of folding group 1 to be completed.
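  • As a purely illustrative sketch of this timing argument (the slot numbers below are read off FIG. 1c and FIG. 1e for the fifth layer operating on data 1; the variable names are our own, not part of the embodiment):

```python
# Illustrative only: slot numbers taken from FIG. 1c / FIG. 1e for data 1, layer 5.
data_ready_slot = 5    # layer 4 of folding group 1 has emitted its result by T5
cores_free_slot = 7    # cores 1-4 have finished all of their group-1 work by T7
group1_done_slot = 8   # the whole of folding group 1 only finishes in T8

# Related-art pipeline: group 2 may start only after group 1 is entirely done.
pipeline_start = group1_done_slot + 1                 # -> T9
# Proposed scheme: start as soon as both the data and the cores are ready.
ready_start = max(data_ready_slot, cores_free_slot)   # -> T7

print(pipeline_start - ready_start)  # 2 time periods saved for this layer
```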
  • On this basis, the inventor creatively proposes a parallel operation mode between folding groups.
  • During the operation of the previous folding group, as soon as a certain layer in the next folding group satisfies the ready condition, the operation of that layer can be started, without waiting for the entire operation of the previous folding group to end, thereby greatly improving the computing efficiency of the folding-group neural network.
  • FIG. 1d is a flowchart of a neural network computing method provided by an embodiment of the present invention.
  • This embodiment is applicable to the case where a neural network including at least two folding groups is computed within the same many-core system (e.g., one chip), i.e., on-chip computation.
  • The method can be executed by a neural network computing apparatus, which can be implemented in software and/or hardware and can generally be integrated in the many-core system used to run the neural network. The method specifically includes: when it is determined that the target layer of the (N+1)th folding group satisfies the ready condition, processing the target layer of the (N+1)th folding group and some layers of the Nth folding group in parallel in their respective corresponding computing cores.
  • Specifically, the method may include the following S110 to S120.
  • S110: Determine whether the target layer in the (N+1)th folding group of the neural network satisfies the ready condition; if so, execute S120; otherwise, return to and continue executing S110.
  • In this embodiment, N is a positive integer greater than or equal to 1, with N ∈ [1, M], where M+1 is the total number of folding groups included in the neural network.
  • As described above, the neural network to which the embodiments of the present invention are applicable includes multiple folding groups, each folding group includes one or more consecutive layers, and each layer corresponds to at least one computing core; a computing core is specifically a hardware operation unit in the many-core system, used to perform the operation processing of the corresponding layer in hardware.
  • The computing cores corresponding to different layers in the same folding group are different, and the computing cores corresponding to different folding groups are at least partially the same. That is, different layers in the same folding group must correspond to different computing cores, while the computing cores corresponding to layers in at least some different folding groups (especially adjacent folding groups) are exactly the same, or overlap by at least one computing core.
  • Generally, to make maximum use of the hardware operation units in the many-core system, all hardware operation units can be allocated to all folding groups. For example, if a many-core system includes computing cores 1 to 10 and the neural network running on it includes folding group 1 and folding group 2, computing cores 1 to 10 can be allocated to both folding group 1 and folding group 2 at the same time, so that both folding groups use the largest possible amount of computing hardware.
  • On the other hand, considering that the computation loads of different folding groups may differ, the computing cores corresponding to different folding groups may also differ; for example, computing cores 1 to 10 are allocated to folding group 1, and computing cores 1 to 8 are allocated to folding group 2.
  • Specifically, the computing capability of each computing core and the computing capability required by each layer can be preset or known, and each layer can be assigned corresponding computing cores in a way that guarantees that each layer obtains at least the computing capability it requires; alternatively, identification information (for example, a computing core number) can be preset for each computing core, the identification information corresponding to each layer can be specified by pre-compilation, and the corresponding computing cores are thereby assigned to each layer.
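  • A minimal sketch of these two allocation strategies is given below; the core capacities, layer demands, and the greedy selection are illustrative assumptions, not part of the embodiment:

```python
from typing import Dict, List

# Hypothetical data model: core capacities and per-layer required capacity are
# assumed to be known in advance, as the text describes.
core_capacity: Dict[int, float] = {c: 1.0 for c in range(1, 11)}   # cores 1..10
layer_demand: Dict[str, float] = {"layer5": 3.5, "layer6": 1.5}

def allocate_by_capacity(layer: str, free_cores: List[int]) -> List[int]:
    """Greedily pick free cores until their summed capacity covers the layer's demand."""
    chosen, total = [], 0.0
    for core in free_cores:
        if total >= layer_demand[layer]:
            break
        chosen.append(core)
        total += core_capacity[core]
    if total < layer_demand[layer]:
        raise RuntimeError(f"not enough capacity for {layer}")
    return chosen

# Alternative: a fixed, pre-compiled mapping from each layer to its core numbers.
precompiled_mapping: Dict[str, List[int]] = {"layer5": [1, 2, 3, 4], "layer6": [5, 6]}
```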
  • As described above, the solution of the embodiment of the present invention provides a parallel operation mode for folding groups: the operation of the next folding group is not triggered by the end of the entire operation of the previous folding group, but by that folding group satisfying the ready condition.
  • Accordingly, the target layer of the (N+1)th folding group specifically refers to the layer immediately following the last of the layers of the neural network that are currently in the operating state, that is, the nearest layer whose operation needs to be triggered.
  • The target layer may be the first layer in the (N+1)th folding group, or any layer after the first layer in the (N+1)th folding group.
  • For example, if the layers of the neural network currently in the operating state are the third, fourth, and fifth layers, the last of these is the fifth layer, and the target layer is therefore the sixth layer, i.e., the layer following the fifth layer.
  • That is, while one folding group is in the operating state, it is necessary to determine in real time whether the layer following the last currently running layer is located in the next folding group (that is, whether it is the target layer); if so, it is further necessary to determine whether the target layer satisfies the ready condition, and then trigger the operation of the target layer accordingly.
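  • A minimal sketch of how the target layer could be identified, assuming layers are numbered consecutively (the function name and signature are our own):

```python
from typing import List, Optional

def find_target_layer(running_layers: List[int], total_layers: int) -> Optional[int]:
    """Return the layer immediately after the last layer currently in the
    operating state, i.e. the nearest layer whose operation may need triggering."""
    if not running_layers:
        return None
    candidate = max(running_layers) + 1
    return candidate if candidate <= total_layers else None

# Example from the text: layers 3, 4 and 5 are running, so layer 6 is the target.
assert find_target_layer([3, 4, 5], total_layers=7) == 6
```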
  • Determining that the target layer of the (N+1)th folding group satisfies the ready condition may include: if it is determined that the operation data required by the target layer is ready and the computing cores corresponding to the target layer are ready, determining that the target layer satisfies the ready condition. That is, when both the operation data required by the target layer and the computing cores it needs are ready, the target layer satisfies the ready condition and its operation can be started.
  • Determining that the operation data required by the target layer is ready may include: when it is determined that the pre-order layer has output operation-result data for the current input data, determining that the operation data required by the target layer is ready, where the pre-order layer is the previous layer connected to the target layer.
  • For example, when the target layer is the fifth layer, the operation data it requires is the operation result output by the fourth layer; that is, whenever the fourth layer has operated on the current input data and output the corresponding operation-result data, the fifth layer can start performing its own data operation based on that operation-result data.
  • The pre-order layer may be in the (N+1)th folding group or in the Nth folding group. When the target layer is the first layer in the (N+1)th folding group, the pre-order layer is the last layer in the Nth folding group; when the target layer is a layer in the (N+1)th folding group other than the first layer, the pre-order layer is the preceding layer within the (N+1)th folding group.
  • Determining that the computing cores corresponding to the target layer are ready may include:
  • if it is determined that the total computing capability of the currently idle computing cores matches the computing capability required by the target layer, determining that the computing cores corresponding to the target layer are ready, and determining that the computing cores corresponding to the target layer are all currently idle computing cores.
  • In this implementation, the computing capability corresponding to each computing core in the many-core system, i.e., the amount of computation it can provide in a single operation, can be preset or known. The computing capability required by the target layer specifically refers to the amount of computation that must be performed when the target layer operates on the current input data.
  • This implementation need not specify in advance which computing core or cores are allocated to the target layer; it only compares whether the total computing capability of the currently idle computing cores (the accumulated sum of their computing capabilities) matches the computing capability required by the target layer ("matching" meaning the total computing capability of the currently idle cores equals or exceeds the computing capability required by the target layer). If so, the idle computing cores can be allocated to the target layer for operation processing, that is, all or some of the idle computing cores are selected as the computing cores corresponding to the target layer, where it should clearly be ensured that the total computing capability of the idle cores allocated to the target layer matches the computing capability required by the target layer.
  • Alternatively, determining that the computing cores corresponding to the target layer are ready may include: if it is determined that the currently idle computing cores match the computing cores corresponding to the target layer, determining that the computing cores corresponding to the target layer are ready.
  • In this implementation, the correspondence between each layer of the neural network and each computing core in the many-core system can be preset by pre-compilation (for example, by determining the numbers of the computing cores corresponding to each layer); then, when the computing cores of the numbers corresponding to the target layer are all in the idle state (other computing cores not corresponding to the target layer may of course also be idle), it can be determined that the computing cores corresponding to the target layer are ready.
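  • The ready condition described above could be sketched as follows (hypothetical helper names; variant 1 uses capacity matching, variant 2 uses the pre-compiled core numbers):

```python
from typing import Dict, Set

def data_ready(preorder_output_available: bool) -> bool:
    # The operation data is ready once the pre-order layer has emitted its
    # result for the current input item.
    return preorder_output_available

def cores_ready_by_capacity(idle_cores: Set[int],
                            core_capacity: Dict[int, float],
                            required_capacity: float) -> bool:
    # Variant 1: no fixed layer-to-core mapping; the summed capability of the
    # currently idle cores must equal or exceed what the target layer needs.
    return sum(core_capacity[c] for c in idle_cores) >= required_capacity

def cores_ready_by_id(idle_cores: Set[int], assigned_cores: Set[int]) -> bool:
    # Variant 2: a pre-compiled mapping names the exact core numbers for the
    # target layer; all of them must currently be idle.
    return assigned_cores <= idle_cores

def target_layer_ready(preorder_output_available: bool,
                       idle_cores: Set[int],
                       assigned_cores: Set[int]) -> bool:
    # Ready condition from the text: operation data ready AND corresponding cores
    # ready (variant 2 is used here; variant 1 could be substituted).
    return data_ready(preorder_output_available) and cores_ready_by_id(idle_cores, assigned_cores)
```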
  • S120: Process the target layer of the (N+1)th folding group and some layers of the Nth folding group in parallel in their respective corresponding computing cores.
  • In this embodiment, whether the target layer of the (N+1)th folding group is ready is used as the condition for starting the operation of that target layer. Therefore, once it is ready, the processing of the (N+1)th folding group can be started while the Nth folding group is still being operated on; that is, the (N+1)th folding group and the Nth folding group are processed in parallel, where at least the target layer of the (N+1)th folding group runs in parallel with some layers of the Nth folding group, and other layers of the (N+1)th and Nth folding groups (such as "target layers" of earlier time periods) may also be running in parallel.
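  • A rough sketch of the S110/S120 control loop is shown below; the `scheduler` object and all of its methods are hypothetical placeholders for the bookkeeping described in the text:

```python
import time

def run_folding_groups(scheduler) -> None:
    """Sketch of the S110/S120 loop. `scheduler` is a hypothetical object whose
    methods stand in for the bookkeeping described in the text; none of them are
    defined here."""
    while not scheduler.all_groups_finished():
        target = scheduler.current_target_layer()            # S110: nearest layer to trigger
        if target is not None and scheduler.is_ready(target):
            scheduler.load_weights(target)                    # weights for the target layer only
            scheduler.start_layer(target)                     # S120: runs alongside group N's layers
        time.sleep(0.001)   # simple polling; a real system would react to completion events
```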
  • FIG. 1e shows a sequence diagram of parallel operation of multiple folding groups to which the embodiment of the present invention is applicable.
  • In the T7 time period, the fifth layer of folding group 2 is the target layer; the fifth layer can obtain the operation-result data output by the fourth layer of folding group 1 for data 1 (available from the T5 time period), and at the same time computing cores 1 to 4, which correspond to the fifth layer, have completed the processing of data 1 to data 5 in folding group 1 (the processing of the first and second layers) and are idle (available from the T7 time period).
  • Therefore, the fifth layer satisfies the ready condition and its operation can be triggered; that is, in the T7 time period, computing cores 1 to 4 perform the operation of a layer of folding group 2 (the fifth layer), while computing cores 5 to 10 perform the operations of layers of folding group 1 (the third and fourth layers), thereby achieving parallel processing of folding group 1 and folding group 2.
  • Similarly, in the T8 time period, the sixth layer of folding group 2 is the target layer; it can obtain the operation-result data output by the fifth layer of folding group 2 for data 1, and at the same time computing cores 5 and 6, which correspond to the sixth layer, have completed the processing of data 1 to data 5 in folding group 1 (the processing of the third layer) and are idle.
  • Thus, in the T8 time period, computing cores 1 to 6 perform the operations of layers of folding group 2 (the fifth and sixth layers), while computing cores 9 and 10 perform the operation of a layer of folding group 1 (the fourth layer), again achieving parallel processing of folding group 1 and folding group 2 (computing cores 7 and 8 being idle at this time).
  • Comparing FIG. 1c with FIG. 1e, the above parallel processing between folding groups saves at least two time periods in the case of 5 input data items.
  • It can be understood that the more folding groups the neural network contains and the more computing cores each folding group includes, the more time periods the above solution can save, and the more obvious the improvement in the computing efficiency of the neural network.
  • In the folding-group operation scenario of a neural network, the technical solution of the embodiment of the present invention introduces a new parallel mechanism between folding groups: the operation of the next folding group is not started only upon the end of the entire operation of the previous folding group; instead, during the operation of the previous folding group, once a set layer (the target layer) in the next folding group is detected to satisfy the ready condition, the operation of that layer of the next folding group can be started.
  • The technical solution of this embodiment thus provides a new operating mechanism between folding groups, reduces the time consumed by the operation of a folding-group neural network, and improves its computing efficiency.
  • FIG. 2a is an implementation flowchart of another neural network computing method provided by an embodiment of the present invention.
  • This embodiment is refined on the basis of the above-described embodiment.
  • In this embodiment, the operation of processing the (N+1)th folding group and the Nth folding group in parallel is refined as: performing an operation of loading the weight data corresponding to the target layer into the computing cores corresponding to the target layer, where the computing cores corresponding to the target layer include computing cores corresponding to layers in the Nth folding group that have completed their operations; and, while operations are being performed by the computing cores corresponding to layers in the Nth folding group that have not yet completed their operations, triggering the computing cores corresponding to the target layer in the (N+1)th folding group to perform parallel operations.
  • After triggering the computing cores corresponding to the target layer in the (N+1)th folding group to perform parallel operations, the method may further include: performing parallel operations through the computing cores corresponding to the target layer, according to operation-result data obtained from memory or operation-result data output in real time by the pre-order layer connected to the target layer; and, through the computing cores corresponding to the target layer, storing the operation-result data obtained by the current operation in memory or outputting it in real time to the post-order layer connected to the target layer.
  • the method of this embodiment specifically includes the following S210 to S260.
  • S210: Determine whether the pre-order layer has output operation-result data for the current input data; if so, execute S220; otherwise, return to executing S210.
  • Here, the pre-order layer is the previous layer connected to the target layer; that is, it is the last of the layers currently in the operating state, and it may be in the Nth folding group or in the (N+1)th folding group.
  • S220: Determine whether the numbers of the currently idle computing cores match the numbers of the computing cores corresponding to the target layer; if so, execute S230; otherwise, return to executing S220.
  • The computing cores corresponding to the target layer include at least computing cores corresponding to layers in the Nth folding group that have completed their operations; that is, at least some of the computing cores corresponding to the target layer in the (N+1)th folding group are cores that originally processed layers of the Nth folding group but are idle at this point.
  • In fact, since adjacent folding groups share the same computing cores, the target layer in the next folding group can only satisfy the ready condition after idle cores that have completed their operations appear in the previous folding group; therefore, the computing cores corresponding to the target layer include, and may even consist entirely of, computing cores corresponding to layers in the Nth folding group that have already completed their operations.
  • The weight data corresponding to each layer of the neural network can be determined in the pre-compilation stage and stored in memory in advance, and the weight data corresponding to each layer can then be obtained by a direct read.
  • The target layer of the (N+1)th folding group needs to use the computing cores freed by the layers of the Nth folding group that have completed their operations, and to operate in parallel together with one or more layers of the Nth folding group that have not completed their operations; hence, the operation of the target layer is triggered while the layers of the Nth folding group that have not completed their operations continue to be processed.
  • The operation data required by the target layer may become ready before the computing cores corresponding to the target layer.
  • For example, the operation data required by the fifth layer of folding group 2 is available in the T5 time period, while computing cores 1 to 4, which correspond to that fifth layer, are not ready until the T7 time period. In this case, the operation-result data obtained by the pre-order layer (i.e., the operation data required by the current target layer) can first be stored in memory, and computing cores 1 to 4 later obtain the corresponding operation-result data from memory to perform the operation.
  • Alternatively, the operation data required by the target layer may become ready together with the corresponding computing cores.
  • For example, the operation data required by the sixth layer of folding group 2 is ready in the T8 time period, and computing cores 5 and 6, which correspond to that sixth layer, are also ready in the T8 time period. In this case, the pre-order layer of the target layer (for example, the fifth layer) can directly transmit the operation-result data it obtains to the target layer, and the target layer directly performs the corresponding operation processing in the new time period.
  • Similarly, after the computing cores corresponding to the target layer obtain their operation-result data (i.e., the operation data required when the next subsequent layer becomes the target layer), if it is determined that the computing cores corresponding to the post-order layer are ready, the operation-result data can be output in real time directly to the post-order layer (that is, the layer following the target layer); if it is determined that the computing cores corresponding to the post-order layer are not ready, the operation-result data obtained by the current operation can be stored in memory and, when the post-order layer becomes the target layer and satisfies the ready condition, retrieved from memory in real time for the corresponding operation.
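  • A minimal sketch of this output routing (the function and parameter names are illustrative assumptions):

```python
def route_layer_output(result, postorder_cores_ready: bool, memory: dict,
                       key: str, send_to_next_layer) -> None:
    """Sketch of S260-style routing: forward the result directly when the next
    layer's cores are already ready, otherwise park it in memory until that
    layer becomes the target layer and satisfies the ready condition."""
    if postorder_cores_ready:
        send_to_next_layer(result)   # real-time hand-off to the post-order layer
    else:
        memory[key] = result         # buffered; fetched from memory later
```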
  • It can be understood that loading the weight data corresponding to the target layer into its computing cores takes a certain amount of time. If this takes little time, it can be folded into a single time period, with the weight loading and the data operation both completed within that period. If it takes longer, a separate weight-loading duration can be allocated for it; after the ready condition is satisfied, the system waits for the weight loading to complete, and the computing cores then perform the data operation in a new time period.
  • Accordingly, triggering the computing cores corresponding to the target layer in the (N+1)th folding group to perform parallel operations may include: starting from the moment the target layer of the (N+1)th folding group satisfies the ready condition, and after an interval of the weight-loading duration, triggering the computing cores corresponding to the target layer in the (N+1)th folding group to perform parallel operations.
  • To facilitate unified coordination and management, the weight-loading duration may be a preset fixed value, for example one time period; or, to minimize wasted time, the weight-loading duration may be determined from the actual weight-loading time, that is, as soon as the computing cores corresponding to the target layer complete the weight-loading process, the computing cores corresponding to the target layer in the (N+1)th folding group are immediately triggered to perform parallel operations.
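  • The two triggering options could be sketched as follows, with hypothetical callables standing in for the actual loading and operation of the cores:

```python
import threading
import time
from typing import Callable, Optional

def trigger_after_weight_load(load_weights: Callable[[], None],
                              start_parallel_ops: Callable[[], None],
                              weight_load_duration: Optional[float] = None) -> None:
    """Sketch only. Option 1: wait a preset weight-loading duration (e.g. one time
    slice) counted from the moment the ready condition holds. Option 2: trigger
    as soon as loading actually completes."""
    loader = threading.Thread(target=load_weights)
    loader.start()                            # loading begins once the layer is ready
    if weight_load_duration is not None:
        time.sleep(weight_load_duration)      # fixed interval, easy to coordinate globally
    else:
        loader.join()                         # event-driven: trigger the instant loading completes
    start_parallel_ops()
```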
  • FIG. 2b is a sequence diagram of parallel operation of multiple folding groups to which the embodiment of the present invention is applicable.
  • As shown in FIG. 2b, in the T6 time period, the operation data required by the fifth layer of folding group 2 and the corresponding computing cores are both ready; the system then waits one more time period (T7) for the loading of the weight data of the fifth layer to complete, and the operation of the fifth layer finally starts in the T8 time period.
  • According to the relationship between the time at which the pre-order layer outputs the operation data required by the target layer and the time at which the computing cores corresponding to the target layer become ready, the technical solution of this embodiment determines whether the target layer obtains its operation data from memory or directly from the pre-order layer, which further extends the application scenarios of the embodiments of the present invention and makes maximum use of the resources in the many-core system.
  • At the same time, by allocating a weight-loading duration to the weight-loading process of the target layer, the time period required by each computing core for its operation is not occupied, further ensuring the operation accuracy and reliability of the neural network.
  • FIG. 3 is an implementation flowchart of another neural network computing method provided by an embodiment of the present invention. This embodiment is refined on the basis of the above-described embodiments.
  • In this embodiment, before the target layer of the (N+1)th folding group and some layers of the Nth folding group are processed in parallel in their respective corresponding computing cores, the method further includes: in response to an operation start instruction, loading the weight data corresponding to each layer in the first folding group of the neural network into the computing cores corresponding to the respective layers of the first folding group; and performing operations through the computing cores corresponding to the layers of the first folding group.
  • Accordingly, the method of this embodiment specifically includes the following S310 to S330.
  • Since the computing cores allocated to the same folding group do not overlap, all the weight data required by the first folding group of the neural network can be loaded at one time before the first folding group runs, so as to maximize the computing efficiency of the neural network.
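  • A minimal sketch of this start-up step, with hypothetical callables for weight loading and layer start:

```python
def start_first_folding_group(group1_layers, load_weights, start_layer) -> None:
    """Sketch of S310/S320 with hypothetical callables: on the operation start
    instruction, every layer of the first folding group gets its weights loaded
    into its own cores in one pass, and those layers then begin operating."""
    for layer in group1_layers:
        load_weights(layer)   # one-time load; cores within group 1 do not overlap
    for layer in group1_layers:
        start_layer(layer)
```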
  • In an optional implementation, the neural network is a neural network that does not include a feedback loop.
  • The reason for this is that when a neural network does not include a feedback loop (that is, the output of a later layer being fed back as the input of an earlier layer), once a layer has completed the full operation on the input data it will not start operating again; therefore, all the computing cores corresponding to that layer can be allocated to other layers without any conflict in the allocation of computing cores.
  • The neural network without a feedback loop may be an ANN (Artificial Neural Network), an SNN (Spiking Neural Network), or the like, which is not limited in this embodiment.
  • S330: When it is determined that the target layer of the (N+1)th folding group satisfies the ready condition, process the target layer of the (N+1)th folding group and some layers of the Nth folding group in parallel in their respective corresponding computing cores.
  • In the folding-group operation scenario of a neural network, the technical solution of the embodiment of the present invention introduces a new parallel mechanism between folding groups: the operation of the next folding group is not started only upon the end of the entire operation of the previous folding group; instead, during the operation of the previous folding group, once a set layer (the target layer) in the next folding group is detected to satisfy the ready condition, the operation of that layer of the next folding group can be started.
  • The technical solution of this embodiment thus provides a new operating mechanism between folding groups, reduces the time consumed by the operation of a folding-group neural network, and improves its computing efficiency.
  • FIG. 4 is a structural diagram of a neural network computing apparatus provided by an embodiment of the present invention.
  • The neural network includes multiple folding groups, each folding group includes one or more consecutive layers, each layer corresponds to at least one computing core, the computing cores corresponding to different layers in the same folding group are different, and the computing cores corresponding to different folding groups are at least partially the same.
  • As shown in FIG. 4, the apparatus includes a ready-condition determination module 410 and a parallel processing module 420.
  • The ready-condition determination module 410 is configured to determine whether the target layer in the (N+1)th folding group satisfies the ready condition; the parallel processing module 420 is configured to, when it is determined that the target layer of the (N+1)th folding group satisfies the ready condition, process the target layer of the (N+1)th folding group and some layers of the Nth folding group in parallel in their respective corresponding computing cores.
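  • A rough structural sketch of the FIG. 4 apparatus (module interfaces assumed, not specified by the embodiment):

```python
class NeuralNetworkComputingApparatus:
    """Sketch of the FIG. 4 structure; both module objects are hypothetical
    stand-ins whose methods mirror the behaviour described in the text."""

    def __init__(self, ready_condition_module, parallel_processing_module):
        self.ready_condition_module = ready_condition_module          # module 410
        self.parallel_processing_module = parallel_processing_module  # module 420

    def step(self, n: int) -> None:
        # Module 410 decides whether the target layer of folding group n+1 is ready;
        # module 420 then runs it in parallel with the unfinished layers of group n.
        target = self.ready_condition_module.target_layer(n + 1)
        if target is not None and self.ready_condition_module.satisfies_ready_condition(target):
            self.parallel_processing_module.run_in_parallel(target, n)
```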
  • In the folding-group operation scenario of a neural network, the technical solution of the embodiment of the present invention introduces a new parallel mechanism between folding groups: the operation of the next folding group does not take the completion of the entire operation of the previous folding group as its starting condition; instead, during the operation of the previous folding group, once a set layer (the target layer) in the next folding group is detected to satisfy the ready condition, the operation of that layer of the next folding group can be started.
  • The technical solution of this embodiment thus provides a new operating mechanism between folding groups, reduces the time consumed by the operation of a folding-group neural network, and improves its computing efficiency.
  • On the basis of the above embodiments, the ready-condition determination module 410 may include: an operation-data judging unit, configured to determine whether the operation data required by the target layer is ready; a computing-core judging unit, configured to determine whether the computing cores corresponding to the target layer are ready; and a comprehensive determination unit, configured to determine that the target layer satisfies the ready condition if it is determined that the operation data required by the target layer is ready and the computing cores corresponding to the target layer are ready.
  • The operation-data judging unit may be specifically configured to: when it is determined that the pre-order layer has output operation-result data for the current input data, determine that the operation data required by the target layer is ready, where the pre-order layer is the previous layer connected to the target layer.
  • The computing-core judging unit may be specifically configured to: if it is determined that the total computing capability of the currently idle computing cores matches the computing capability required by the target layer, determine that the computing cores corresponding to the target layer are ready and that the computing cores corresponding to the target layer are all currently idle computing cores; or, if it is determined that the currently idle computing cores match the computing cores corresponding to the target layer, determine that the computing cores corresponding to the target layer are ready.
  • On the basis of the above embodiments, the parallel processing module 420 includes: a weight loading unit, configured to perform the operation of loading the weight data corresponding to the target layer into the computing cores corresponding to the target layer, where the computing cores corresponding to the target layer include computing cores corresponding to layers in the Nth folding group that have completed their operations; and a parallel-operation triggering unit, configured to, while operations are being performed by the computing cores corresponding to layers in the Nth folding group that have not completed their operations, trigger the computing cores corresponding to the target layer in the (N+1)th folding group to perform parallel operations.
  • The parallel-operation triggering unit may be specifically configured to: starting from the moment the target layer of the (N+1)th folding group satisfies the ready condition, and after an interval of the weight-loading duration, trigger the computing cores corresponding to the target layer in the (N+1)th folding group to perform parallel operations.
  • The apparatus may further include a target-layer operation unit, configured to: after the computing cores corresponding to the target layer in the (N+1)th folding group are triggered to perform parallel operations, perform parallel operations through the computing cores corresponding to the target layer, according to operation-result data obtained from memory or operation-result data output in real time by the pre-order layer connected to the target layer; and, through the computing cores corresponding to the target layer, store the operation-result data obtained by the current operation in memory or output it in real time to the post-order layer connected to the target layer.
  • The apparatus may further include a first-folding-group operation module, configured to: in response to the operation start instruction, load the weight data corresponding to each layer in the first folding group of the neural network into the computing cores corresponding to the respective layers of the first folding group; and perform operations through the computing cores corresponding to the layers of the first folding group.
  • the neural network may be a neural network that does not include a feedback loop.
  • the computing device of the neural network provided by the embodiment of the present invention can execute the computing method of the neural network provided by any embodiment of the present invention, and has functional modules and beneficial effects corresponding to the execution method.
  • FIG. 5 is a schematic structural diagram of a computer device according to an embodiment of the present invention.
  • The computer device includes a processor 50 and a storage device 51, and may further include an input device 52 and an output device 53.
  • The number of processors 50 in the computer device may be one or more, and one processor 50 is taken as an example in FIG. 5; the processor 50, the storage device 51, the input device 52, and the output device 53 in the computer device may be connected by a bus or in other ways, and connection through a bus is taken as an example in FIG. 5.
  • the storage device 51 can be used to store software programs, computer-executable programs, and modules (computer programs), such as modules corresponding to the multitasking parallel processing method in the embodiment of the present invention.
  • the processor 50 executes various functional applications and data processing of the computer equipment by running the computer program stored in the storage device 51 , that is, implements the operation method of the neural network according to any embodiment of the present invention.
  • The neural network includes multiple folding groups, each folding group includes one or more consecutive layers, each layer corresponds to at least one computing core, different layers in the same folding group correspond to different computing cores, and the computing cores corresponding to different folding groups are at least partially the same. The method includes:
  • when it is determined that the target layer of the (N+1)th folding group satisfies the ready condition, processing the target layer of the (N+1)th folding group and some layers of the Nth folding group in parallel in their respective corresponding computing cores.
  • the storage device 51 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to the use of the terminal, and the like.
  • the storage device 51 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device.
  • storage device 51 may further include memory located remotely from processor 50, and these remote memories may be connected to the computer equipment through a network. Examples of such networks include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network, and combinations thereof.
  • the input device 52 may be used to receive input numerical or character information, and to generate key signal input related to user settings and function control of the computer device.
  • the output device 53 may include a display device such as a display screen.
  • Embodiments of the present invention also provide a computer-readable storage medium containing computer-executable instructions (ie, a computer program), and the computer program, when executed by a processor, is used to execute the neural network computing method according to any embodiment of the present invention.
  • The neural network includes multiple folding groups, each folding group includes one or more consecutive layers, each layer corresponds to at least one computing core, different layers in the same folding group correspond to different computing cores, and the computing cores corresponding to different folding groups are at least partially the same. The method includes:
  • when it is determined that the target layer of the (N+1)th folding group satisfies the ready condition, processing the target layer of the (N+1)th folding group and some layers of the Nth folding group in parallel in their respective corresponding computing cores.
  • In the storage medium containing computer-executable instructions provided by the embodiment of the present invention, the computer-executable instructions are not limited to the above method operations, and may also perform related operations in the methods provided by any embodiment of the present invention.
  • The present invention can be implemented by software plus necessary general-purpose hardware, and of course can also be implemented by hardware, but in many cases the former is the better implementation.
  • The technical solutions of the present invention, in essence or in the part contributing to the prior art, can be embodied in the form of a software product. The computer software product can be stored in a computer-readable storage medium, such as a computer floppy disk, read-only memory (ROM), random access memory (RAM), flash memory (FLASH), hard disk, or optical disk, and includes several instructions to cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods of the embodiments of the present invention.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Neurology (AREA)
  • Image Analysis (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

A neural network computing method. The neural network includes multiple folding groups, each folding group includes one or more consecutive layers, each layer corresponds to at least one computing core, the computing cores corresponding to different layers in the same folding group are different, and the computing cores corresponding to different folding groups are at least partially the same. The method includes: when it is determined that the target layer of the (N+1)th folding group satisfies a ready condition, processing the target layer of the (N+1)th folding group and some layers of the Nth folding group in parallel in their respective corresponding computing cores. A new operating mechanism between folding groups is provided, which reduces the time consumed by the operation of a folding-group neural network and improves its computing efficiency.

Description

Neural network computing method and apparatus, computer device, and storage medium
Technical Field
Embodiments of the present invention relate to computer technology, specifically to the fields of neural networks and AI, and in particular to a neural network computing method and apparatus, a computer device, and a storage medium.
Background
At present, in order to improve the operation speed of a neural network, the neural network can be loaded onto a physical chip, and the operation functions of each layer of the neural network are implemented by computing cores on the physical chip. The weight data of each layer of the neural network can be loaded into the corresponding computing cores of the physical chip at one time for computation. However, when the data volume of the weight data of the neural network exceeds the storage capacity of the physical chip (the storage capacity of each computing core), such one-time loading of the weight data cannot be achieved.
In the course of implementing the present invention, the inventor found that the related-art approach makes the whole operation process time-consuming and the operation efficiency low.
Summary
Embodiments of the present invention provide a neural network computing method and apparatus, a computer device, and a storage medium, so as to improve computing efficiency in a folding-group operation scenario.
In a first aspect, an embodiment of the present invention provides a neural network computing method. The neural network includes multiple folding groups, each folding group includes one or more consecutive layers, each layer corresponds to at least one computing core, the computing cores corresponding to different layers in the same folding group are different, and the computing cores corresponding to different folding groups are at least partially the same. The method includes: when it is determined that the target layer of the (N+1)th folding group satisfies a ready condition, processing the target layer of the (N+1)th folding group and some layers of the Nth folding group in parallel in their respective corresponding computing cores.
In a second aspect, an embodiment of the present invention further provides a neural network computing apparatus. The neural network includes multiple folding groups, each folding group includes one or more consecutive layers, each layer corresponds to at least one computing core, the computing cores corresponding to different layers in the same folding group are different, and the computing cores corresponding to different folding groups are at least partially the same. The apparatus includes: a ready-condition determination module, configured to determine whether the target layer in the (N+1)th folding group satisfies the ready condition; and a parallel processing module, configured to, when it is determined that the target layer of the (N+1)th folding group satisfies the ready condition, process the target layer of the (N+1)th folding group and some layers of the Nth folding group in parallel in their respective corresponding computing cores.
In a third aspect, an embodiment of the present invention further provides a computer device, including: one or more processors; and a storage device configured to store one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the neural network computing method according to any embodiment of the present invention.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium on which a computer program is stored, and the computer program, when executed by a processor, implements the neural network computing method according to any embodiment of the present invention.
In the folding-group operation scenario of a neural network, the technical solutions of the embodiments of the present invention introduce a new parallel mechanism between folding groups: the operation of the next folding group is not started only upon the end of the entire operation of the previous folding group; instead, during the operation of the previous folding group, once a set layer (the target layer) in the next folding group is detected to satisfy the ready condition, the operation of that layer of the next folding group can be started. The technical solutions of the embodiments of the present invention thus provide a new operating mechanism between folding groups, which reduces the time consumed by the operation of a folding-group neural network and improves its computing efficiency.
Brief Description of the Drawings
FIG. 1a is a schematic diagram of dividing a neural network into multiple folding groups in the related art;
FIG. 1b is a schematic structural diagram of computing multiple folding groups in the form of a pipeline in the related art;
FIG. 1c is a sequence diagram of computing multiple folding groups in the form of a pipeline in the related art;
FIG. 1d is an implementation flowchart of a neural network computing method in an embodiment of the present invention;
FIG. 1e is a sequence diagram of parallel operation of multiple folding groups to which an embodiment of the present invention is applicable;
FIG. 2a is an implementation flowchart of another neural network computing method in an embodiment of the present invention;
FIG. 2b is a sequence diagram of parallel operation of multiple folding groups to which an embodiment of the present invention is applicable;
FIG. 3 is an implementation flowchart of another neural network computing method in an embodiment of the present invention;
FIG. 4 is a structural diagram of a neural network computing apparatus in an embodiment of the present invention;
FIG. 5 is a structural diagram of a computer device in an embodiment of the present invention.
Detailed Description
The present invention is described in further detail below with reference to the accompanying drawings and embodiments. It can be understood that the specific embodiments described here are only intended to explain the present invention, not to limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the present invention rather than the entire structure.
It should further be noted that, for ease of description, the drawings show only the parts related to the present invention rather than all of the content. Before the exemplary embodiments are discussed in more detail, it should be mentioned that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart describes the operations (or steps) as sequential processing, many of the operations can be performed in parallel, concurrently, or simultaneously. In addition, the order of the operations can be rearranged. The processing may be terminated when its operations are completed, but it may also have additional steps not included in the drawings. The processing may correspond to a method, a function, a procedure, a subroutine, a subprogram, and the like.
To facilitate the description of the technical solutions of the embodiments of the present invention, the operation sequence of the computing cores corresponding to the neural network shown in FIG. 1a and FIG. 1b in some related technologies is first briefly described with reference to FIG. 1c.
In some related technologies, the neural network can be folded. As shown in FIG. 1a, a neural network including 7 layers can be divided into two folding groups: folding group 1 includes the first to fourth layers, and folding group 2 includes the fifth to seventh layers. Referring to FIG. 1b, in order to run this neural network, the computing cores in the physical chip first load the weight data required by folding group 1 and perform the operations of folding group 1; after all of those operations are completed, the intermediate data is temporarily stored in memory, the weight data required by folding group 2 is then loaded, and the operations of folding group 2 are performed using the intermediate data in memory.
With reference to FIG. 1a and FIG. 1b, the neural network includes seven layers in total, of which the first to fourth layers are assigned to folding group 1 and the fifth to seventh layers to folding group 2. Different layers are run by different computing cores (abbreviated as "core" in the blocks of FIG. 1b). As shown in FIG. 1c, the abscissa corresponds to different time periods (also called time slices), each representing the time required for a single operation of a computing core (obtaining operation-result data from input data), and the ordinate corresponds to the different computing cores. FIG. 1c shows the operation process for 5 items of input data (data 1, data 2, data 3, data 4, and data 5); the different data items (and the processing results of the data at each layer) are represented by rectangular blocks with different numbers.
Accordingly, in the T1 time period, data 1 is input to computing cores 1 to 3, which perform the first-layer operation. In the T2 time period, data 2 is input to computing cores 1 to 3 for the first-layer operation, while the data produced by the first-layer operation on data 1 completed in T1 (the operation-result data) is transmitted to computing core 4 for the second-layer operation. In the T3 time period, data 3 is input to computing cores 1 to 3 for the first-layer operation, the data obtained after the first-layer operation on data 2 in T2 is transmitted to computing core 4 for the second-layer operation, and the data obtained after the second-layer operation on data 1 in T2 is transmitted to computing cores 5 to 8 for the third-layer operation, and so on. Only in the T8 time period do computing cores 9 and 10 finally complete the fourth-layer operation on the last data item (data 5); only then has the first folding group been processed.
Accordingly, the operation of the second folding group can only start in the T9 time period (that is, after the processing of the first folding group is completed): computing cores 1 to 4 start performing the fifth-layer operation on data 1 using the data (operation-result data) produced by computing cores 9 and 10 in the fourth-layer operation on data 1 (that is, the operation of the second folding group starts), and so on, until the processing of data 1 to data 5 in folding group 2 is completed (not all of which is shown in FIG. 1c).
In the course of implementing the present invention, the inventor found that with the above pipelined operation between multiple folding groups, the operation of the next folding group can only be performed after the operation of the previous folding group has been completed, so that the whole operation process is time-consuming and the operation efficiency is low. Specifically: in fact, for the fifth-layer operation on data 1 in the second folding group, the required operation data is ready after the T4 time period (i.e., in T5) (the data computed by computing cores 9 and 10 in T4), and the required computing cores (computing cores 1 to 4) are ready after the T6 time period (i.e., in T7). Therefore, provided there is no conflict of data or resources, this fifth layer (the fifth layer for data 1) should be able to start executing as early as the T7 time period, without waiting for the entire processing of folding group 1 to be completed.
On this basis, the inventor creatively proposes a parallel operation mode between folding groups: during the operation of the previous folding group, as soon as a certain layer in the next folding group satisfies the ready condition, the operation of that layer can be started, without waiting for the entire operation of the previous folding group to end, thereby greatly improving the computing efficiency of the folding-group neural network.
FIG. 1d is a flowchart of a neural network computing method provided by an embodiment of the present invention. This embodiment is applicable to the case where a neural network including at least two folding groups is computed within the same many-core system (e.g., one chip), i.e., on-chip computation. The method can be executed by a neural network computing apparatus, which can be implemented in software and/or hardware and can generally be integrated in the many-core system used to run the neural network. The method specifically includes: when it is determined that the target layer of the (N+1)th folding group satisfies the ready condition, processing the target layer of the (N+1)th folding group and some layers of the Nth folding group in parallel in their respective corresponding computing cores.
Specifically, the method may include the following S110 to S120.
S110: Determine whether the target layer in the (N+1)th folding group of the neural network satisfies the ready condition; if so, execute S120; otherwise, return to and continue executing S110.
In this embodiment, N is a positive integer greater than or equal to 1, with N ∈ [1, M], where M+1 is the total number of folding groups included in the neural network.
As described above, the neural network to which the embodiments of the present invention are applicable includes multiple folding groups, each folding group includes one or more consecutive layers, and each layer corresponds to at least one computing core; a computing core is specifically a hardware operation unit in the many-core system, used to perform the operation processing of the corresponding layer in hardware.
The computing cores corresponding to different layers in the same folding group are different, and the computing cores corresponding to different folding groups are at least partially the same. That is, different layers in the same folding group must correspond to different computing cores, while the computing cores corresponding to layers in at least some different folding groups (especially adjacent folding groups) are exactly the same, or overlap by at least one computing core.
Generally, to make maximum use of the hardware operation units in the many-core system, all hardware operation units can be allocated to all folding groups. For example, if a many-core system includes computing cores 1 to 10 and the neural network running on it includes folding group 1 and folding group 2, computing cores 1 to 10 can be allocated to both folding group 1 and folding group 2 at the same time, so that both folding groups use the largest possible amount of computing hardware.
On the other hand, considering that the computation loads of different folding groups may differ to some extent, in order to keep the computation of the different folding groups as balanced as possible, the computing cores corresponding to different folding groups may also differ. For example, computing cores 1 to 10 are allocated to folding group 1, and computing cores 1 to 8 are allocated to folding group 2.
Specifically, the computing capability of each computing core and the computing capability required by each layer can be preset or known, and each layer can be assigned corresponding computing cores in a way that guarantees that each layer obtains at least the computing capability it requires; alternatively, identification information (for example, a computing core number) can be preset for each computing core, the identification information corresponding to each layer can be specified by pre-compilation, and the corresponding computing cores are thereby assigned to each layer.
As described above, the solution of the embodiment of the present invention provides a parallel operation mode for folding groups: the operation of the next folding group is not triggered by the end of the entire operation of the previous folding group, but by that folding group satisfying the ready condition. Accordingly, in the embodiment of the present invention, the target layer of the (N+1)th folding group specifically refers to the layer immediately following the last of the layers of the neural network that are currently in the operating state, that is, the nearest layer whose operation needs to be triggered. The target layer may be the first layer in the (N+1)th folding group, or any layer after the first layer in the (N+1)th folding group.
For example, the layers of the neural network currently in the operating state are the third, fourth, and fifth layers. The last of these is the fifth layer, so the target layer is the sixth layer, i.e., the layer following the fifth layer. That is, in the technical solution of the embodiment of the present invention, while one folding group is already in the operating state, it is necessary to determine in real time whether the layer following the last currently running layer is located in the next folding group (that is, whether it is the target layer); if so, it is further necessary to determine whether the target layer satisfies the ready condition, and then trigger the operation of the target layer accordingly.
Determining that the target layer of the (N+1)th folding group satisfies the ready condition may include: if it is determined that the operation data required by the target layer is ready and the computing cores corresponding to the target layer are ready, determining that the target layer satisfies the ready condition. That is, when both the operation data required by the target layer and the computing cores it needs to use are ready, the target layer satisfies the ready condition and its operation can be started.
In an optional implementation of this embodiment, determining that the operation data required by the target layer is ready may include: when it is determined that the pre-order layer has output operation-result data for the current input data, determining that the operation data required by the target layer is ready, where the pre-order layer is the previous layer connected to the target layer.
Referring to FIG. 1a to FIG. 1c, when it is determined that the (N+1)th folding group is folding group 2 and the target layer of the (N+1)th folding group is the fifth layer, the operation data required by the fifth layer is the operation result output by the fourth layer; that is, whenever the fourth layer has operated on the current input data and output the corresponding operation-result data, the fifth layer can start performing its own data operation based on that operation-result data.
It can be understood that the pre-order layer may be in the (N+1)th folding group or in the Nth folding group. When the target layer is the first layer in the (N+1)th folding group, the pre-order layer is the last layer in the Nth folding group; when the target layer is a layer in the (N+1)th folding group other than the first layer, the pre-order layer is the preceding layer within the (N+1)th folding group.
In an optional implementation of this embodiment, determining that the computing cores corresponding to the target layer are ready may include:
if it is determined that the total computing capability of the currently idle computing cores matches the computing capability required by the target layer, determining that the computing cores corresponding to the target layer are ready, and determining that the computing cores corresponding to the target layer are all currently idle computing cores.
In this implementation, the computing capability corresponding to each computing core in the many-core system, i.e., the amount of computation it can provide in a single operation, can be preset or known. The computing capability required by the target layer specifically refers to the amount of computation that must be performed when the target layer operates on the current input data.
Specifically, this implementation does not need to specify in advance which computing core or cores are allocated to the target layer; it only compares whether the total computing capability of the currently idle computing cores (the accumulated sum of their computing capabilities) matches the computing capability required by the target layer ("matching" meaning the total computing capability of the currently idle cores equals or exceeds the computing capability required by the target layer). If so, the idle computing cores can be allocated to the target layer for operation processing (that is, all or some of the idle computing cores are selected as the computing cores corresponding to the target layer), where it should clearly be ensured that the total computing capability of the idle cores allocated to the target layer matches the computing capability required by the target layer.
In another optional implementation of this embodiment, determining that the computing cores corresponding to the target layer are ready may also include: if it is determined that the currently idle computing cores match the computing cores corresponding to the target layer, determining that the computing cores corresponding to the target layer are ready.
In this implementation, referring to FIG. 1b, the correspondence between each layer of the neural network and each computing core in the many-core system can be preset by pre-compilation (for example, by determining the numbers of the computing cores corresponding to each layer); then, when the computing cores of the numbers explicitly corresponding to the target layer are all in the idle state (other computing cores not corresponding to the target layer may of course also be idle), it can be determined that the computing cores corresponding to the target layer are ready.
S120: Process the target layer of the (N+1)th folding group and some layers of the Nth folding group in parallel in their respective corresponding computing cores.
In this embodiment, whether the target layer of the (N+1)th folding group is ready is used as the condition for starting the operation of that target layer. Therefore, once it is ready, the processing of the (N+1)th folding group can be started while the Nth folding group is still being operated on; that is, the (N+1)th folding group and the Nth folding group are processed in parallel, where at least the target layer of the (N+1)th folding group runs in parallel with some layers of the Nth folding group, and other layers of the (N+1)th and Nth folding groups (such as "target layers" of earlier time periods) may also be running in parallel.
FIG. 1e shows a sequence diagram of parallel operation of multiple folding groups to which the embodiment of the present invention is applicable. As shown in FIG. 1e, in the T7 time period, the fifth layer of folding group 2 is the target layer; the fifth layer can obtain the operation-result data output by the fourth layer of folding group 1 for data 1 (available from the T5 time period), and at the same time computing cores 1 to 4, which correspond to the fifth layer, have completed the processing of data 1 to data 5 in folding group 1 (the processing of the first and second layers) and are idle (available from the T7 time period). Therefore, the fifth layer satisfies the ready condition and its operation can be triggered; that is, in the T7 time period, computing cores 1 to 4 perform the operation of a layer of folding group 2 (the fifth layer), while computing cores 5 to 10 perform the operations of layers of folding group 1 (the third and fourth layers), thereby achieving parallel processing of folding group 1 and folding group 2.
Similarly, in the T8 time period, the sixth layer of folding group 2 is the target layer; it can obtain the operation-result data output by the fifth layer of folding group 2 for data 1, and at the same time computing cores 5 and 6, which correspond to the sixth layer, have completed the processing of data 1 to data 5 in folding group 1 (the processing of the third layer) and are idle. Therefore, the operation of the sixth layer can be triggered; that is, in the T8 time period, computing cores 1 to 6 perform the operations of layers of folding group 2 (the fifth and sixth layers), while computing cores 9 and 10 perform the operation of a layer of folding group 1 (the fourth layer), again achieving parallel processing of folding group 1 and folding group 2 (computing cores 7 and 8 being idle at this time).
Comparing FIG. 1c with FIG. 1e, the above parallel processing between folding groups saves at least two time periods in the case of 5 input data items. It can be understood that the more folding groups the neural network contains and the more computing cores each folding group includes, the more time periods the above solution can save, and the more obvious the improvement in the computing efficiency of the neural network.
In the folding-group operation scenario of a neural network, the technical solution of the embodiment of the present invention introduces a new parallel mechanism between folding groups: the operation of the next folding group is not started only upon the end of the entire operation of the previous folding group; instead, during the operation of the previous folding group, once a set layer (the target layer) in the next folding group is detected to satisfy the ready condition, the operation of that layer of the next folding group can be started. The technical solution of the embodiment of the present invention thus provides a new operating mechanism between folding groups, reduces the time consumed by the operation of a folding-group neural network, and improves its computing efficiency.
FIG. 2a is an implementation flowchart of another neural network computing method provided by an embodiment of the present invention. This embodiment is refined on the basis of the above embodiment. In this embodiment, the operation of processing the (N+1)th folding group and the Nth folding group in parallel is refined as: performing an operation of loading the weight data corresponding to the target layer into the computing cores corresponding to the target layer, where the computing cores corresponding to the target layer include computing cores corresponding to layers in the Nth folding group that have completed their operations; and, while operations are being performed by the computing cores corresponding to layers in the Nth folding group that have not yet completed their operations, triggering the computing cores corresponding to the target layer in the (N+1)th folding group to perform parallel operations.
In addition, after triggering the computing cores corresponding to the target layer in the (N+1)th folding group to perform parallel operations, the method may further include: performing parallel operations through the computing cores corresponding to the target layer, according to operation-result data obtained from memory or operation-result data output in real time by the pre-order layer connected to the target layer; and, through the computing cores corresponding to the target layer, storing the operation-result data obtained by the current operation in memory or outputting it in real time to the post-order layer connected to the target layer.
Accordingly, the method of this embodiment specifically includes the following S210 to S260.
S210: Determine whether the pre-order layer has output operation-result data for the current input data; if so, execute S220; otherwise, return to executing S210.
The pre-order layer is the previous layer connected to the target layer; that is, the pre-order layer is the last of the layers currently in the operating state, and it may be in the Nth folding group or in the (N+1)th folding group.
In this embodiment, it can first be verified whether the pre-order layer has output the corresponding operation-result data for the current input data, that is, the operation data required by the target layer.
S220、判断当前空闲的运算核的编号与目标层对应的运算核的编号是否相匹配:若是,执行S230;否则,返回执行S220。
在本实施例中,在确定目标层所需的运算数据准备就绪后,可以进一步判断与目标层对应的运算核是否均处于空闲状态。
S230、执行向与目标层对应的各运算核,加载与目标层对应的权重数据的操作。
在本实施例中,在确定目标层所需的运算数据,以及与目标层对应的运算核均准备好后,需要首先向与目标层对应的各运算核加载权重数据。其中,目标层对应的各运算核至少包括第N折叠组中已完成运算层对应的运算核,即第N+1折叠组中目标层对应的运算核有至少一部分是原本处理第N折叠组中的层对应的运算核,但在此时已经空闲。实际上,因为设置了多个折叠组,即是需要相邻折叠组共用相同的运算核进行运算处理,而只有前一折叠组中出现完成运算的空闲运算核后,后一折叠组中的目标层才可能满足就绪条件,因此,目标层对应的各运算核包括第N折叠组中已经完成运算的层对应的运算核,进一步可以全都是第N折叠组中已经完成运算的层对应的运算核。
In the related art, in order to run, on a many-core system, a neural network whose weight data exceeds the storage capacity of the computing cores of the many-core system, the neural network first has to be divided into multiple folding groups, and the folding groups are computed in a pipeline fashion with alternating weight storage; that is, after the entire preceding folding group has finished computing, the weight data corresponding to the preceding folding group currently stored in the computing cores is replaced as a whole by the weight data corresponding to the subsequent folding group, and the computation of the subsequent folding group is then started.
In this embodiment, because parallel processing between adjacent folding groups is achieved, the weight data can no longer be loaded as a whole for a single folding group. Therefore, after the target layer is ready, only the weight data corresponding to the target layer is loaded (replacing part of the weight data of the Nth folding group), so that the computation of the target layer can be completed.
Specifically, the weight data corresponding to each layer of the neural network may be determined in the precompilation stage and stored in memory in advance, and the weight data corresponding to each layer may then be obtained by direct reading.
S240. While computation is being performed by the computing cores corresponding to the layers of the Nth folding group that have not finished computing, trigger the computing cores corresponding to the target layer of the N+1th folding group to perform computation in parallel.
As stated above, the target layer of the N+1th folding group needs to use the computing cores freed by the layers of the Nth folding group that have finished computing, and computes in parallel with one or more layers of the Nth folding group that have not finished computing; therefore, the computation of the target layer is triggered while the unfinished layers of the Nth folding group continue to be processed.
S250. Perform parallel computation, by the computing cores corresponding to the target layer, based on operation result data obtained from memory or on operation result data output in real time by the preceding layer connected to the target layer.
As shown in Fig. 1e, the operation data required by the target layer may become ready before the computing cores corresponding to the target layer do. For example, the operation data required by the fifth layer of folding group two is available in time period T5, while computing core 1 to computing core 4 corresponding to the fifth layer of folding group two are not ready until time period T7. In this case, the operation result data computed by the preceding layer (that is, the operation data required by the current target layer) may first be stored in memory, and when time period T7 arrives, computing core 1 to computing core 4 obtain the corresponding operation result data from memory and perform the computation.
Alternatively, the operation data required by the target layer may become ready at the same time as the corresponding computing cores. For example, the operation data required by the sixth layer of folding group two becomes ready in time period T8, and computing core 5 and computing core 6 corresponding to the sixth layer of folding group two also become ready in time period T8. In this case, the preceding layer of the target layer (for example, the fifth layer) can transmit the computed operation result data directly to the target layer, and the target layer performs the corresponding computation directly in the new time period.
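For illustration only, the two input paths just described can be sketched as follows; fetch_input and the buffered layer names are assumptions for this example.

    # If the cores were not ready when the preceding layer produced its result, the
    # result was parked in memory; otherwise it is consumed directly as it is produced.
    def fetch_input(target_layer, memory, preceding_output=None):
        if preceding_output is not None:        # cores and data became ready together (e.g. T8)
            return preceding_output
        return memory.pop(target_layer)         # data buffered earlier (e.g. T5, consumed at T7)

    memory = {"layer5": "result_of_layer4_for_data1"}
    print(fetch_input("layer5", memory))                                # read from memory
    print(fetch_input("layer6", {}, preceding_output="layer5_result"))  # received in real time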
S260. Store, by the computing cores corresponding to the target layer, the operation result data obtained from the current computation in memory, or output it in real time to the succeeding layer connected to the target layer.
Similarly, after the computing cores corresponding to the target layer obtain the operation result data (that is, the operation data required when the next succeeding layer becomes the target layer), if it is determined that the computing cores corresponding to the succeeding layer are ready, the operation result data can be output in real time directly to the succeeding layer (that is, the layer immediately after and connected to the target layer); if it is determined that the computing cores corresponding to the succeeding layer are not ready, the operation result data obtained from the current computation can be stored in memory, and when the succeeding layer becomes the target layer and satisfies the ready condition, it obtains the data from memory and performs the corresponding computation.
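For illustration only, the matching output path can be sketched as follows; emit_output and the "_out" key suffix are assumptions for this example.

    # Forward the result directly when the succeeding layer's cores are already ready;
    # otherwise park it in memory until the successor becomes the target layer.
    def emit_output(layer_name, result, successor_ready, memory, send):
        if successor_ready:
            send(result)                            # real-time output to the succeeding layer
        else:
            memory[layer_name + "_out"] = result    # stored for later retrieval from memory

    buffered = {}
    emit_output("layer5", "result_for_data1", successor_ready=False, memory=buffered, send=print)
    print(buffered)   # {'layer5_out': 'result_for_data1'}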
It should be understood that the operation of loading, into the computing cores corresponding to the target layer, the weight data corresponding to the target layer takes a certain amount of time. If this operation takes a short time, it can be merged into a single time period, so that the loading of the weight data and the data computation are both completed within each time period. If the operation takes a longer time, a separate weight loading duration can be allocated for it, and after the ready condition is satisfied, the computing cores wait for the weight loading duration to elapse and then perform the data computation in a new time period.
Accordingly, in an optional implementation of this embodiment, triggering the computing cores corresponding to the target layer of the N+1th folding group to perform computation in parallel may include: starting from the moment the target layer of the N+1th folding group satisfies the ready condition, and after the weight loading duration has elapsed, triggering the computing cores corresponding to the target layer of the N+1th folding group to perform computation in parallel.
Here, for convenience of unified coordination and management, the weight loading duration may be a preset fixed value, for example one time period; alternatively, to minimize wasted time, the weight loading duration may be determined according to the actual weight loading time, that is, as soon as the computing cores corresponding to the target layer finish loading the weights, the computing cores corresponding to the target layer of the N+1th folding group are immediately triggered to perform computation in parallel.
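For illustration only, and assuming the fixed-duration policy with one whole time period reserved for loading, the first computing period of the target layer can be sketched as follows; first_compute_period is a name assumed for this example.

    def first_compute_period(ready_period, load_periods=1):
        # The layer becomes ready during `ready_period`; the next `load_periods` periods
        # are reserved for weight loading; computation starts in the period after that.
        # (Under the alternative policy, the delay would equal the actual load time instead.)
        return ready_period + load_periods + 1

    print(first_compute_period(6))   # 8, matching the Fig. 2b example (ready in T6, compute in T8)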
Fig. 2b is a timing diagram of parallel computation of multiple folding groups to which the embodiments of the present invention are applicable. As shown in Fig. 2b, in time period T6 both the operation data required by the fifth layer of folding group two and the corresponding computing cores are ready; therefore, one further time period (time period T7) is spent waiting for the weight data of the fifth layer to be loaded, and the computation of the fifth layer finally starts in time period T8.
According to the relationship between the time at which the preceding layer outputs the operation data required by the target layer and the time at which the computing cores corresponding to the target layer become ready, the technical solution of the embodiments of the present invention determines whether the target layer obtains its operation data from memory or directly from the preceding layer, which further refines the application scenarios of the embodiments of the present invention and makes full use of the resources of the many-core system. At the same time, by allocating a weight loading duration to the weight loading process of the target layer, the time periods required for the computation of each computing core are not occupied, which further ensures the accuracy and reliability of the neural network computation.
Fig. 3 is a flowchart of another neural network computing method provided by an embodiment of the present invention. This embodiment is a refinement of the above embodiments. In this embodiment, before the target layer of the N+1th folding group and the partial layers of the Nth folding group are processed in parallel in their respective corresponding computing cores, the method further includes: in response to a computation start instruction, loading the weight data corresponding to each layer of the first folding group of the neural network into the computing cores corresponding to the respective layers of the first folding group; and performing computation by the computing cores corresponding to the layers of the first folding group.
Accordingly, the method of this embodiment specifically includes the following S310 to S330.
S310. In response to a computation start instruction, load the weight data corresponding to each layer of the first folding group of the neural network into the computing cores corresponding to the respective layers of the first folding group.
In this embodiment, since the computing cores allocated to the same folding group of the neural network do not overlap, in order to maximize the computing efficiency of the neural network, all the weight data required by the first folding group can be loaded at once before the first folding group of the neural network starts running.
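For illustration only, a sketch of this one-shot load of the first folding group's weights is given below; the core_map and weight_memory layouts and the layer names are assumptions for this example.

    # Because cores assigned to the same folding group never overlap, every layer of the
    # first folding group can have its weights loaded in a single pass before the run starts.
    def start_first_group(first_group_layers, core_map, weight_memory, cores):
        # core_map: layer name -> core ids; weight_memory: layer name -> per-core weight slices
        for layer in first_group_layers:
            for core_id, weight_slice in zip(core_map[layer], weight_memory[layer]):
                cores[core_id] = weight_slice
        return cores

    cores = {}
    start_first_group(["layer1", "layer2"],
                      {"layer1": [1, 2], "layer2": [3]},
                      {"layer1": ["w1a", "w1b"], "layer2": ["w2"]},
                      cores)
    print(cores)   # {1: 'w1a', 2: 'w1b', 3: 'w2'}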
In an optional implementation of this embodiment, the neural network is a neural network that does not include a feedback loop. The reason for this is that, when a neural network does not include a feedback loop (that is, the output of a later layer fed back as the input of an earlier layer), a layer will not start computing again after it has completed the full computation of its input data; therefore, the computing cores corresponding to that layer can all be allocated to other layers without any allocation conflict of computing cores.
The neural network that does not include a feedback loop may be an ANN (Artificial Neural Network), an SNN (Spiking Neural Network) or the like, which is not limited in this embodiment.
S320. Perform computation by the computing cores corresponding to the layers of the first folding group.
S330. When it is determined that the target layer of the N+1th folding group satisfies the ready condition, process the target layer of the N+1th folding group and the partial layers of the Nth folding group in parallel in their respective corresponding computing cores.
In the folding-group operation scenario of a neural network, the technical solution of the embodiments of the present invention introduces a new parallel mechanism between folding groups: the computation of the subsequent folding group is not started on the condition that the entire preceding folding group has finished running; instead, while the preceding folding group is still running, as soon as it is detected that a given layer (the target layer) of the subsequent folding group satisfies the ready condition, the computation of that layer of the subsequent folding group can begin. The technical solution of the embodiments of the present invention thus provides a new operating mechanism between folding groups, which reduces the computation time of a folding-group neural network and improves its computing efficiency.
Fig. 4 is a structural diagram of a neural network computing apparatus provided by an embodiment of the present invention. The neural network includes multiple folding groups, each folding group includes one or more consecutive layers, each layer corresponds to at least one computing core, the computing cores corresponding to different layers in the same folding group are different, and the computing cores corresponding to different folding groups are at least partially the same. As shown in Fig. 4, the apparatus includes a ready condition determining module 410 and a parallel processing module 420.
The ready condition determining module 410 is configured to determine whether the target layer of the N+1th folding group satisfies the ready condition; the parallel processing module 420 is configured to, when it is determined that the target layer of the N+1th folding group satisfies the ready condition, process the target layer of the N+1th folding group and the partial layers of the Nth folding group in parallel in their respective corresponding computing cores.
In the folding-group operation scenario of a neural network, the technical solution of the embodiments of the present invention introduces a new parallel mechanism between folding groups: the running of the subsequent folding group is not started on the condition that the entire preceding folding group has finished running; instead, while the preceding folding group is still running, as soon as it is detected that a given layer (the target layer) of the subsequent folding group satisfies the ready condition, the computation of that layer of the subsequent folding group can begin. The technical solution of the embodiments of the present invention thus provides a new operating mechanism between folding groups, which reduces the computation time of a folding-group neural network and improves its computing efficiency.
On the basis of the above embodiments, the ready condition determining module 410 may include: an operation data determining unit configured to determine whether the operation data required by the target layer is ready; a computing core determining unit configured to determine whether the computing cores corresponding to the target layer are ready; and a comprehensive determining unit configured to determine that the target layer satisfies the ready condition if it is determined that the operation data required by the target layer is ready and the computing cores corresponding to the target layer are ready.
On the basis of the above embodiments, the operation data determining unit may be specifically configured to: when it is determined that the preceding layer has output operation result data for the current input data, determine that the operation data required by the target layer is ready, where the preceding layer is the layer immediately before and connected to the target layer.
On the basis of the above embodiments, the computing core determining unit may be specifically configured to: if it is determined that the total computing capability of the currently idle computing cores matches the computing capability required by the target layer, determine that the computing cores corresponding to the target layer are ready, and determine that the computing cores corresponding to the target layer are all currently idle computing cores; or, if it is determined that the currently idle computing cores match the computing cores corresponding to the target layer, determine that the computing cores corresponding to the target layer are ready.
On the basis of the above embodiments, the parallel processing module 420 includes: a weight loading unit configured to perform the operation of loading, into the computing cores corresponding to the target layer, the weight data corresponding to the target layer, where the computing cores corresponding to the target layer include computing cores corresponding to layers of the Nth folding group that have finished computing; and a parallel computation triggering unit configured to, while computation is being performed by the computing cores corresponding to the layers of the Nth folding group that have not finished computing, trigger the computing cores corresponding to the target layer of the N+1th folding group to perform computation in parallel.
On the basis of the above embodiments, the parallel computation triggering unit is specifically configured to: starting from the moment the target layer of the N+1th folding group satisfies the ready condition, and after the weight loading duration has elapsed, trigger the computing cores corresponding to the target layer of the N+1th folding group to perform computation in parallel.
On the basis of the above embodiments, the parallel processing module 420 may further include a target layer running unit configured to: after the computing cores corresponding to the target layer of the N+1th folding group are triggered to perform computation in parallel, perform parallel computation, by the computing cores corresponding to the target layer, based on operation result data obtained from memory or on operation result data output in real time by the preceding layer connected to the target layer; and store, by the computing cores corresponding to the target layer, the operation result data obtained from the current computation in memory, or output it in real time to the succeeding layer connected to the target layer.
On the basis of the above embodiments, the apparatus may further include a first folding group computing module configured to: in response to a computation start instruction, load the weight data corresponding to each layer of the first folding group of the neural network into the computing cores corresponding to the respective layers of the first folding group; and perform computation by the computing cores corresponding to the layers of the first folding group.
On the basis of the above embodiments, the neural network may be a neural network that does not include a feedback loop.
The neural network computing apparatus provided by the embodiments of the present invention can execute the neural network computing method provided by any embodiment of the present invention, and has functional modules and beneficial effects corresponding to the executed method.
Fig. 5 is a schematic structural diagram of a computer device provided by an embodiment of the present invention. As shown in Fig. 5, the computer device includes a processor 50 and a storage apparatus 51, and may further include an input apparatus 52 and an output apparatus 53. The number of processors 50 in the computer device may be one or more, and one processor 50 is taken as an example in Fig. 5; the processor 50, the storage apparatus 51, the input apparatus 52 and the output apparatus 53 of the computer device may be connected by a bus or in other ways, and connection by a bus is taken as an example in Fig. 5.
As a computer-readable storage medium, the storage apparatus 51 can be used to store software programs, computer-executable programs and modules (computer programs), such as the modules corresponding to the neural network computing method in the embodiments of the present invention. The processor 50 executes the various functional applications and data processing of the computer device by running the computer programs stored in the storage apparatus 51, thereby implementing the neural network computing method of any embodiment of the present invention.
The neural network includes multiple folding groups, each folding group includes one or more consecutive layers, each layer corresponds to at least one computing core, the computing cores corresponding to different layers in the same folding group are different, and the computing cores corresponding to different folding groups are at least partially the same. The method includes:
when it is determined that the target layer of the N+1th folding group satisfies the ready condition, processing the target layer of the N+1th folding group and the partial layers of the Nth folding group in parallel in their respective corresponding computing cores.
The storage apparatus 51 may mainly include a program storage area and a data storage area, where the program storage area may store an operating system and an application program required by at least one function, and the data storage area may store data created according to the use of the terminal, and the like. In addition, the storage apparatus 51 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device or another non-volatile solid-state storage device. In some examples, the storage apparatus 51 may further include memories remotely located with respect to the processor 50, and these remote memories may be connected to the computer device through a network. Examples of the network include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network and combinations thereof.
The input apparatus 52 may be configured to receive input numeric or character information and to generate key signal inputs related to user settings and function control of the computer device. The output apparatus 53 may include a display device such as a display screen.
An embodiment of the present invention further provides a computer-readable storage medium containing computer-executable instructions (that is, a computer program), where the computer program, when executed by a processor, is used to execute the neural network computing method of any embodiment of the present invention.
The neural network includes multiple folding groups, each folding group includes one or more consecutive layers, each layer corresponds to at least one computing core, the computing cores corresponding to different layers in the same folding group are different, and the computing cores corresponding to different folding groups are at least partially the same. The method includes:
when it is determined that the target layer of the N+1th folding group satisfies the ready condition, processing the target layer of the N+1th folding group and the partial layers of the Nth folding group in parallel in their respective corresponding computing cores.
Of course, in the storage medium containing computer-executable instructions provided by the embodiments of the present invention, the computer-executable instructions are not limited to the above method operations, and may also execute related operations in the methods provided by any embodiment of the present invention.
From the above description of the implementations, those skilled in the art can clearly understand that the present invention can be implemented by means of software plus necessary general-purpose hardware, and of course can also be implemented by hardware, but in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, can be embodied in the form of a software product; the computer software product can be stored in a computer-readable storage medium, such as a floppy disk of a computer, a read-only memory (ROM), a random access memory (RAM), a flash memory (FLASH), a hard disk or an optical disk, and includes a number of instructions for causing a computer device (which may be a personal computer, a server, a network device or the like) to execute the methods of the various embodiments of the present invention.
Note that the above are only preferred embodiments of the present invention and the technical principles applied therein. Those skilled in the art will understand that the present invention is not limited to the specific embodiments described herein, and that various obvious changes, readjustments and substitutions can be made without departing from the protection scope of the present invention. Therefore, although the present invention has been described in some detail through the above embodiments, the present invention is not limited to the above embodiments, and may include more other equivalent embodiments without departing from the concept of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims (20)

  1. A neural network computing method, the neural network comprising multiple folding groups, each folding group comprising one or more consecutive layers, each layer corresponding to at least one computing core, the computing cores corresponding to different layers in the same folding group being different, and the computing cores corresponding to different folding groups being at least partially the same, characterized in that the method comprises:
    when it is determined that a target layer of an N+1th folding group satisfies a ready condition, processing the target layer of the N+1th folding group and partial layers of an Nth folding group in parallel in their respective corresponding computing cores.
  2. The method according to claim 1, characterized in that determining that the target layer of the N+1th folding group satisfies the ready condition comprises:
    if it is determined that operation data required by the target layer is ready and the computing cores corresponding to the target layer are ready, determining that the target layer satisfies the ready condition.
  3. The method according to claim 2, characterized in that determining that the operation data required by the target layer is ready comprises:
    when it is determined that a preceding layer has output operation result data for current input data, determining that the operation data required by the target layer is ready, wherein the preceding layer is a layer immediately before and connected to the target layer.
  4. The method according to claim 2, characterized in that determining that the computing cores corresponding to the target layer are ready comprises:
    if it is determined that a total computing capability of currently idle computing cores matches a computing capability required by the target layer, determining that the computing cores corresponding to the target layer are ready, and determining that the computing cores corresponding to the target layer are all currently idle computing cores;
    or,
    if it is determined that the currently idle computing cores match the computing cores corresponding to the target layer, determining that the computing cores corresponding to the target layer are ready.
  5. The method according to claim 1, characterized in that processing the target layer of the N+1th folding group and the partial layers of the Nth folding group in parallel in their respective corresponding computing cores comprises:
    performing an operation of loading, into the computing cores corresponding to the target layer, weight data corresponding to the target layer, wherein the computing cores corresponding to the target layer include computing cores corresponding to layers of the Nth folding group that have finished computing;
    while computation is being performed by computing cores corresponding to layers of the Nth folding group that have not finished computing, triggering the computing cores corresponding to the target layer of the N+1th folding group to perform computation in parallel.
  6. The method according to claim 5, characterized in that triggering the computing cores corresponding to the target layer of the N+1th folding group to perform computation in parallel comprises:
    starting from the moment the target layer of the N+1th folding group satisfies the ready condition, and after a weight loading duration has elapsed, triggering the computing cores corresponding to the target layer of the N+1th folding group to perform computation in parallel.
  7. The method according to claim 5, characterized in that, after triggering the computing cores corresponding to the target layer of the N+1th folding group to perform computation in parallel, the method further comprises:
    performing parallel computation, by the computing cores corresponding to the target layer, based on operation result data obtained from memory or on operation result data output in real time by a preceding layer connected to the target layer;
    storing, by the computing cores corresponding to the target layer, operation result data obtained from a current computation in memory, or outputting it in real time to a succeeding layer connected to the target layer.
  8. The method according to any one of claims 1 to 7, characterized in that, before processing the target layer of the N+1th folding group and the partial layers of the Nth folding group in parallel in their respective corresponding computing cores, the method further comprises:
    in response to a computation start instruction, loading weight data corresponding to each layer of a first folding group of the neural network into the computing cores corresponding to the respective layers of the first folding group;
    performing computation by the computing cores corresponding to the layers of the first folding group.
  9. The method according to any one of claims 1 to 7, characterized in that the neural network is a neural network that does not include a feedback loop.
  10. A neural network computing apparatus, the neural network comprising multiple folding groups, each folding group comprising one or more consecutive layers, each layer corresponding to at least one computing core, the computing cores corresponding to different layers in the same folding group being different, and the computing cores corresponding to different folding groups being at least partially the same, characterized in that the apparatus comprises:
    a ready condition determining module configured to determine whether a target layer of an N+1th folding group satisfies a ready condition;
    a parallel processing module configured to, when it is determined that the target layer of the N+1th folding group satisfies the ready condition, process the target layer of the N+1th folding group and partial layers of an Nth folding group in parallel in their respective corresponding computing cores.
  11. The apparatus according to claim 10, characterized in that the ready condition determining module comprises:
    an operation data determining unit configured to determine whether operation data required by the target layer is ready;
    a computing core determining unit configured to determine whether the computing cores corresponding to the target layer are ready;
    a comprehensive determining unit configured to, if it is determined that the operation data required by the target layer is ready and the computing cores corresponding to the target layer are ready, determine that the target layer satisfies the ready condition.
  12. The apparatus according to claim 11, characterized in that the operation data determining unit is specifically configured to:
    when it is determined that a preceding layer has output operation result data for current input data, determine that the operation data required by the target layer is ready, wherein the preceding layer is a layer immediately before and connected to the target layer.
  13. The apparatus according to claim 11, characterized in that the computing core determining unit is specifically configured to:
    if it is determined that a total computing capability of currently idle computing cores matches a computing capability required by the target layer, determine that the computing cores corresponding to the target layer are ready, and determine that the computing cores corresponding to the target layer are all currently idle computing cores;
    or,
    if it is determined that the currently idle computing cores match the computing cores corresponding to the target layer, determine that the computing cores corresponding to the target layer are ready.
  14. The apparatus according to claim 10, characterized in that the parallel processing module comprises:
    a weight loading unit configured to perform an operation of loading, into the computing cores corresponding to the target layer, weight data corresponding to the target layer, wherein the computing cores corresponding to the target layer include computing cores corresponding to layers of the Nth folding group that have finished computing;
    a parallel computation triggering unit configured to, while computation is being performed by computing cores corresponding to layers of the Nth folding group that have not finished computing, trigger the computing cores corresponding to the target layer of the N+1th folding group to perform computation in parallel.
  15. The apparatus according to claim 14, characterized in that the parallel computation triggering unit is specifically configured to:
    starting from the moment the target layer of the N+1th folding group satisfies the ready condition, and after a weight loading duration has elapsed, trigger the computing cores corresponding to the target layer of the N+1th folding group to perform computation in parallel.
  16. The apparatus according to claim 14, characterized in that the parallel processing module further comprises a target layer running unit configured to:
    after the computing cores corresponding to the target layer of the N+1th folding group are triggered to perform computation in parallel, perform parallel computation, by the computing cores corresponding to the target layer, based on operation result data obtained from memory or on operation result data output in real time by a preceding layer connected to the target layer;
    store, by the computing cores corresponding to the target layer, operation result data obtained from a current computation in memory, or output it in real time to a succeeding layer connected to the target layer.
  17. The apparatus according to any one of claims 10 to 16, characterized in that the apparatus further comprises a first folding group computing module configured to:
    in response to a computation start instruction, load weight data corresponding to each layer of a first folding group of the neural network into the computing cores corresponding to the respective layers of the first folding group;
    perform computation by the computing cores corresponding to the layers of the first folding group.
  18. The apparatus according to any one of claims 10 to 16, characterized in that the neural network is a neural network that does not include a feedback loop.
  19. A computer device, characterized in that the computer device comprises:
    one or more processors;
    a storage apparatus configured to store one or more programs,
    wherein, when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the neural network computing method according to any one of claims 1 to 9.
  20. A computer-readable storage medium on which a computer program is stored, characterized in that, when the computer program is executed by a processor, the neural network computing method according to any one of claims 1 to 9 is implemented.
PCT/CN2021/112471 2020-08-21 2021-08-13 神经网络的运算方法、装置、计算机设备及存储介质 WO2022037490A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010852104.0 2020-08-21
CN202010852104.0A CN111985634A (zh) 2020-08-21 2020-08-21 神经网络的运算方法、装置、计算机设备及存储介质

Publications (1)

Publication Number Publication Date
WO2022037490A1 true WO2022037490A1 (zh) 2022-02-24

Family ID=73442933

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/112471 WO2022037490A1 (zh) 2020-08-21 2021-08-13 神经网络的运算方法、装置、计算机设备及存储介质

Country Status (2)

Country Link
CN (1) CN111985634A (zh)
WO (1) WO2022037490A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023206889A1 (zh) * 2022-04-26 2023-11-02 北京百度网讯科技有限公司 模型推理方法、装置、设备及存储介质

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111985634A (zh) * 2020-08-21 2020-11-24 北京灵汐科技有限公司 神经网络的运算方法、装置、计算机设备及存储介质

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107451659A (zh) * 2017-07-27 2017-12-08 清华大学 用于位宽分区的神经网络加速器及其实现方法
CN109359736A (zh) * 2017-04-06 2019-02-19 上海寒武纪信息科技有限公司 网络处理器和网络运算方法
CN109409513A (zh) * 2018-10-10 2019-03-01 广州市百果园信息技术有限公司 一种基于神经网络的任务处理方法及相关设备
CN109919311A (zh) * 2019-03-13 2019-06-21 北京地平线机器人技术研发有限公司 生成指令序列的方法、执行神经网络运算的方法和装置
CN110717574A (zh) * 2018-07-11 2020-01-21 杭州海康威视数字技术股份有限公司 一种神经网络运行方法、装置及异构智能芯片
CN111047602A (zh) * 2019-11-26 2020-04-21 中国科学院深圳先进技术研究院 图像分割方法、装置及终端设备
CN111047031A (zh) * 2018-10-12 2020-04-21 西部数据技术公司 用于神经网络中的数据重用的移位架构
CN111488051A (zh) * 2020-03-06 2020-08-04 复旦大学 基于cpu和fpga协同计算的云端深度神经网络优化方法
CN111985634A (zh) * 2020-08-21 2020-11-24 北京灵汐科技有限公司 神经网络的运算方法、装置、计算机设备及存储介质

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018058427A1 (zh) * 2016-09-29 2018-04-05 北京中科寒武纪科技有限公司 神经网络运算装置及方法
CN107729994B (zh) * 2017-11-28 2020-05-26 南京地平线机器人技术有限公司 执行卷积神经网络中的卷积层的运算的方法和装置
KR102562320B1 (ko) * 2018-12-24 2023-08-01 삼성전자주식회사 비트 연산 기반의 뉴럴 네트워크 처리 방법 및 장치
CN110443354A (zh) * 2019-07-26 2019-11-12 深圳大学 一种基于多组张列量分解的深度神经网络压缩方法、系统、装置及存储介质

Also Published As

Publication number Publication date
CN111985634A (zh) 2020-11-24

Similar Documents

Publication Publication Date Title
WO2022037490A1 (zh) 神经网络的运算方法、装置、计算机设备及存储介质
CN113254178B (zh) 一种任务调度方法、装置、电子设备及可读存储介质
US20170331867A1 (en) Method, device and system for pushing file
WO2022105805A1 (zh) 数据的处理方法及存算一体芯片
CN110717574B (zh) 一种神经网络运行方法、装置及异构智能芯片
CN110308984B (zh) 一种用于处理地理分布式数据的跨集群计算系统
CN108776897A (zh) 数据处理方法、装置、服务器及计算机可读存储介质
CN111221643A (zh) 任务处理方法和任务处理装置
CN110990154B (zh) 一种大数据应用优化方法、装置及存储介质
CN115600676A (zh) 深度学习模型推理方法、装置、设备及存储介质
CN112035229A (zh) 一种计算图处理方法、装置及存储介质
CN104714839A (zh) 一种控制进程生命期的方法和装置
WO2022027902A1 (zh) 多任务并行处理方法、装置、计算机设备及存储介质
CN111598768B (zh) 图像优化处理方法、装置、计算机设备及存储介质
US20230067432A1 (en) Task allocation method, apparatus, electronic device, and computer-readable storage medium
CN116243983A (zh) 处理器、集成电路芯片、指令处理方法、电子设备和介质
CN116126719A (zh) 接口测试方法、装置、电子设备及存储介质
US20220114469A1 (en) Methods and apparatus for parallel quantum computing
CN113806055A (zh) 一种轻量级任务调度方法、系统、装置及存储介质
CN102609306A (zh) 多核处理芯片对视频处理任务的处理方法及其系统
CN111290868A (zh) 任务处理方法、装置和系统以及流程引擎
CN113094155A (zh) Hadoop平台下的任务调度方法及装置
CN110134502A (zh) 任务处理方法、装置、系统、计算机设备和存储介质
US11892972B2 (en) Synchronization mechanisms for a multi-core processor using wait commands having either a blocking or a non-blocking state
CN111476663B (zh) 一种数据处理方法、装置、节点设备及存储介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21857585

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205 DATED 12/06/2023)

122 Ep: pct application non-entry in european phase

Ref document number: 21857585

Country of ref document: EP

Kind code of ref document: A1