WO2020093306A1 - Neural network layer grouping method, device, equipment, storage medium, and program product - Google Patents

Neural network layer grouping method, device, equipment, storage medium, and program product

Info

Publication number
WO2020093306A1
WO2020093306A1 (PCT/CN2018/114549)
Authority
WO
WIPO (PCT)
Prior art keywords
group
data
grouping
layer
packet
Prior art date
Application number
PCT/CN2018/114549
Other languages
English (en)
French (fr)
Inventor
蒋国跃
Original Assignee
北京比特大陆科技有限公司
Application filed by 北京比特大陆科技有限公司
Priority to CN201880098346.6A (CN112955906B)
Priority to PCT/CN2018/114549
Publication of WO2020093306A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks

Definitions

  • the present application relates to the field of neural networks, for example, to a neural network layer grouping method, device, equipment, storage medium, and program product.
  • in recent years, the achievements of deep learning in image recognition, speech recognition, and other fields have made artificial intelligence one of the hottest areas; the core of deep learning is the neural network, and to achieve high accuracy in image recognition and speech recognition, neural networks have grown ever deeper in layer count, which places ever greater demands on computing power.
  • to meet the heavy computational demands of neural networks, various neural network processors (or AI chips) have been proposed.
  • one class of neural network processors uses software-managed local storage, with software scheduling the computation layers of the neural network to run in local storage so as to achieve high performance.
  • to keep as many layers of the neural network as possible computing in local storage and avoid costly global-memory accesses, research developers often group and fuse the layers of the neural network.
  • a first aspect of an embodiment of the present disclosure provides a layer grouping method of a neural network, including:
  • grouping the layers of the neural network according to a first grouping rule to obtain multiple first groups;
  • determining invalid groups among the first groups according to a preset validity rule;
  • splitting the invalid groups according to a second grouping rule to obtain second groups;
  • determining invalid groups among the second groups according to the preset validity rule, and continuing the step of splitting the invalid groups according to the second grouping rule;
  • determining multiple valid group sets from the first groups and the second groups;
  • scoring each valid group set according to a preset rule, and determining a target group set among the valid group sets according to the scores.
  • a second aspect of an embodiment of the present disclosure provides a layer grouping device of a neural network, including:
  • a first grouping module, configured to group the layers of the neural network according to a first grouping rule to obtain multiple first groups;
  • a screening module, configured to determine invalid groups among the first groups according to a preset validity rule;
  • a second grouping module, configured to split the invalid groups according to a second grouping rule to obtain second groups;
  • the screening module being further configured to determine invalid groups among the second groups according to the preset validity rule, with the second grouping module continuing to perform the step of splitting the invalid groups according to the second grouping rule;
  • a set determination module, configured to determine multiple valid group sets from the first groups and the second groups;
  • a target set determination module, configured to score each valid group set according to a preset rule, and determine a target group set among the valid group sets according to the scores.
  • a third aspect of an embodiment of the present disclosure provides a computer that includes the above layer grouping device of the neural network.
  • a fourth aspect of an embodiment of the present disclosure provides a computer-readable storage medium that stores computer-executable instructions that are configured to perform the layer grouping method of the neural network described above.
  • a fifth aspect of an embodiment of the present disclosure provides a computer program product.
  • the computer program product includes a computer program stored on a computer-readable storage medium.
  • the computer program includes program instructions; when the program instructions are executed by a computer, they cause the computer to execute the layer grouping method of the neural network.
  • a sixth aspect of the embodiments of the present disclosure provides an electronic device, including:
  • at least one processor; and
  • a memory communicatively connected to the at least one processor; wherein,
  • the memory stores instructions executable by the at least one processor. When the instructions are executed by the at least one processor, the at least one processor is caused to perform the layer grouping method of the neural network.
  • the neural network layer grouping method, device, equipment, storage medium, and program product provided by the embodiments of the present disclosure include: grouping the layers of the neural network according to a first grouping rule to obtain multiple first groups; determining invalid groups among the first groups according to a preset validity rule; splitting the invalid groups according to a second grouping rule to obtain second groups; determining invalid groups among the second groups according to the preset validity rule, and continuing the step of splitting the invalid groups according to the second grouping rule; determining multiple valid group sets from the first groups and the second groups; and scoring each valid group set according to a preset rule and determining a target group set among the valid group sets according to the scores.
  • the solution provided by the embodiments of the present disclosure can group the layers of a neural network based on preset grouping rules, check the resulting groups against a validity rule, and re-split the invalid groups found by the check, thereby obtaining multiple valid group sets; the target group set that maximizes device performance is then determined among the valid group sets. The solution therefore improves grouping efficiency and, compared with manual grouping, yields more grouping options, so that the optimal target group set can be determined.
  • FIG. 1 is a flowchart of a layer grouping method of a neural network according to an exemplary embodiment of the present invention
  • FIG. 2 is a flowchart of a layer grouping method of a neural network according to another exemplary embodiment of the present invention.
  • FIG. 2A is a flowchart of determining an invalid group according to an exemplary embodiment of the present invention;
  • FIG. 2B is a flowchart of determining whether the memory occupied by a group at run time is greater than the memory of the processor running the neural network according to an exemplary embodiment of the present invention;
  • FIG. 2C is a flowchart of scoring each valid group set according to a preset rule according to an exemplary embodiment of the present invention;
  • FIG. 3 is a structural diagram of a layer grouping device of a neural network according to an exemplary embodiment of the present invention.
  • FIG. 4 is a structural diagram of a layer grouping device of a neural network according to another exemplary embodiment of the present invention.
  • Fig. 5 is a structural diagram of an electronic device according to an exemplary embodiment of the present invention.
  • to improve the performance of a device when it processes a neural network, the layers of the network can be grouped and the network processed group by group.
  • however, if the layers are grouped only by manually specified rules, grouping efficiency is low; moreover, with such grouping the fusion rate of the layers in the network is low, so device performance cannot be improved effectively.
  • the scheme provided by the present disclosure defines grouping rules and validity rules: the layers of the neural network are grouped according to the grouping rules and the resulting groups are checked against the validity rules, which improves grouping efficiency; furthermore, the grouping rules allow the network to be grouped in multiple ways, yielding multiple grouping results from which the grouping that most effectively improves device performance can be determined.
  • FIG. 1 is a flowchart of a layer grouping method of a neural network according to an exemplary embodiment of the present invention.
  • the layer grouping method of the neural network provided in this embodiment includes:
  • Step 101 Group the layers of the neural network according to the first grouping rule to obtain multiple first groups.
  • the method provided in this embodiment may be executed by a device with a computing function, and may specifically be a device including a processor, such as a computer.
  • the layers included in the neural network can be sent to the device so that the device groups these layers.
  • a first grouping rule may be set in advance to perform preliminary grouping on the neural network and obtain multiple first groupings.
  • some layer types may be specified in advance, and these types of layers may be individually grouped into a group.
  • the specified layer type may be a fully connected layer.
  • a fully connected layer involves a large amount of data and must fetch correspondingly more data; therefore, each fully connected layer can be placed in a group of its own. When the neural network runs, if a group contains only a fully connected layer, the data that layer needs can be read into local storage.
  • local storage refers to the storage unit of the processor in the device that executes the method provided in this embodiment; external storage refers to the storage unit outside that processor.
  • multiple independent processing units (cores) may be provided in one processor.
  • the first grouping rule can be set according to requirements, for example, it is also possible to directly divide consecutive N layers into a group, where N is any natural number.
  • the layers of the neural network are connected, specifically through inter-layer coefficients, and data flows between the layers in a definite direction.
  • for example, input data enters the first layer, layer1; layer1 processes the data and outputs it, the output is processed according to the weight coefficients connecting the first and second layers, and the result is fed into the second layer, layer2; layer2 in turn processes the data input to it and produces its own output.
  • in this case, layer1 and layer2 can be regarded as two consecutive layers, with data flowing from layer1 to layer2.
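  • As an illustration of the consecutive-N rule mentioned above, a minimal Python sketch follows (it is not part of the patent text; the function name and the list-of-strings representation of the layers are assumptions made for illustration):

```python
from typing import List

def chunk_consecutive(layers: List[str], n: int) -> List[List[str]]:
    """One possible first grouping rule: place every N consecutive
    layers of the network into one group, preserving layer order."""
    return [layers[i:i + n] for i in range(0, len(layers), n)]

# chunk_consecutive(["layer1", "layer2", "layer3", "layer4", "layer5"], 2)
# -> [["layer1", "layer2"], ["layer3", "layer4"], ["layer5"]]
```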
  • Step 102 In the first group, an invalid group is determined according to a preset validity rule.
  • further, a validity rule can be set and the obtained first groups checked against it: an invalid group is split anew, while a valid group is retained.
  • in practice, the groups obtained by grouping the layers of a neural network should still realize the network's original function; if a grouping is invalid and the neural network cannot run normally, grouping the layers is pointless. Validity rules for checking the grouping results can therefore be set in advance according to requirements.
  • to reduce the number of times data is moved from external storage to internal storage, when the neural network is run on the basis of a grouping result, the data a group needs can be moved into internal storage and the layers of the group run in the processor on that data; the memory a group occupies at run time must therefore be no larger than the processor memory, otherwise the group cannot run normally. Whether a group's run-time memory exceeds the processor memory can thus serve as one condition for judging the validity of the group.
  • specifically, data flows between layers; for example, if the data produced by layer1 is used by layer2, data flows from layer1 to layer2. When a group runs, the data flow between the layers inside it must not form a closed loop, which would make the group run in a cycle. Whether the data flow forms a closed loop can therefore also serve as a condition for judging the validity of the group.
  • further, the validity rules can be set according to the specific situation of the layers in the neural network. For example, if the amount of data the network processes is large, the data can be split and each part processed separately; if the memory a group occupies when running on the split data is less than or equal to the processor memory, the group can be considered valid. It should be noted that the split data must still allow the layers within the group to run normally.
  • in practice, the rule for judging whether a group is valid may contain multiple conditions, and a group is considered valid only when it satisfies all of them at the same time.
  • of course, depending on the layers the neural network contains, the validity rule may also contain a single condition; the specific conditions can be set according to requirements, as sketched below.
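  • The multi-condition check described above can be sketched as a simple conjunction of predicates. The following Python fragment is illustrative only (the condition callables and helper names are assumptions; concrete checks such as the closed-loop and memory tests are sketched further below):

```python
from typing import Callable, Iterable

def is_valid_group(group: object,
                   conditions: Iterable[Callable[[object], bool]]) -> bool:
    """A group is valid only if it satisfies every configured validity
    condition at the same time; one failing condition marks it invalid."""
    return all(condition(group) for condition in conditions)

# Example wiring with the two conditions named in the text (hypothetical
# helpers): no closed loop in the data flow, and a run-time footprint
# that fits the processor's local memory.
# checks = [lambda g: not has_data_flow_cycle(g.edges),
#           lambda g: runtime_memory(g) <= LOCAL_MEMORY_BYTES]
```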
  • Step 103 Split the invalid groups according to the second grouping rule to obtain second groups.
  • if invalid groups were found among the first groups, they can be split again to obtain second groups.
  • a second grouping rule may be set in advance for secondary grouping of invalid groups.
  • the second grouping rule may include some of the rules of the first grouping rule, that is, the two rules may overlap; of course, they may also be completely different.
  • splitting an invalid group according to the second grouping rule can produce multiple candidate splits. For example, a dichotomy can be used, splitting one invalid group into two second groups; how to split admits several possibilities. If the invalid group contains an even number of layers, it can be split into two second groups with the same number of layers; if the number of layers is odd, the two second groups may differ by one layer, with either the first or the second of them holding the extra layer.
  • invalid groups can also be split according to other ratios, for example layer ratios of 2:8 or 4:6 between the two second groups. It should be noted that since a layer cannot be subdivided, the ratio of layer counts is only approximately equal to the target ratio; for example, if an invalid group contains 11 layers in total, a 2:8 split may place 3 layers in one second group and 8 layers in the other. A sketch of such ratio-based splitting follows.
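  • A minimal sketch of ratio-based re-splitting (not part of the patent text; the rounding policy is an assumption, and as the 11-layer example shows, the text leaves open exactly how a 2:8 split rounds):

```python
from typing import List, Tuple

def split_by_ratio(layers: List[str],
                   ratio: Tuple[int, int]) -> Tuple[List[str], List[str]]:
    """Split an invalid group into two second groups whose layer counts
    approximate the given ratio; layers are atomic, so the cut point is
    rounded to a whole number of layers and both halves stay non-empty."""
    a, b = ratio
    cut = round(len(layers) * a / (a + b))
    cut = max(1, min(len(layers) - 1, cut))
    return layers[:cut], layers[cut:]

# Even split (dichotomy): split_by_ratio(list("abcdef"), (1, 1))
# -> (['a', 'b', 'c'], ['d', 'e', 'f'])
```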
  • Step 104 In the second group, an invalid group is determined according to a preset validity rule.
  • in practice, after the second groups are obtained, each of them can be checked to find any invalid groups among them, and the splitting step of step 103 is repeated; after several iterations, an invalid group can be split into multiple valid groups.
  • Step 105 Determine a plurality of effective group sets according to the first group and the second group.
  • in practice, the second grouping rule may include multiple splitting methods, so each invalid group can have multiple splitting results; combining each splitting result with the other valid groups and with the splitting results of the other invalid groups yields a group set.
  • for example, suppose the first grouping of the neural network yields the seven first groups A-G, of which B, C, and D are invalid, and these three groups are split again; if each of the three invalid groups has n splitting results, a total of n × n × n group sets can be obtained. If re-splitting B, C, and D still yields invalid groups, those can be split once more, producing even more group sets.
  • for a neural network, every layer is indispensable; a group set therefore contains all the layers of the network.
  • for example, the seven first groups A-G form one group set.
  • a valid group set is a group set in which every group is valid, and multiple valid group sets can be determined from the first groups and the second groups.
  • the valid first groups and second groups can be combined to obtain valid group sets.
  • for example, splitting B one way yields the two valid second groups B1 and B2, while splitting B another way yields the two valid second groups B3 and B4; similarly, C can be split into the valid second groups C1 and C2, or into C3 and C4, and D can be split into D1 and D2, or into D3 and D4.
  • a valid group set can then consist of the groups A, B1, B2, C1, C2, D1, D2, E, F, G, or of the groups A, B3, B4, C1, C2, D1, D2, E, F, G; the enumeration of all such candidates is sketched below.
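  • Enumerating the candidate group sets amounts to taking the Cartesian product of the split alternatives of each invalid group and combining each choice with the groups that were already valid. A minimal sketch under those assumptions (names are illustrative):

```python
from itertools import product
from typing import Dict, List

def enumerate_group_sets(valid_groups: List[str],
                         split_options: Dict[str, List[List[str]]]) -> List[List[str]]:
    """Each invalid group contributes one of its alternative splits;
    every combination of choices yields one candidate group set."""
    group_sets = []
    for combo in product(*split_options.values()):
        group_set = list(valid_groups)
        for split in combo:
            group_set.extend(split)
        group_sets.append(group_set)
    return group_sets

# Example from the text: A, E, F, G stay valid, while B, C, D each have
# two alternative splits, giving 2 x 2 x 2 = 8 candidate group sets.
sets = enumerate_group_sets(
    ["A", "E", "F", "G"],
    {"B": [["B1", "B2"], ["B3", "B4"]],
     "C": [["C1", "C2"], ["C3", "C4"]],
     "D": [["D1", "D2"], ["D3", "D4"]]})
assert len(sets) == 8
```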
  • the purpose of grouping the layers of the neural network is to improve the performance of the device when it processes the network.
  • the data-flow relationships between the layers must not be changed; the data flow direction of every layer therefore stays the same, and the layers are merely grouped so that they can be processed quickly. The associations that exist between the layers before grouping remain unchanged after grouping.
  • Step 106 Score each valid group set according to a preset rule, and determine the target group set among the valid group sets according to the scores.
  • specifically, a rule for scoring the valid group sets can be set, so that the optimal target group set is determined among multiple valid group sets.
  • in the valid group sets determined in step 105, every group can run normally in the processor, so all of these valid group sets are candidates.
  • the degree to which each valid group set can improve device performance can be determined according to the preset rule, and the valid group set that maximizes device performance determined as the target group set.
  • in practice, processor speed can serve as a measure of device performance; the time the processor needs to run each valid group set can therefore be determined, and the valid group set with the shortest time determined as the optimal set, that is, the target group set.
  • for the same processor, the time spent on the layer computations themselves can be regarded as identical across group sets, so the time spent moving data from external storage while running the neural network can be used to compare the valid group sets.
  • for example, if a valid group set consists of the groups A, B1, B2, C1, C2, D1, D2, E, F, G, then when the processor runs group A it must move the data A needs from external storage, which costs transport time, and running group B1 likewise costs transport time; adding up the transport time spent running each group gives the total transport time, and the valid group set with the shortest total transport time can be determined as the target group set.
  • specifically, if the amount of data the neural network processes is large, the data must be split, and the data corresponding to each group is split separately; different groupings therefore give different splitting results.
  • in that case, the time spent computing the data splits can also be considered, and the valid group set with the shortest sum of transport time and split-computation time determined as the target group set, as in the scoring sketch below.
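  • A minimal scoring sketch based on the two time components named above (the dictionaries of per-group times are assumed inputs; how they are measured is outside this fragment):

```python
from typing import Dict, List

def score(group_set: List[str],
          transport_time: Dict[str, float],
          split_time: Dict[str, float]) -> float:
    """Extra run time of a group set: transport time for moving each
    group's input data plus the time spent computing the data splits."""
    return sum(transport_time[g] + split_time.get(g, 0.0) for g in group_set)

def pick_target(group_sets: List[List[str]],
                transport_time: Dict[str, float],
                split_time: Dict[str, float]) -> List[str]:
    """The valid group set with the shortest total extra time wins."""
    return min(group_sets,
               key=lambda s: score(s, transport_time, split_time))
```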
  • the method provided in this embodiment is used to group the layers of the neural network.
  • the method is executed by a device on which the method provided in this embodiment is deployed, and the device is usually implemented in hardware and/or software.
  • the layer grouping method of the neural network provided in this embodiment includes: grouping the layers of the neural network according to a first grouping rule to obtain multiple first groups; determining invalid groups among the first groups according to a preset validity rule; splitting the invalid groups according to a second grouping rule to obtain second groups; determining invalid groups among the second groups according to the preset validity rule, and continuing the step of splitting the invalid groups according to the second grouping rule; determining multiple valid group sets from the first groups and the second groups; and scoring each valid group set according to a preset rule and determining the target group set among the valid group sets according to the scores.
  • the method provided in this embodiment can group the layers of a neural network based on preset grouping rules, check the obtained groups against a validity rule, and re-split the invalid groups found by the check, thereby obtaining multiple valid group sets; the target group set that maximizes device performance is then determined among the valid group sets. The method therefore improves grouping efficiency and, compared with manual grouping, yields more grouping options, so that the optimal target group set can be determined.
  • Fig. 2 is a flowchart of a layer grouping method of a neural network according to another exemplary embodiment of the present invention.
  • the layer grouping method of the neural network provided in this embodiment includes:
  • Step 201 Traverse the layers of the neural network, determine target layers that belong to a preset type, and determine each target layer as one first group.
  • in the method provided in this embodiment, certain layer types can be determined in advance; among the layers of the neural network, the layers belonging to those predetermined types are found, and each layer meeting the requirement is determined to be a target layer.
  • the preset type of layer may be a layer with a large amount of data, such as a fully connected layer.
  • for a layer with a large amount of data, the processor memory unit may be only just able to hold that layer's data, or, once that layer's data is stored, the remaining capacity may be unable to hold all the data of any other layer; a data-heavy layer can therefore simply be placed in a group of its own, without grouping it with other layers.
  • if a data-heavy layer were grouped with other layers, the group would very likely be invalid and would have to be split again, possibly through several rounds of splitting, with the final result that the data-heavy layer ends up alone in its own group anyway. Directly placing data-heavy layers in their own groups therefore speeds up grouping and improves grouping efficiency.
  • for example, suppose the neural network consists of the seven layers a-g.
  • if the traversal identifies the two target layers b and e, then layer b can form one first group and layer e another first group.
  • Step 202 Group the layers other than the target layers according to the first groups containing the target layers and the layer order of the neural network, to obtain the remaining first groups.
  • in practice, the target layers can be used as boundaries, and the other layers between the boundaries grouped together.
  • specifically, all layers before the first target layer of the neural network can form one first group; all layers between each pair of adjacent target layers form one first group; and all layers after the last target layer form one first group.
  • in the example above, all layers before target layer b are grouped together, i.e. layer a forms a group; the layers between target layer b and target layer e are grouped together, i.e. layers c and d form a group; and all layers after target layer e are grouped together, i.e. layers f and g form a group.
  • the order of the layers can be determined from the data production-consumption relationship: the layer that produces data comes first and the layer that consumes it comes after. For example, layer a processes the input data and produces new data, and layer b performs the next processing step on the new data produced by layer a; layer a therefore comes first and layer b after. A sketch of steps 201-202 follows.
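  • Steps 201 and 202 together amount to using the target layers as group boundaries. A minimal sketch (not part of the patent text; the (name, kind) representation of a layer is an assumption):

```python
from typing import List, Set, Tuple

def group_by_target_layers(layers: List[Tuple[str, str]],
                           target_kinds: Set[str]) -> List[List[str]]:
    """Each target layer (e.g. a fully connected layer) forms its own
    first group; runs of layers before, between, and after the target
    layers form the remaining first groups."""
    groups: List[List[str]] = []
    run: List[str] = []
    for name, kind in layers:          # layers in network order
        if kind in target_kinds:
            if run:
                groups.append(run)     # layers before/between targets
                run = []
            groups.append([name])      # the target layer alone
        else:
            run.append(name)
    if run:
        groups.append(run)             # layers after the last target
    return groups

# The text's example, with b and e as fully connected target layers:
layers = [(n, "fc" if n in ("b", "e") else "conv") for n in "abcdefg"]
assert group_by_target_layers(layers, {"fc"}) == [
    ["a"], ["b"], ["c", "d"], ["e"], ["f", "g"]]
```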
  • all layers in the neural network may be preliminarily grouped to obtain multiple first groups, and some of the first groups include only one layer of a preset type.
  • Step 203 In the first group, an invalid group is determined according to a preset validity rule.
  • Fig. 2A is a flowchart of determining an invalid packet according to an exemplary embodiment of the present invention.
  • determining the invalid group according to the preset validity rule may specifically include:
  • Step 2031 Determine whether the data flow between the layers in the group forms a closed loop.
  • when determining invalid groups among the first groups, it can be checked whether the data flow between the layers within each first group forms a closed loop.
  • a layer in the neural network receives either external data or data produced by other layers; the direction of data flow between the layers can therefore be determined from the data production-consumption process. For example, if layer1 produces data and layer2 consumes the data layer1 produced, the data flows from layer1 to layer2.
  • a first group may include multiple layers; in that case it can be determined whether the data flow between the layers in the group forms a closed loop. A closed loop would make the group run in an endless cycle, so the neural network could not operate normally.
  • a closed loop in the data flow between the layers means that the data keeps circulating within the group.
  • for example, a group includes the four layers layer1-layer4, and the data flows from layer1 to layer2, from layer2 to layer3, from layer3 to layer4, and from layer4 back to layer1; this data flow forms a closed loop.
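  • Detecting such a closed loop is ordinary cycle detection on the directed graph of producer-consumer edges inside the group. A minimal depth-first sketch (illustrative only; the adjacency-dict representation is an assumption):

```python
from typing import Dict, List

def has_data_flow_cycle(edges: Dict[str, List[str]]) -> bool:
    """edges[x] lists the layers that consume data produced by layer x;
    returns True if the intra-group data flow closes a loop."""
    visiting, done = set(), set()

    def dfs(node: str) -> bool:
        if node in visiting:
            return True                # back edge: data flow loops
        if node in done:
            return False
        visiting.add(node)
        looped = any(dfs(n) for n in edges.get(node, []))
        visiting.discard(node)
        done.add(node)
        return looped

    return any(dfs(node) for node in edges)

# The text's example loop:
assert has_data_flow_cycle({"layer1": ["layer2"], "layer2": ["layer3"],
                            "layer3": ["layer4"], "layer4": ["layer1"]})
```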
  • Step 2032 Determine whether the memory occupied by the group at run time is greater than the memory of the processor running the neural network.
  • when grouping according to the first grouping rule, adjacent layers that are not target layers are grouped together, and such a group may contain so many layers that the processor cannot process them. For example, when running the layers in a group, the data those layers require must first be moved from external storage to internal storage; if the group contains so many layers that their data exceeds the processor memory, the processor cannot run the group normally. Therefore, whether the memory a group occupies at run time is greater than the processor memory can be used as a condition for judging whether the group is valid.
  • the processor here refers to a processor in a device running a neural network.
  • the order of steps 2031 and 2032 is not limited.
  • Step 2033 If the data flow between the layers inside the group forms a closed loop, or the memory occupied by the group at run time is greater than the processor memory, determine that the group is an invalid group.
  • in either of these cases the neural network cannot run normally, so a group that meets either of the two conditions can be determined to be an invalid group.
  • in practice, if the amount of data the neural network processes is large, the data can be split and the layers run on the split data.
  • it may happen that moving all the data a group needs into processor memory would prevent the processor from running the group normally, whereas on split data the processor can run the layers the group contains; in that case, the group's run-time memory can be considered smaller than the processor memory.
  • FIG. 2B is a flowchart of determining whether the memory occupied by a group at run time is greater than the memory of the processor running the neural network according to an exemplary embodiment of the present invention.
  • as shown in FIG. 2B, the data involved in each group can be split, and on the basis of the split data it can be determined whether the memory the group occupies at run time is greater than the memory of the processor running the neural network.
  • This can include:
  • a splitting rule for splitting the data can be set in advance.
  • the rule can include multiple splitting methods, which split the group data into pieces of different sizes. For example, a first splitting method may split the group data along a first dimension, and a second splitting method may take the resulting split data and split it further along a second dimension, yielding four pieces of split data.
  • for example, a group's data is 4-dimensional, the four dimensions being batch size, channel, height, and width; the first splitting method can split the group data into two parts along the batch-size dimension, and each resulting piece of split data then has the dimensions 1/2 batch size, channel, height, width, as in the sketch below.
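  • A minimal sketch of this first splitting method (the shape dictionary is an illustrative representation, not the patent's data structure):

```python
from typing import Dict, List

def split_on_batch(shape: Dict[str, int], parts: int = 2) -> List[Dict[str, int]]:
    """Split a group's 4-D data along the batch-size dimension into
    `parts` pieces; channel, height, and width stay intact."""
    base, rem = divmod(shape["batch"], parts)
    return [{**shape, "batch": base + (1 if i < rem else 0)}
            for i in range(parts)]

# split_on_batch({"batch": 8, "channel": 64, "height": 56, "width": 56})
# -> two pieces with batch 4 each, i.e. "1/2 batch size, channel,
#    height, width" as in the text; a second method could then split
#    each piece again along another dimension, giving four pieces.
```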
  • specifically, the group data may first be split according to the preset rule, for example into two parts along the batch size, and step b performed on the resulting split data.
  • Step b: determine whether the split data can support the computation of the layers in the group; if not, step c is performed.
  • Step c: the group is determined to be an invalid group.
  • splitting the data serves to speed up the device when it runs the layers the group contains, but the split data must still support the normal operation of those layers. It can therefore be judged whether the split data supports the computation of the layers in the group; if the split data cannot support the inter-layer operations, so that the processor cannot run the group normally, the group can be considered invalid.
  • if the split data can support the computation of the layers in the group, step d is performed.
  • otherwise, step c can be performed to determine that the group is an invalid group.
  • Step d: based on the split data, deduce the input data format and output data format of each layer in the group.
  • specifically, the input and output data formats of each layer of each group are determined from the group's split data; the format may be the dimensional sizes of the data, such as the batch-size slice and the height slice. Starting from the output data format of the group, the output and input data formats of each layer in the group are derived. If, during this derivation, the slice computations for the same piece of data give inconsistent results, the backward inference fails and the group is considered invalid.
  • in practice, the order of the layers in a group can be determined from the layer order of the neural network, and the producer-consumer layer pairs determined from that order; then, from the output data format of a consumer layer, its input data format is determined, and from the input data format of the consumer layer, the output data format of the corresponding producer layer is determined.
  • for example, group A includes the layers a, b, c1, c2, and d, with two data flow paths: a, b, c1, d and a, b, c2, d; then a and b, b and c1, b and c2, c1 and d, and c2 and d are all producer-consumer pairs.
  • from the output data format of d, the output data formats of c1 and c2 can each be derived; from c1, a first output data format of b can then be derived, and from c2, a second output data format of b.
  • the derivation depends on the layer type: for some layers the height of the input data can be calculated from the kernel height and stride, while for layers such as LRN and BatchNorm the height slice of the input data is equal to the height of the output data.
  • a piece of data may thus receive multiple derived formats; if the formats derived for the same piece of data differ, the group is determined to be an invalid group. For example, two output data formats are derived for layer b above; if the first output data format differs from the second, the current grouping can be considered to have failed. It should be noted that such a derivation failure can only occur when the neural network contains branches.
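  • A minimal sketch of the backward format derivation and the branch-consistency check (illustrative only; the padding-free inversion of the usual output-size formula is an assumption, since the text names only the kernel height and stride):

```python
from typing import List

def infer_input_height(layer_kind: str, out_height: int,
                       kernel_h: int = 1, stride_h: int = 1) -> int:
    """Derive a layer's input-height slice from its output-height slice.
    For layers such as LRN and BatchNorm the input height equals the
    output height; for kernel-based layers the padding-free relation
    out = (in - kernel) / stride + 1 is inverted."""
    if layer_kind in ("lrn", "batchnorm"):
        return out_height
    return (out_height - 1) * stride_h + kernel_h

def formats_consistent(derived: List[int]) -> bool:
    """At a branch, the same data (e.g. layer b's output, derived once
    via c1 and once via c2) may receive several formats; the grouping
    fails if they disagree."""
    return len(set(derived)) <= 1
```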
  • if the derivation succeeds, step f is performed.
  • when the processor actually runs the neural network, it may run different layers, or even different groups, at the same time; it is therefore also necessary to consider whether the processor can allocate free storage locations to the group during actual operation.
  • in practice, the schedule on which the device runs the neural network and other programs can be set in advance; from this preset schedule, the time at which each layer of the neural network generates data to be stored and the time at which that data is consumed can be determined, and the interval between them taken as the period during which the data to be stored occupies space in the processor memory.
  • specifically, the time steps at which each layer in the group generates data to be stored and consumes the data to be stored can be determined according to the preset time schedule, and the period during which the data to be stored occupies space in the processor memory determined from those time steps. For example, if at time t0 layer d generates the data n to be stored, and at time t1 layer e consumes the data n, then the period from t0 to t1 is the period during which the data n occupies processor memory.
  • when the processor runs a group, it can keep the data generated by the intermediate layers in local storage and delete each piece of data once it has been consumed, so the data generated by each layer need not be written to external storage and moved back in when it is needed; the method provided in this embodiment can keep the data generated between layers directly in internal storage.
  • however, the processor may be processing multiple programs, or multiple layers of the neural network, in parallel, so part of its memory may already be allocated to other stored data; for example, during the period from t0 to t1, the processor may also need to store other data m, p, q, and so on.
  • data stored in the processor can be described along two dimensions, time and space; when different pieces of stored data overlap in both dimensions at once, there is a conflict, that is, the different pieces of data cannot occupy the same bank (block) of the storage unit at the same time. The locations occupied by conflicting data in the processor memory can therefore be obtained, and the allocable space determined from those occupied locations; specifically, the unoccupied locations in the processor memory are determined from the occupied ones, and the unoccupied locations are used as the allocable space.
  • in addition, the first sub-block occupied by the input data carried in for the group while the group runs can be determined according to the processor's operation rules; this first sub-block is excluded from the allocable space to obtain the remaining space, and whether storage space can be allocated for the data to be stored is then determined from the remaining space. For example, it can be checked whether the remaining space contains consecutive blocks large enough to hold the data to be stored; if not, it can be concluded that space cannot be allocated for that data.
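  • The time-and-space conflict test can be sketched as interval overlap over memory banks. The fragment below is illustrative only; the bank granularity, the interval convention, and the helper names are all assumptions:

```python
from typing import List, Tuple

Interval = Tuple[int, int]        # (time produced, time consumed)

def overlaps(a: Interval, b: Interval) -> bool:
    """Two lifetimes conflict if they overlap in time."""
    return a[0] < b[1] and b[0] < a[1]

def allocable_banks(num_banks: int,
                    occupied: List[Tuple[int, Interval]],  # (bank, lifetime)
                    lifetime: Interval,
                    reserved: List[int]) -> List[int]:
    """Banks usable for a new piece of to-be-stored data: exclude banks
    whose resident data overlaps the new lifetime (time+space conflict)
    and banks reserved for the group's carried-in input data."""
    conflicting = {bank for bank, life in occupied if overlaps(life, lifetime)}
    return [b for b in range(num_banks)
            if b not in conflicting and b not in reserved]

# Data n lives over (3, 7); data m occupies bank 0 over (2, 5), so bank 0
# conflicts, and bank 1 is reserved for the group's input data:
assert allocable_banks(4, [(0, (2, 5))], (3, 7), reserved=[1]) == [2, 3]
# If no sufficiently long run of free banks remains, the data is re-split.
```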
  • Step h: if it is determined that storage space can be allocated for the data to be stored, step i is performed.
  • a group may include multiple layers, and each layer may generate multiple pieces of data to be stored; if every layer and every piece of data in the group satisfies the above conditions, the group can be determined to be a valid group.
  • if it is determined from the allocable space that storage space cannot be allocated for the data to be stored, the method returns to step a and continues the step of splitting the data according to the preset rule.
  • if the data cannot be split further, step c is entered.
  • in that case the group is considered invalid; if the split data can support the normal operation of the layers in the group, and the data generated during operation can be successfully allocated space in the processor memory, the method reaches step i and the group is considered valid. The processor can therefore run every layer within a group obtained by this embodiment normally.
  • after the check, the valid groups among the first groups can be retained without further splitting, while the invalid groups among the first groups are split again based on the second grouping rule.
  • Step 204 Split the invalid groups according to the second grouping rule to obtain second groups.
  • step 204 and step 103 are similar, and will not be repeated here.
  • Step 205 In the second group, an invalid group is determined according to a preset validity rule.
  • step 205 and step 104 are similar, and will not be repeated here.
  • the preset validity rule adopted here may be the same as the rule adopted in step 203, for example the manner of determining invalid groups shown in FIG. 2A.
  • Step 206 Determine a plurality of effective group sets according to the first group and the second group.
  • step 206 and step 105 are similar.
  • a valid group set consists of multiple valid groups, contains all the layers of the neural network, and contains no layer twice.
  • Step 207 Score each valid grouping set according to a preset rule.
  • Fig. 2C is a flow chart of scoring each valid grouping set according to a preset rule according to an exemplary embodiment of the present invention.
  • as described above, the method provided in this embodiment may include a data-splitting step; in that case, scoring each valid group set according to a preset rule may include:
  • Step 2071 Determine the computation time for splitting the group data corresponding to the valid group set.
  • specifically, the computation time may include the extra time the processor spends splitting the group data: the extra computation time spent splitting the data corresponding to each group in a valid group set can be determined, and the per-group times added to obtain the computation time corresponding to the valid group set.
  • the computation time could also include the time the processor spends processing the data when it runs each layer; however, for the same device, the time taken to run the same layer on the same data should be identical, so the time spent on this part of the computation need not be considered.
  • Step 2072 Determine the transport time each group in the valid group set spends moving the group's input data.
  • before a group runs, the data it needs must be moved from external storage to internal storage, so that while the group runs no data has to be stored to or fetched from the outside, which improves device performance.
  • although this embodiment groups the layers precisely to reduce the number of data movements, some movement remains unavoidable; the transport time of each group in a valid group set can therefore be determined, and the per-group transport times added to obtain the transport time corresponding to the valid group set.
  • the transported data is the data the layers in a group need in order to run; it may be pre-stored in external storage, or generated by a layer of some group during operation and written to external storage, and since other layers may depend on it when they run, it has to be moved into internal storage.
  • the order of steps 2071 and 2072 is not limited.
  • Step 2073 Determine the score of the valid group set according to the computation time and the transport time.
  • specifically, the computation time and transport time of each group in a valid group set can simply be added up to obtain the extra time the valid group set spends during operation, and this extra time determined as the score of the valid group set.
  • Step 208 Determine the target group set among the valid group sets according to the scores.
  • if, in step 207, the score is determined from the computation time and transport time of each group in the valid group set, with the sum of the two determined as the score, then step 208 may specifically include: determining the valid group set with the lowest score as the target group set.
  • the above computation time and transport time may simply be added, or a weighted sum may be computed with preset weights; this embodiment does not limit the choice.
  • the processing speed of the device can be used as an indicator of device performance, so the valid group set with the shortest extra time is determined as the target group set; as described above, the valid group set with the shortest extra time is the one with the lowest score.
  • FIG. 3 is a structural diagram of a layer grouping device of a neural network according to an exemplary embodiment of the present invention.
  • the layer grouping device of the neural network provided in this embodiment includes:
  • the first grouping module 31 is used to group the layers of the neural network according to the first grouping rule to obtain multiple first groups;
  • the screening module 32 is configured to determine invalid groups among the first groups according to a preset validity rule;
  • the second grouping module 33 is configured to split the invalid groups according to the second grouping rule to obtain second groups;
  • the screening module 32 is further configured to determine invalid groups among the second groups according to the preset validity rule, with the second grouping module 33 continuing to perform the step of splitting the invalid groups according to the second grouping rule;
  • the set determination module 34 is configured to determine multiple valid group sets from the first groups and the second groups;
  • the target set determination module 35 is configured to score each valid group set according to a preset rule, and determine a target group set among the valid group sets according to the scores.
  • the layer grouping device of the neural network provided in this embodiment includes: a first grouping module for grouping the layers of the neural network according to a first grouping rule to obtain multiple first groups; a screening module for determining invalid groups among the first groups according to a preset validity rule; a second grouping module for splitting the invalid groups according to a second grouping rule to obtain second groups, with the screening module also determining invalid groups among the second groups according to the preset validity rule and the second grouping module continuing to perform the step of splitting the invalid groups according to the second grouping rule; a set determination module for determining multiple valid group sets from the first groups and the second groups; and a target set determination module for scoring each valid group set according to a preset rule and determining the target group set among the valid group sets according to the scores.
  • the device provided in this embodiment can group the layers of a neural network based on preset grouping rules, check the obtained groups against a validity rule, and re-split the invalid groups found by the check, thereby obtaining multiple valid group sets; the target group set that maximizes device performance is then determined among the valid group sets. The device provided in this embodiment therefore improves grouping efficiency and, compared with manual grouping, yields more grouping options, so that the optimal target group set can be determined.
  • FIG. 4 is a structural diagram of a layer grouping device of a neural network according to another exemplary embodiment of the present invention.
  • optionally, the first grouping module 31 is specifically configured to: traverse the layers of the neural network, determine target layers belonging to a preset type, and determine each target layer as one first group;
  • and to: group the layers other than the target layers, according to the first groups containing the target layers and the layer order of the neural network, to obtain the remaining first groups.
  • optionally, the screening module 32 is specifically configured to: determine whether the data flow between the layers in a group forms a closed loop, and determine whether the memory occupied by the group at run time is greater than the memory of the processor running the neural network; if the data flow between the layers inside the group forms a closed loop, or the run-time memory of the group is greater than the processor memory, the group is determined to be the invalid group.
  • optionally, the screening module 32 is specifically configured to: determine the order of the layers of a group according to the layer order of the neural network, and determine the producer-consumer layer pairs according to that order;
  • then determine the input data format of the consumer data layer according to the output data format of the consumer data layer, and determine the output data format of the producer data layer according to the input data format of the consumer data layer.
  • the screening module 32 is further configured so that, if it determines from the allocable space that storage space cannot be allocated for the data to be stored, it continues to perform the step of splitting the group data according to the preset rule to obtain split data.
  • optionally, the screening module 32 is specifically configured to: determine, according to a preset time schedule, the time steps at which each layer in the group generates and consumes data to be stored, and determine, according to those time steps, the time period during which the data to be stored occupies space in the processor memory.
  • optionally, the device provided in this embodiment further includes a valid group determination module 36, configured to determine that the group is a valid group if the screening module 32 determines that storage space can be allocated for the data to be stored.
  • optionally, a valid group set includes multiple valid groups, contains all the layers of the neural network, and contains no layer twice.
  • optionally, the target set determination module 35 is specifically configured to: determine the computation time for splitting the group data corresponding to a valid group set; determine the transport time each group in the valid group set spends moving its input data; and determine the score of the valid group set according to the computation time and the transport time.
  • optionally, the target set determination module 35 is specifically configured to determine the valid group set with the lowest score as the target group set.
  • An embodiment of the present disclosure also provides a computer including the above layer grouping device of the neural network.
  • An embodiment of the present disclosure also provides a computer-readable storage medium that stores computer-executable instructions configured to perform the layer grouping method of the neural network described above.
  • An embodiment of the present disclosure also provides a computer program product.
  • the computer program product includes a computer program stored on a computer-readable storage medium.
  • the computer program includes program instructions; when the program instructions are executed by a computer, they cause the computer to execute the layer grouping method of the neural network described above.
  • the aforementioned computer-readable storage medium may be a transient computer-readable storage medium or a non-transitory computer-readable storage medium.
  • Fig. 5 is a structural diagram of an electronic device according to an exemplary embodiment of the present invention.
  • the electronic device provided in this embodiment includes:
  • at least one processor 50 (one processor 50 is taken as an example in FIG. 5) and a memory 51; the device may also include a communication interface 52 and a bus 53, where the processor 50, the communication interface 52, and the memory 51 communicate with one another through the bus 53.
  • the communication interface 52 can be used for information transmission.
  • the processor 50 may call logic instructions in the memory 51 to execute the layer grouping method of the neural network of the above embodiment.
  • logic instructions in the memory 51 described above can be implemented in the form of software functional units and sold or used as independent products, and can be stored in a computer-readable storage medium.
  • the memory 51 is a computer-readable storage medium and can be used to store software programs and computer-executable programs, such as program instructions / modules corresponding to the methods in the embodiments of the present disclosure.
  • the processor 50 executes functional applications and data processing by running software programs, instructions, and modules stored in the memory 51, that is, implementing the layer grouping method of the neural network in the above method embodiment.
  • the memory 51 may include a storage program area and a storage data area, wherein the storage program area may store an operating system and application programs required for at least one function; the storage data area may store data created according to the use of a terminal device and the like.
  • the memory 51 may include a high-speed random access memory, and may also include a non-volatile memory.
  • the technical solutions of the embodiments of the present disclosure may be embodied in the form of a software product, which is stored in a storage medium and includes one or more instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods described in the embodiments of the present disclosure.
  • the aforementioned storage medium may be a non-transitory storage medium capable of storing program code, including a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk; it may also be a transient storage medium.
  • first, second, etc. may be used in this application to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another.
  • for example, without changing the meaning of the description, the first element could be called the second element, and likewise the second element could be called the first element, as long as all occurrences of the "first element" are renamed consistently and all occurrences of the "second element" are renamed consistently.
  • the first element and the second element are both elements, but they may not be the same element.
  • the various aspects, implementations, implementations or features in the described embodiments can be used alone or in any combination.
  • Various aspects in the described embodiments may be implemented by software, hardware, or a combination of software and hardware.
  • the described embodiments may also be embodied by a computer-readable medium that stores computer-readable code including instructions executable by at least one computing device.
  • the computer-readable medium can be associated with any data storage device capable of storing data, which can be read by a computer system.
  • examples of computer-readable media include read-only memory, random access memory, CD-ROM, HDD, DVD, magnetic tape, optical data storage devices, and the like.
  • the computer-readable medium may also be distributed in computer systems connected through a network, so that computer-readable codes can be stored and executed in a distributed manner.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
  • Image Analysis (AREA)

Abstract

A neural network layer grouping method, device, equipment, storage medium, and program product. The method includes: grouping the layers of a neural network according to a first grouping rule to obtain multiple first groups (101); determining invalid groups among the first groups according to a preset validity rule (102); splitting the invalid groups according to a second grouping rule to obtain second groups (103); determining invalid groups among the second groups according to the preset validity rule, and continuing the step of splitting the invalid groups according to the second grouping rule (104); determining multiple valid group sets from the first groups and the second groups (105); and scoring each valid group set according to a preset rule and determining a target group set among the valid group sets according to the scores (106). The scheme improves grouping efficiency and, compared with manual grouping, yields more grouping options, so that the optimal target group set can be determined.

Description

Neural network layer grouping method, device, equipment, storage medium, and program product

Technical Field

This application relates to the field of neural networks, for example to a neural network layer grouping method, device, equipment, storage medium, and program product.

Background

In recent years, the achievements of deep learning in image recognition, speech recognition, and other fields have made artificial intelligence the hottest area of the moment. The core of deep learning is the neural network, and to achieve high accuracy in image recognition and speech recognition, neural networks have grown ever deeper in layer count, which places ever greater demands on computing power.

To meet the heavy computational demands of neural networks, various neural network processors (also called AI chips) have been proposed. One class of these processors uses software-managed local storage, with software scheduling the computation layers of the neural network to run in local storage so as to achieve high performance. To keep as many layers of the neural network as possible computing in local storage and avoid costly global-memory accesses, research developers often group and fuse the layers of the neural network.

Existing grouping-and-fusion schemes usually require manually specifying the types of layer combinations that can be fused, searching the network for those combinations, and only then merging the fusible layers. Such schemes achieve some effect, but manually specifying fusible layers is inefficient; the fusion rate of the network is low, large amounts of layer data still end up in global storage, and the performance of the neural network processor is poorly exploited.

The above background is provided only to aid understanding of this application; it is not an admission or acknowledgment that any of it forms part of the common general knowledge relative to this application.
Summary

A first aspect of the embodiments of the present disclosure provides a neural network layer grouping method, including:

grouping the layers of a neural network according to a first grouping rule to obtain multiple first groups;

determining invalid groups among the first groups according to a preset validity rule;

splitting the invalid groups according to a second grouping rule to obtain second groups;

determining invalid groups among the second groups according to the preset validity rule, and continuing the step of splitting the invalid groups according to the second grouping rule;

determining multiple valid group sets from the first groups and the second groups;

scoring each of the valid group sets according to a preset rule, and determining a target group set among the valid group sets according to the scores.

A second aspect of the embodiments of the present disclosure provides a neural network layer grouping device, including:

a first grouping module, configured to group the layers of a neural network according to a first grouping rule to obtain multiple first groups;

a screening module, configured to determine invalid groups among the first groups according to a preset validity rule;

a second grouping module, configured to split the invalid groups according to a second grouping rule to obtain second groups;

the screening module being further configured to determine invalid groups among the second groups according to the preset validity rule, with the second grouping module continuing to perform the step of splitting the invalid groups according to the second grouping rule;

a set determination module, configured to determine multiple valid group sets from the first groups and the second groups;

a target set determination module, configured to score each of the valid group sets according to a preset rule and determine a target group set among the valid group sets according to the scores.

A third aspect of the embodiments of the present disclosure provides a computer including the above neural network layer grouping device.

A fourth aspect of the embodiments of the present disclosure provides a computer-readable storage medium storing computer-executable instructions configured to perform the above neural network layer grouping method.

A fifth aspect of the embodiments of the present disclosure provides a computer program product; the computer program product includes a computer program stored on a computer-readable storage medium, the computer program includes program instructions, and when the program instructions are executed by a computer they cause the computer to execute the above neural network layer grouping method.

A sixth aspect of the embodiments of the present disclosure provides an electronic device, including:

at least one processor; and

a memory communicatively connected to the at least one processor; wherein

the memory stores instructions executable by the at least one processor, and when the instructions are executed by the at least one processor, the at least one processor is caused to perform the above neural network layer grouping method.

The neural network layer grouping method, device, equipment, storage medium, and program product provided by the embodiments of the present disclosure include: grouping the layers of a neural network according to a first grouping rule to obtain multiple first groups; determining invalid groups among the first groups according to a preset validity rule; splitting the invalid groups according to a second grouping rule to obtain second groups; determining invalid groups among the second groups according to the preset validity rule, and continuing the step of splitting the invalid groups according to the second grouping rule; determining multiple valid group sets from the first groups and the second groups; and scoring each valid group set according to a preset rule and determining a target group set among the valid group sets according to the scores. The scheme provided by the embodiments of the present disclosure can group the layers of a neural network based on preset grouping rules, check the resulting groups against a validity rule, and re-split the invalid groups found by the check, thereby obtaining multiple valid group sets; the target group set that maximizes device performance is then determined among the valid group sets. The scheme therefore improves grouping efficiency and, compared with manual grouping, yields more grouping options, so that the optimal target group set can be determined.
Brief Description of the Drawings

One or more embodiments are illustrated by way of example with reference to the corresponding drawings; these illustrations and the drawings do not limit the embodiments. Elements with the same reference numerals in the drawings denote similar elements, the drawings are not drawn to scale, and in the drawings:

FIG. 1 is a flowchart of a neural network layer grouping method according to an exemplary embodiment of the present invention;

FIG. 2 is a flowchart of a neural network layer grouping method according to another exemplary embodiment of the present invention;

FIG. 2A is a flowchart of determining an invalid group according to an exemplary embodiment of the present invention;

FIG. 2B is a flowchart of determining whether the memory occupied by a group at run time is greater than the memory of the processor running the neural network according to an exemplary embodiment of the present invention;

FIG. 2C is a flowchart of scoring each valid group set according to a preset rule according to an exemplary embodiment of the present invention;

FIG. 3 is a structural diagram of a neural network layer grouping device according to an exemplary embodiment of the present invention;

FIG. 4 is a structural diagram of a neural network layer grouping device according to another exemplary embodiment of the present invention;

FIG. 5 is a structural diagram of an electronic device according to an exemplary embodiment of the present invention.
Detailed Description

To provide a fuller understanding of the features and technical content of the embodiments of the present disclosure, their implementation is described in detail below with reference to the accompanying drawings; the drawings are for reference and illustration only and are not intended to limit the embodiments. In the following technical description, for ease of explanation, numerous details are given to provide a thorough understanding of the disclosed embodiments; one or more embodiments may nevertheless be practiced without these details. In other cases, well-known structures and devices are shown in simplified form to keep the drawings simple.

The higher the recognition accuracy demanded of technologies such as image recognition and speech recognition, the more layers the neural networks they use contain, and the greater the computing power required of the processing device. To improve the performance of a device when it processes a neural network, the layers of the network can be grouped and the network processed group by group. However, if the layers are grouped only by manually specified rules, grouping efficiency is low; moreover, with such grouping the fusion rate of the layers in the network is low, so device performance cannot be improved effectively.

The scheme provided by the present disclosure defines grouping rules and validity rules: the layers of the neural network are grouped according to the grouping rules and the resulting groups are checked against the validity rules, which improves grouping efficiency; furthermore, the grouping rules allow the network to be grouped in multiple ways, yielding multiple grouping results from which the grouping that most effectively improves device performance can be determined.
FIG. 1 is a flowchart of a neural network layer grouping method according to an exemplary embodiment of the present invention.

As shown in FIG. 1, the neural network layer grouping method provided in this embodiment includes:

Step 101: group the layers of the neural network according to a first grouping rule to obtain multiple first groups.

The method provided in this embodiment may be executed by a device with computing capability, specifically a device including a processor, such as a computer. The layers of the neural network can be sent to this device so that the device groups them.

Specifically, a first grouping rule can be set in advance for preliminarily grouping the neural network into multiple first groups. For example, certain layer types can be designated in advance, and each layer of those types placed in a group of its own; the designated type may be the fully connected layer. In a neural network, a fully connected layer involves a large amount of data and must fetch correspondingly more data, so each fully connected layer can be placed in its own group. When the neural network runs, if a group contains only a fully connected layer, the data that layer needs can be read into local storage. Local storage refers to the storage unit of the processor in the device that executes the method of this embodiment; external storage refers to the storage unit outside that processor. Optionally, one processor may contain multiple independent processing units (cores).

In addition, the first grouping rule can be set as required; for example, it is also possible simply to place every N consecutive layers in one group, N being any natural number. The layers of a neural network are connected, specifically through inter-layer coefficients, and data flows between the layers in a definite direction. For example, input data enters the first layer, layer1; layer1 processes the data and outputs it, the output is processed according to the weight coefficients connecting the first and second layers, and the result is fed into the second layer, layer2; layer2 in turn processes the data input to it and produces its own output. In this case, layer1 and layer2 can be regarded as two consecutive layers, with data flowing from layer1 to layer2.

Step 102: among the first groups, determine invalid groups according to a preset validity rule.

Further, a validity rule can be set and the obtained first groups checked against it; an invalid group is split anew, while a valid group is retained.

In practice, the groups obtained by grouping the layers of a neural network should still realize the network's original function; if a grouping is invalid and the neural network cannot run normally, grouping the layers is pointless. Validity rules for checking the grouping results can therefore be set in advance according to requirements.

To reduce the number of times data is moved from external storage to internal storage, when the neural network is run on the basis of a grouping result, the data a group needs can be moved into internal storage and the layers of the group run in the processor on that data; the memory a group occupies at run time must therefore be no larger than the processor memory, otherwise the group cannot run normally. Whether a group's run-time memory exceeds the processor memory can thus serve as one condition for judging the validity of the group.

Specifically, data flows between layers; for example, if the data produced by layer1 is used by layer2, data flows from layer1 to layer2. When a group runs, the data flow between the layers inside it must not form a closed loop, which would make the group run in a cycle. Whether the data flow forms a closed loop can therefore also serve as a condition for judging the validity of the group.

Further, the validity rules can be set according to the specific situation of the layers in the neural network. For example, if the amount of data the network processes is large, the data can be split and each part processed separately; if the memory a group occupies when running on the split data is less than or equal to the processor memory, the group can be considered valid. It should be noted that the split data must still allow the layers within the group to run normally.

In practice, the rule for judging whether a group is valid may contain multiple conditions, and a group is considered valid only when it satisfies all of them at the same time. Of course, depending on the layers the network contains, the rule may also contain a single condition; the specific conditions can be set as required.
Step 103: split the invalid groups a second time according to a second grouping rule to obtain second groups.
If invalid groups have been identified among the first groups, they can be split again to obtain second groups.
Specifically, a second grouping rule may be preset for regrouping the invalid groups. The second grouping rule may include part of the first grouping rule, i.e., the two rules may overlap; of course, they may also be entirely different.
Further, splitting an invalid group according to the second grouping rule can yield multiple grouping possibilities. For example, bisection may be used, so that one invalid group is split into two second groups. Exactly how to split allows several possibilities: if the number of layers in the invalid group is even, it can be split into two second groups with the same number of layers; if the number is odd, the two second groups may differ by one layer, with either the first or the second of them being the larger. The invalid group may also be split according to other ratios, for example layer-count ratios of 2:8 or 4:6 between the two second groups. It should be noted that, since a layer cannot be subdivided further, the actual layer counts can only approximate the ratio; for instance, if the invalid group contains 11 layers in total, a 2:8 split may give one second group 3 layers and the other 8.
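A minimal sketch of such a ratio-based split, assuming the layers are kept in network order and the cut position is rounded up for the first sub-group; the function name and the example ratios are illustrative, not prescribed by this application.

```python
import math

def split_group(group, ratio=(1, 1)):
    """Split one invalid group into two sub-groups whose layer counts
    approximate the given ratio; layers themselves are indivisible."""
    cut = math.ceil(len(group) * ratio[0] / sum(ratio))
    cut = max(1, min(len(group) - 1, cut))   # keep both sub-groups non-empty
    return group[:cut], group[cut:]

group = [f"layer{i}" for i in range(1, 12)]  # 11 layers
first, second = split_group(group, (2, 8))
print(len(first), len(second))               # 3 8, matching the 2:8 example above
```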
Step 104: among the second groups, determine invalid groups according to the preset validity rule.
In practice, after the second groups are obtained, each of them can be checked to identify the invalid groups among them, and the second splitting of step 103 is then performed again. After several iterations, an invalid group can be broken down into multiple valid groups.
Step 105: determine multiple valid group sets according to the first groups and the second groups.
In practice, the second grouping rule may include multiple ways of splitting, so each invalid group can have multiple splitting results; combining each splitting result with the other valid groups and with the splitting results of the other invalid groups yields group sets. For example, suppose the first grouping of the neural network yields the seven first groups A through G, of which B, C, and D are invalid; these three groups are then split a second time. If each of the three invalid groups has n possible splitting results, a total of n×n×n group sets can be obtained. If the second splitting of B, C, or D again yields invalid groups, those can be split once more, producing even more group sets.
For a neural network, every layer is indispensable, so a group set includes all layers of the neural network. For example, the seven first groups A through G constitute one group set. A valid group set is a group set in which all groups are valid; multiple valid group sets can be determined from the first groups and the second groups.
Valid first groups and valid second groups can be combined to obtain valid group sets. For example, splitting B a second time in a first way yields the two valid second groups B1 and B2, and splitting it in a second way yields the two valid second groups B3 and B4; similarly, C can be split into the two valid second groups C1 and C2, or into C3 and C4, and D can be split into the two valid second groups D1 and D2, or into D3 and D4. A valid group set may then consist of the groups A, B1, B2, C1, C2, D1, D2, E, F, G, or of the groups A, B3, B4, C1, C2, D1, D2, E, F, G. The purpose of grouping the layers of the neural network is to improve device performance when the network is processed, and the data flow relationships between the layers should not be changed; the direction of data flow between the layers of the neural network therefore remains unchanged, and the layers are merely grouped so that they can be processed quickly. The associations that exist between the layers before grouping remain the same after grouping.
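The combination of alternatives can be pictured as a Cartesian product over the per-group choices. The following sketch enumerates candidate group sets under that assumption; the group names and the two alternatives per invalid group are hypothetical values mirroring the example above.

```python
from itertools import product

# Each valid first group keeps its single alternative; each invalid first
# group (B, C, D here) offers the valid re-splits found for it.
alternatives = {
    "A": [["A"]],
    "B": [["B1", "B2"], ["B3", "B4"]],
    "C": [["C1", "C2"], ["C3", "C4"]],
    "D": [["D1", "D2"], ["D3", "D4"]],
    "E": [["E"]], "F": [["F"]], "G": [["G"]],
}

valid_group_sets = [
    [group for choice in combo for group in choice]
    for combo in product(*alternatives.values())
]
print(len(valid_group_sets))  # 2 x 2 x 2 = 8 candidate valid group sets
print(valid_group_sets[0])    # ['A', 'B1', 'B2', 'C1', 'C2', 'D1', 'D2', 'E', 'F', 'G']
```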
Step 106: score each valid group set according to a preset rule, and determine a target group set among the valid group sets according to the scores.
Specifically, a rule for scoring the valid group sets can also be set, so that the optimal target group set is determined among the multiple valid group sets.
Further, in the valid group sets determined in step 105, every group can run normally on the processor, so all of these valid group sets are eligible. The degree to which each valid group set improves device performance can be determined according to the preset rule, and the valid group set that improves device performance the most determined as the target group set.
In practice, how fast the processor runs can serve as one measure of device performance; the time the processor needs to run each valid group set can therefore be determined, and the valid group set with the shortest time determined as the optimal set, i.e., the target group set.
For a given processor, the time spent on the layer computations themselves can be regarded as the same across group sets; one can therefore consider the time the processor spends moving data from external storage while running the neural network, and use that transfer time to compare the valid group sets. For example, if a valid group set consists of the groups A, B1, B2, C1, C2, D1, D2, E, F, G, then when the processor runs group A it must move the data A needs from external storage, which costs transfer time, and running group B1 likewise costs transfer time; adding up the transfer time spent running each group gives the total transfer time, and the valid group set with the shortest total transfer time can be determined as the target group set.
Specifically, if the neural network processes a large volume of data and the data must be split, the data corresponding to different groups has to be split differently, so different groupings lead to different splitting results. In that case the time spent computing the data splits can also be taken into account, and the valid group set with the smallest sum of transfer time and split-computation time determined as the target group set.
The method provided by this embodiment is used to group the layers of a neural network and is executed by a device provided with the method, typically implemented in hardware and/or software.
The layer grouping method for a neural network provided by this embodiment includes: grouping the layers of the neural network according to a first grouping rule to obtain a plurality of first groups; determining, among the first groups, invalid groups according to a preset validity rule; splitting the invalid groups a second time according to a second grouping rule to obtain second groups; determining, among the second groups, invalid groups according to the preset validity rule and continuing to perform the step of splitting the invalid groups according to the second grouping rule; determining multiple valid group sets according to the first groups and the second groups; and scoring each valid group set according to a preset rule and determining a target group set among the valid group sets according to the scores. The method can group the layers of a neural network on the basis of preset grouping rules, check the resulting groups against a validity rule, and split the invalid groups found by the check a second time, thereby obtaining multiple valid group sets; the target group set that maximizes device performance is then determined among them. The method therefore improves grouping efficiency and, compared with manual grouping, yields more candidate groupings, from which an optimal target group set can be determined.
FIG. 2 is a flowchart of a layer grouping method for a neural network according to another exemplary embodiment of the present invention.
As shown in FIG. 2, the layer grouping method for a neural network provided by this embodiment includes:
Step 201: traverse the layers of the neural network, determine the target layers belonging to a preset type, and determine each target layer as one first group.
In the method of this embodiment, certain layer types can be determined in advance; among the layers of the neural network, those belonging to the predetermined types are found, and the layers meeting the requirement are determined as target layers.
Specifically, a layer of the preset type may be a layer with a large data volume, such as a fully connected layer. For a data-heavy layer, the processor's memory unit may just barely hold its data, or, after holding its data, the remaining capacity may be unable to hold all the data of any other layer; it is therefore sufficient to place each data-heavy layer directly in a group of its own rather than grouping it with other layers. Grouping a data-heavy layer with other layers would very likely produce an invalid group that must be split again, possibly through several rounds of splitting, with the end result still being that the data-heavy layer ends up in a group of its own. Placing such layers directly in their own groups therefore speeds up grouping and improves grouping efficiency.
Further, the layers can be traversed one by one in the order of the neural network, the target layers matching the preset type filtered out, and each target layer placed in a group of its own. For example, if the neural network comprises the seven layers a through g and the traversal identifies b and e as target layers, layer b can be made one first group and layer e another first group.
Step 202: group the layers other than the target layers according to the first groups containing the target layers and the order of the layers of the neural network, obtaining the other first groups.
In practice, the target layers can be taken as boundaries, and the other layers between the boundaries placed into groups.
Specifically, all layers before the first target layer in the neural network can be placed in one first group; all layers between every two adjacent target layers in one first group; and all layers after the last target layer in one first group. For example, all layers before target layer b form one group, i.e., layer a forms a group; the layers between target layers b and e form one group, i.e., layers c and d form a group; and all layers after target layer e form one group, i.e., layers f and g form a group.
Further, the order between layers can be determined from the production and consumption of data: the layer that produces the data comes first, and the layer that consumes it comes after. For example, layer a processes the input data and produces new data, and layer b performs the next processing step on the new data produced by layer a; layer a therefore comes first and layer b after.
On the basis of steps 201 and 202, all layers of the neural network can be preliminarily grouped into multiple first groups, some of which each contain exactly one layer of the preset type; a sketch of this boundary-based grouping follows.
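The sketch below illustrates steps 201-202, assuming the layers arrive in network order and that the target-layer test is supplied as a predicate; the function name and the example layers are illustrative assumptions.

```python
def group_by_target_layers(layers, is_target):
    """Each target layer forms its own first group; maximal runs of
    non-target layers before, between, and after targets form the rest."""
    groups, run = [], []
    for layer in layers:
        if is_target(layer):
            if run:
                groups.append(run)
                run = []
            groups.append([layer])
        else:
            run.append(layer)
    if run:
        groups.append(run)
    return groups

layers = ["a", "b", "c", "d", "e", "f", "g"]
targets = {"b", "e"}                          # e.g. fully connected layers
print(group_by_target_layers(layers, targets.__contains__))
# [['a'], ['b'], ['c', 'd'], ['e'], ['f', 'g']]
```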
Step 203: among the first groups, determine invalid groups according to a preset validity rule.
FIG. 2A is a flowchart of determining an invalid group according to an exemplary embodiment of the present invention.
In the method of this embodiment, determining invalid groups according to the preset validity rule may specifically include:
Step 2031: determine whether the inter-layer data flow within a group forms a closed loop.
When determining invalid groups among the first groups, it can be determined whether the inter-layer data flow within each first group forms a closed loop.
Specifically, a layer in the neural network receives external data or data produced by other layers, so the direction of data flow between layers can be determined from the production and consumption of the data; for example, if layer1 produces data and layer2 consumes the data produced by layer1, the data flow direction points from layer1 to layer2.
Further, a first group may contain multiple layers; it can then be determined whether the inter-layer data flow within the group forms a closed loop. If a closed loop exists, the group would be run in an endless cycle and the neural network could not run normally.
In practice, the inter-layer data flow forming a closed loop means that the data flow stays within the group indefinitely; for example, a group contains the four layers layer1 through layer4, and the data flows from layer1 to layer2, from layer2 to layer3, from layer3 to layer4, and from layer4 back to layer1.
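One common way to perform this check is a depth-first search over the data-flow edges restricted to the group, as in the following sketch; the edge representation and the function name are assumptions made for the example.

```python
def has_cycle(group_layers, edges):
    """Return True if the data flow among the group's layers forms a closed loop."""
    inside = set(group_layers)
    adjacency = {u: [v for (s, v) in edges if s == u and v in inside] for u in inside}
    WHITE, GRAY, BLACK = 0, 1, 2                 # unvisited / on DFS stack / done
    color = {u: WHITE for u in inside}

    def dfs(u):
        color[u] = GRAY
        for v in adjacency[u]:
            if color[v] == GRAY or (color[v] == WHITE and dfs(v)):
                return True                      # back edge found: closed loop
        color[u] = BLACK
        return False

    return any(color[u] == WHITE and dfs(u) for u in inside)

edges = [("layer1", "layer2"), ("layer2", "layer3"),
         ("layer3", "layer4"), ("layer4", "layer1")]
print(has_cycle(["layer1", "layer2", "layer3", "layer4"], edges))  # True
```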
Step 2032: determine whether the memory occupied by the group at run time exceeds the memory of the processor running the neural network.
When the layers are preliminarily grouped, adjacent layers not belonging to the target layers are placed in one group; a group may then contain so many layers that the processor cannot process them. For example, before the layers of a group are run, the data those layers need must first be moved from external storage to internal storage; if the group contains so many layers that their data exceeds the processor memory, the processor cannot run the group normally. Whether the processor memory occupied by a group at run time exceeds the processor memory can therefore serve as a condition for judging whether the group is valid.
Specifically, the processor here refers to the processor in the device running the neural network.
The execution order of steps 2031 and 2032 is not limited.
Step 2033: if the inter-layer data flow within the group forms a closed loop, or the memory occupied by the group at run time is greater than or equal to the processor memory, determine the group to be an invalid group.
If a closed data-flow loop appears within a group, or the group occupies too much processor memory at run time, the neural network cannot run normally in either case; a group meeting either of these two conditions can therefore be determined to be invalid.
Further, when the neural network processes data of large volume, the data can be split and processing performed on the split data. It may be that, for a given group, moving all the data it needs into processor memory would prevent the processor from running the group normally, whereas if the data is split, the processor can run the layers of the group normally on the split data; in that case, the running memory of the group can be regarded as smaller than the processor memory.
FIG. 2B is a flowchart of determining whether the memory occupied by a group at run time exceeds the memory of the processor running the neural network, according to an exemplary embodiment of the present invention.
Accordingly, in the method of this embodiment, if the running memory of a group determined directly from the group data exceeds the processor memory, the data involved in each group can be split, and whether the memory the group occupies at run time exceeds the memory of the processor running the neural network is then determined on the basis of the split data. Specifically, this may include:
a. Split the group data according to a preset splitting rule to obtain split data.
In practice, a splitting rule for the data can be preset; the rule may include multiple splitting methods, which split the group data into pieces of different sizes. For example, a first method may split the group data along the first dimension, yielding two pieces of split data, and a second method may split the resulting pieces further along a second dimension, yielding four pieces. Suppose a piece of group data is 4-dimensional, with the four dimensions batch size, channel, height, and width; the first method may split the group data into two pieces along batch size, each piece of dimensions 1/2 batch size, channel, height, width. The resulting split data can be split again along batch size, or of course along another dimension, for example along height.
In implementation, the group data can first be split once, for example into two pieces along batch size, and step b then performed on the resulting split data.
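A minimal sketch of such dimension-wise splitting, assuming the group data is described by a (batch, channel, height, width) shape tuple; the function name and the example shape are illustrative.

```python
def split_shape(shape, dim, parts=2):
    """Split a (batch, channel, height, width) shape along one dimension
    into `parts` roughly equal slices."""
    sizes = [shape[dim] // parts + (1 if i < shape[dim] % parts else 0)
             for i in range(parts)]
    return [shape[:dim] + (size,) + shape[dim + 1:] for size in sizes]

shape = (4, 64, 224, 224)          # batch size, channel, height, width
print(split_shape(shape, dim=0))   # first split: along batch size
print(split_shape(shape, dim=2))   # a further split: along height
```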
b. Judge whether the split data can support the computation of the layers in the group.
If not, i.e., it cannot, perform step c.
c. Determine the group to be an invalid group.
The data is split so that the device runs the layers of the group faster, and the split data must still support normal operation of the layers in the group. It can therefore be judged whether the split data can support the computation of the layers in the group; if not, the group can be determined to be invalid. If the split data cannot support the inter-layer computation, while without splitting the group data the processor cannot run the group normally either, the group can be regarded as invalid.
If step b determines that the split data can support the computation of the layers in the group, perform step d.
d. Determine the input data format and output data format of each layer of each group according to the split data of that group.
e. Determine again, according to the input and output data formats of the layers, whether the group is an invalid group.
If e determines that it is, step c can be performed to determine the group to be an invalid group.
The layers of the group can be derived backwards from the split data to determine the input data format and output data format of each layer.
Specifically, the input and output data formats of the layers of each group are first determined from the split data of the group; the format may be the dimensional sizes of the data, such as batch size slice and height slice. Then, from the output data format of the group, the output and input data formats of each layer in the group are determined. If, during the derivation, the slice computation results for the same data are inconsistent, the backward inference has failed and the group is considered invalid.
Further, when working from the input and output data formats of the layers, the order of the layers of the group can be determined from the layer order of the neural network; producer-layer and consumer-layer pairs are determined from the order of the layers; then the input data format of a consumer layer is determined from its output data format, and the output data format of the producer layer is determined from the consumer layer's input data format. For example, group A contains the five layers a, b, c1, c2, and d, with data flowing along a, b, c1, d and along a, b, c2, d; then a and b, b and c1, b and c2, c1 and d, and c2 and d are all producer-consumer pairs. From the output data format of d, the output data formats of c1 and c2 can each be derived; from c1 a first output data format of b can be derived, and from c2 a second output data format of b.
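The backward inference over producer-consumer pairs can be sketched as follows, assuming for simplicity identity layers whose input format equals their output format; the pair list, the format dictionaries, and the function names are assumptions made for the example, and a real implementation would substitute per-layer-type derivation rules.

```python
def back_infer(pairs, out_formats, infer_input):
    """Walk producer/consumer pairs backwards from the group output;
    fail if the same layer receives two different inferred formats."""
    formats = dict(out_formats)              # layer -> inferred output format
    for producer, consumer in reversed(pairs):
        needed = infer_input(consumer, formats[consumer])
        if producer in formats and formats[producer] != needed:
            return False, formats            # conflicting slice results: invalid group
        formats[producer] = needed
    return True, formats

pairs = [("a", "b"), ("b", "c1"), ("b", "c2"), ("c1", "d"), ("c2", "d")]
ok, fmts = back_infer(pairs, {"d": {"height_slice": 112}},
                      lambda layer, fmt: dict(fmt))   # identity-layer rule
print(ok, fmts["a"])                                  # True {'height_slice': 112}
```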
The backward derivation differs between layer types: for layers such as convolution and pooling, the height slice of the input data can be computed from the kernel height and stride, while for layers such as LRN and BatchNorm, the height slice of the input data equals the height slice of the output data.
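For the convolution/pooling case this is the standard receptive-field arithmetic; a sketch follows, with padding ignored and the names and values chosen for illustration only.

```python
def input_height_slice(output_slice, kernel_h, stride_h):
    """Input rows a convolution or pooling layer needs in order to
    produce `output_slice` output rows (padding ignored in this sketch)."""
    return (output_slice - 1) * stride_h + kernel_h

print(input_height_slice(112, kernel_h=3, stride_h=1))  # 114
print(input_height_slice(56, kernel_h=2, stride_h=2))   # 112
# For LRN/BatchNorm-style layers, the input slice simply equals the output slice.
```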
When determining again, from the input and output data formats of the layers, whether the group is invalid, it can be judged whether one piece of data has received multiple derived formats; if the formats corresponding to the same data differ, the group is determined to be invalid. For example, two output data formats are derived for layer b; if the first output data format differs from the second, the current grouping can be considered to have failed. It should be noted that derivation failure can only occur when the neural network contains branches.
If the group is determined not to be invalid from the input and output data formats, perform step f.
When actually running the neural network, the processor may run different layers, or even different groups, at the same time; it must therefore also be considered whether, at actual run time, the processor has spare storage locations to allocate to the group.
f. Determine, according to a preset schedule, the time period during which the to-be-stored data produced between layers in the group occupies space in the processor memory.
The times at which the device runs the neural network and other programs can be set in advance; from the preset schedule, the time at which each layer of the neural network produces to-be-stored data and the time at which that data is consumed can be determined, and the period between the two determined as the time period during which the to-be-stored data occupies space in the processor memory.
Specifically, the time steps at which each layer of the group produces and consumes the to-be-stored data can be determined from the preset schedule, and the time period during which the to-be-stored data occupies processor memory determined from those time steps. For example, layer d generates to-be-stored data n at time t0, and layer e consumes the to-be-stored data n at time t1; the period from t0 to t1 is then the time period during which n occupies the processor. While running a group, the processor can keep the data produced by its layers in local storage and delete the data once it has been consumed, so that the data produced by each layer need not be written to external storage and moved back from external storage when needed. With the method of this embodiment, the data produced between layers can be staged directly in internal storage.
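A sketch of deriving the occupation period from production and consumption time steps; the time-step values are illustrative, and a tensor with several consumers would use its last consumption step.

```python
produce_step = {"n": 3}          # layer d produces tensor n at t0 = step 3
consume_steps = {"n": [5]}       # layer e consumes tensor n at t1 = step 5

def lifetime(tensor):
    """Time period during which the tensor occupies processor memory."""
    return produce_step[tensor], max(consume_steps[tensor])

print(lifetime("n"))             # (3, 5): n occupies local memory from t0 to t1
```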
g. Determine the existing data already allocated in the processor memory, and determine among it the conflicting data whose time period conflicts with that of the to-be-stored data.
The processor may handle multiple programs, or process multiple layers of the neural network in parallel; within the time period occupied by the to-be-stored data, other stored data may therefore already have been allocated in the processor. For example, between time t0 and time t1 the processor needs to store other data m, p, q, and so on.
Specifically, the existing data already allocated in the processor memory can be determined, and the data whose storage time overlaps the time period of the to-be-stored data identified among it as the conflicting data.
h. Determine the allocatable space in the processor memory according to the conflicting data, and judge from the allocatable space whether storage space can be allocated for the to-be-stored data.
Data stored in the processor has two dimensions, time and space; different stored data conflict when they overlap in both dimensions at once, i.e., different data cannot be stored in the same bank of the storage unit at the same time. The positions occupied by the conflicting data in the processor memory can therefore be obtained, and the allocatable space in the processor memory determined from those positions: specifically, the unoccupied positions in the processor memory are determined from the occupied positions, and the unoccupied positions are taken as the allocatable space.
After the allocatable space is determined, the first banks occupied by moving in the group's input data during the group's run can be determined according to the processor's operating rules; the first banks are then excluded from the allocatable space to obtain the remaining space, and it is determined from the remaining space whether storage space can be allocated for the to-be-stored data. For example, it can be determined whether the remaining space contains contiguous banks able to hold the to-be-stored data; if not, it can be considered that space cannot be allocated for the to-be-stored data.
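A sketch of this feasibility check over a bank-partitioned local memory, assuming one tensor must occupy contiguous banks; the bank counts and indices are illustrative values, not taken from any particular chip.

```python
def can_allocate(total_banks, reserved, conflicts, need):
    """True if `need` contiguous banks remain free after excluding the
    first banks reserved for the group's input data and the banks held
    by lifetime-conflicting data."""
    blocked = set(reserved) | set(conflicts)
    run = best = 0
    for bank in range(total_banks):
        run = run + 1 if bank not in blocked else 0
        best = max(best, run)
    return best >= need

# 8 banks: banks 0-1 hold the group's input data, banks 4-5 conflict in time.
print(can_allocate(8, reserved=[0, 1], conflicts=[4, 5], need=2))  # True
print(can_allocate(8, reserved=[0, 1], conflicts=[4, 5], need=3))  # False
```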
If step h judges that storage space can be allocated for the to-be-stored data, perform step i.
i. Determine the group to be a valid group.
A group may contain multiple layers, and each layer may generate multiple pieces of to-be-stored data; if every layer and every piece of to-be-stored data in the group meets the above conditions, the group can be determined to be a valid group.
If it is judged from the allocatable space that storage space cannot be allocated for the to-be-stored data, continue with the step in a of splitting the data according to the preset rule.
Specifically, it may happen that, before a further data split, no space can be allocated for the to-be-stored data produced in the group, while after the further split the split data can no longer support normal operation of the layers in the group; this case goes to step c and the group is considered invalid. If the split data can support normal operation of the layers in the group and the data produced during operation can be successfully allocated in the processor memory, the case goes to step i and the group is considered valid. For the groups obtained by this embodiment, the processor can therefore run every layer in each group normally.
Valid groups among the first groups can be kept without a second split; the invalid groups among them can be split a second time on the basis of the second grouping rule.
Step 204: split the invalid groups a second time according to the second grouping rule to obtain second groups.
The specific principle and implementation of step 204 are similar to those of step 103 and are not repeated here.
Step 205: among the second groups, determine invalid groups according to the preset validity rule.
The specific principle and implementation of step 205 are similar to those of step 104 and are not repeated here.
The preset validity rule used here may be the same as the rule used in step 203, for example the way of determining invalid groups shown in FIG. 2A.
Step 206: determine multiple valid group sets according to the first groups and the second groups.
The specific principle and implementation of step 206 are similar to those of step 105.
In the method of this embodiment, a valid group set includes multiple valid groups, includes all layers of the neural network, and contains no repeated layers.
Step 207: score each valid group set according to a preset rule.
FIG. 2C is a flowchart of scoring each valid group set according to a preset rule, according to an exemplary embodiment of the present invention.
The method of this embodiment may include the step of splitting data; in that case, scoring each valid group set according to the preset rule may include:
Step 2071: determine the computation time, corresponding to the valid group set, spent computing on the group data.
The computation time may include the extra time the processor spends splitting the group data. For each group in a valid group set, the extra computation time spent splitting that group's data can be determined, and the extra computation times of all groups in the set added together to obtain the computation time corresponding to the valid group set.
Specifically, the computation time could also include the time the processor spends processing data while running each layer; but for the same device, the computation time of running the same layers on the same data should be identical, so the time spent by that computation can be left out of the comparison.
Step 2072: determine, for each group in the valid group set, the transfer time spent moving the group's input data.
Further, before the layers of a group are run, the data the group needs must be moved from external storage to internal storage, so that no data has to be fetched from external storage while the group runs, which improves device performance. Although this embodiment reduces the number of data transfers as far as possible by grouping the layers, some data transfer remains unavoidable; the data transfer time corresponding to each group in a valid group set can therefore be determined, and the transfer times of the groups added together to obtain the transfer time corresponding to the valid group set.
In practice, transferring data means moving the data necessary for running the layers of the group. This data may have been stored in external storage in advance, or produced by a layer of the group during operation and stored to external storage; other layers may depend on it when they run, so it must be moved into internal storage.
The execution order of steps 2071 and 2072 is not limited.
Step 2073: determine the score of the valid group set according to the computation time and the transfer time.
The computation times and transfer times of the groups in a valid group set can simply be added together to obtain the extra time each valid group set spends during operation, and that extra time determined as the score of the valid group set.
Weight values for the transfer time and the computation time can also be set, and the extra time the valid group set spends at run time computed according to the weight values.
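A sketch of the weighted scoring and final selection, with hypothetical per-group times and equal weights as the default; the dictionary layout and numbers are assumptions made for the example.

```python
def score(groups, w_compute=1.0, w_transfer=1.0):
    """Weighted extra time of one valid group set; lower is better."""
    return sum(w_compute * g["compute_time"] + w_transfer * g["transfer_time"]
               for g in groups)

candidates = {
    "set1": [{"compute_time": 2.0, "transfer_time": 5.0}],
    "set2": [{"compute_time": 1.0, "transfer_time": 3.0},
             {"compute_time": 0.5, "transfer_time": 2.0}],
}
target = min(candidates, key=lambda name: score(candidates[name]))
print(target)   # 'set2': the valid group set with the lowest score
```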
Step 208: determine a target group set among the valid group sets according to the scores.
If in step 207 the score is determined from the computation times and transfer times of the groups in each valid group set, and the sum of computation time and transfer time is taken as the score,
then step 208 may further include: determining the valid group set with the lowest score as the target group set.
The above sum of computation time and transfer time may be a simple addition, or a weighted sum computed with weight values or the like; this embodiment does not limit this.
When running a neural network, the processing speed of the device can serve as an indicator of device performance, so the valid group set that costs the least extra time is determined as the target group set; with the scoring described above, the valid group set costing the least extra time is the one with the lowest score.
FIG. 3 is a structural diagram of a layer grouping device for a neural network according to an exemplary embodiment of the present invention.
As shown in FIG. 3, the layer grouping device for a neural network provided by this embodiment includes:
a first grouping module 31, configured to group the layers of a neural network according to a first grouping rule to obtain a plurality of first groups;
a screening module 32, configured to determine, among the first groups, invalid groups according to a preset validity rule;
a second grouping module 33, configured to split the invalid groups a second time according to a second grouping rule to obtain second groups;
wherein the screening module 32 is further configured to determine, among the second groups, invalid groups according to the preset validity rule, and the second grouping module 33 continues to perform the step of splitting the invalid groups a second time according to the second grouping rule;
a set determination module 34, configured to determine multiple valid group sets according to the first groups and the second groups; and
a target set determination module 35, configured to score each valid group set according to a preset rule and determine a target group set among the valid group sets according to the scores.
The layer grouping device for a neural network provided by this embodiment includes: a first grouping module for grouping the layers of a neural network according to a first grouping rule to obtain a plurality of first groups; a screening module for determining, among the first groups, invalid groups according to a preset validity rule; a second grouping module for splitting the invalid groups a second time according to a second grouping rule to obtain second groups, the screening module further determining, among the second groups, invalid groups according to the preset validity rule and the second grouping module continuing to perform the step of splitting the invalid groups according to the second grouping rule; a set determination module for determining multiple valid group sets according to the first groups and the second groups; and a target set determination module for scoring each valid group set according to a preset rule and determining a target group set among the valid group sets according to the scores. The device can group the layers of a neural network on the basis of preset grouping rules, check the resulting groups against a validity rule, and split the invalid groups found by the check a second time, thereby obtaining multiple valid group sets; the target group set that maximizes device performance is then determined among them. The device therefore improves grouping efficiency and, compared with manual grouping, yields more candidate groupings, from which an optimal target group set can be determined.
The specific principle and implementation of the layer grouping device for a neural network provided by this embodiment are similar to those of the embodiment shown in FIG. 1 and are not repeated here.
FIG. 4 is a structural diagram of a layer grouping device for a neural network according to another exemplary embodiment of the present invention.
As shown in FIG. 4, on the basis of the above embodiment, in the layer grouping device for a neural network provided by this embodiment, the first grouping module 31 is specifically configured to:
traverse the layers of the neural network, determine target layers belonging to a preset type, and determine each target layer as one first group;
group the layers other than the target layers according to the first groups containing the target layers and the order of the layers of the neural network, obtaining the other first groups.
The first grouping module 31 is specifically configured to:
place all layers before the first target layer in the neural network into one first group;
place all layers between every two adjacent target layers into one first group;
place all layers after the last target layer into one first group.
The screening module 32 is specifically configured to:
determine whether the inter-layer data flow within a group forms a closed loop;
determine whether the memory occupied by the group at run time exceeds the memory of the processor running the neural network;
if the inter-layer data flow within the group forms a closed loop, or the memory occupied by the group at run time is greater than or equal to the processor memory, determine the group to be an invalid group.
The screening module 32 is specifically configured to:
split group data according to a preset splitting rule to obtain split data;
judge whether the split data can support the computation of the layers in the group, and if not, determine the group to be an invalid group.
If the split data can support the computation of the layers in the group, the screening module 32 is further configured to:
determine the input data format and output data format of each layer of each group according to the split data of that group;
determine again, according to the input and output data formats of the layers, whether the group is an invalid group.
The screening module 32 is specifically configured to:
determine the order of the layers of a group according to the layer order of the neural network;
determine producer-layer and consumer-layer pairs according to the order of the layers;
determine the input data format of the consumer layer according to the output data format of the consumer layer, and determine the output data format of the producer layer according to the input data format of the consumer layer.
The screening module 32 is specifically configured to:
determine the group to be an invalid group if the formats corresponding to the same data differ.
If the group is determined not to be invalid according to the input and output data formats, the screening module 32 is further configured to:
determine, according to a preset schedule, the time period during which the to-be-stored data produced between layers in the group occupies space in the processor memory;
determine the existing data already allocated in the processor memory, and determine among it the conflicting data whose time period conflicts with that of the to-be-stored data;
determine the allocatable space in the processor memory according to the conflicting data, and judge from the allocatable space whether storage space can be allocated for the to-be-stored data;
if not, the screening module 32 continues to perform the step of splitting the group data according to the preset rule to obtain split data.
The screening module 32 is specifically configured to:
determine, according to the preset schedule, the time steps at which each layer of the group produces the to-be-stored data and consumes the to-be-stored data;
determine, according to the time steps, the time period during which the to-be-stored data occupies space in the processor memory.
The screening module 32 is configured to:
obtain the positions occupied by the conflicting data in the processor memory, and determine the allocatable space in the processor memory according to the occupied positions.
The screening module 32 is specifically configured to:
determine, according to the operating rules of the processor, the first banks occupied by moving in the group's input data while the group runs;
determine the remaining space according to the first banks and the allocatable space, and determine from the remaining space whether storage space can be allocated for the to-be-stored data.
Optionally, the device provided by this embodiment further includes a valid group determination module 36: if the screening module 32 judges that storage space can be allocated for the to-be-stored data, the valid group determination module 36 determines the group to be a valid group.
The valid group set includes multiple valid groups, the valid group set includes all layers of the neural network, and the layers are not repeated.
The target set determination module 35 is specifically configured to:
determine the computation time, corresponding to the valid group set, spent computing on the group data;
determine, for each group in the valid group set, the transfer time spent moving the group's input data;
determine the score of the valid group set according to the computation time and the transfer time.
The target set determination module 35 is specifically configured to:
determine the sum of the computation time and the transfer time as the score;
wherein determining a target group set among the valid group sets according to the scores includes:
determining the valid group set with the lowest score as the target group set.
An embodiment of the present disclosure further provides a computer including the above layer grouping device for a neural network.
An embodiment of the present disclosure further provides a computer-readable storage medium storing computer-executable instructions, the computer-executable instructions being configured to perform the above layer grouping method for a neural network.
An embodiment of the present disclosure further provides a computer program product, the computer program product including a computer program stored on a computer-readable storage medium, the computer program including program instructions that, when executed by a computer, cause the computer to perform the above layer grouping method for a neural network.
The above computer-readable storage medium may be a transitory computer-readable storage medium or a non-transitory computer-readable storage medium.
FIG. 5 is a structural diagram of an electronic device according to an exemplary embodiment of the present invention.
As shown in FIG. 5, the electronic device provided by this embodiment includes:
at least one processor 50 (one processor 50 is taken as an example in FIG. 5) and a memory 51, and may further include a communication interface 52 and a bus 53. The processor 50, the communication interface 52, and the memory 51 can communicate with one another via the bus 53. The communication interface 52 can be used for information transfer. The processor 50 can call logic instructions in the memory 51 to perform the layer grouping method for a neural network of the above embodiments.
In addition, the logic instructions in the memory 51 may be implemented in the form of software functional units and, when sold or used as an independent product, may be stored in a computer-readable storage medium.
As a computer-readable storage medium, the memory 51 can be used to store software programs and computer-executable programs, such as the program instructions/modules corresponding to the methods in the embodiments of the present disclosure. By running the software programs, instructions, and modules stored in the memory 51, the processor 50 executes functional applications and data processing, i.e., implements the layer grouping method for a neural network in the above method embodiments.
The memory 51 may include a program storage area and a data storage area, where the program storage area may store the operating system and applications needed by at least one function, and the data storage area may store data created through the use of the terminal device, etc. In addition, the memory 51 may include high-speed random access memory and may also include non-volatile memory.
The technical solution of the embodiments of the present disclosure may be embodied in the form of a software product stored in a storage medium and including one or more instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods of the embodiments of the present disclosure. The aforementioned storage medium may be a non-transitory storage medium, including any of various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc, or may be a transitory storage medium.
Although the terms "first", "second", etc. may be used in this application to describe various elements, these elements should not be limited by the terms, which serve only to distinguish one element from another. For example, without changing the meaning of the description, a first element could be called a second element, and likewise a second element could be called a first element, provided that all occurrences of "first element" are renamed consistently and all occurrences of "second element" are renamed consistently. The first element and the second element are both elements but may not be the same element.
The words used in this application are used only to describe the embodiments and not to limit the claims. As used in the description of the embodiments and the claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. Similarly, the term "and/or" as used in this application refers to and encompasses any and all possible combinations of one or more of the associated items listed. In addition, the term "comprise" and its variants "comprises" and/or "comprising", when used in this application, specify the presence of the stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
The aspects, implementations, or features of the described embodiments can be used alone or in any combination. Aspects of the described embodiments can be implemented by software, hardware, or a combination of software and hardware. The described embodiments can also be embodied by a computer-readable medium storing computer-readable code that includes instructions executable by at least one computing device. The computer-readable medium can be associated with any data storage device capable of storing data that can be read by a computer system. Examples of computer-readable media include read-only memory, random-access memory, CD-ROM, HDD, DVD, magnetic tape, optical data storage devices, and the like. The computer-readable medium can also be distributed over network-coupled computer systems so that the computer-readable code is stored and executed in a distributed fashion.
The above technical description may refer to the accompanying drawings, which form a part of this application and show, by way of description, implementations in accordance with the described embodiments. Although these embodiments are described in sufficient detail to enable those skilled in the art to implement them, they are non-limiting; other embodiments may be used, and changes may be made without departing from the scope of the described embodiments. For example, the order of operations described in the flowcharts is non-limiting, so the order of two or more operations illustrated in and described according to a flowchart may be changed according to several embodiments. As another example, in several embodiments one or more operations illustrated in and described according to a flowchart are optional or may be deleted. In addition, certain steps or functions may be added to the disclosed embodiments, or the order of two or more steps may be permuted. All such variations are considered to be encompassed by the disclosed embodiments and the claims.
In addition, terminology is used in the above technical description to provide a thorough understanding of the described embodiments. However, excessive detail is not required to implement the described embodiments; accordingly, the above description of the embodiments is presented for purposes of illustration and description. The embodiments presented in the above description, and the examples disclosed according to these embodiments, are provided separately to add context and aid understanding of the described embodiments. The above description is not intended to be exhaustive or to limit the described embodiments to the precise form of the present disclosure. Several modifications, alternative applications, and variations are possible in light of the above teachings. In some cases, well-known processing steps have not been described in detail so as not to unnecessarily obscure the described embodiments.

Claims (36)

  1. A layer grouping method for a neural network, comprising:
    grouping the layers of the neural network according to a first grouping rule to obtain a plurality of first groups;
    determining, among the first groups, invalid groups according to a preset validity rule;
    splitting the invalid groups a second time according to a second grouping rule to obtain second groups;
    determining, among the second groups, invalid groups according to the preset validity rule, and continuing to perform the step of splitting the invalid groups a second time according to the second grouping rule;
    determining a plurality of valid group sets according to the first groups and the second groups; and
    scoring each of the valid group sets according to a preset rule, and determining a target group set among the valid group sets according to the scores.
  2. The method according to claim 1, wherein the grouping the layers of the neural network according to a first grouping rule to obtain a plurality of first groups comprises:
    traversing the layers of the neural network, determining target layers belonging to a preset type, and determining each of the target layers as one first group;
    grouping the layers other than the target layers according to the first groups containing the target layers and the order of the layers of the neural network, to obtain the other first groups.
  3. The method according to claim 2, wherein the grouping the layers other than the target layers according to the first groups containing the target layers and the order of the layers of the neural network comprises:
    placing all layers before the first target layer in the neural network into one first group;
    placing all layers between every two adjacent target layers into one first group;
    placing all layers after the last target layer into one first group.
  4. The method according to claim 1 or 2, wherein the determining invalid groups according to a preset validity rule comprises:
    determining whether the inter-layer data flow within a group forms a closed loop;
    determining whether the memory occupied by the group at run time exceeds the memory of the processor running the neural network;
    if the inter-layer data flow within the group forms a closed loop, or the memory occupied by the group at run time is greater than or equal to the processor memory, determining the group to be an invalid group.
  5. The method according to claim 4, wherein the determining whether the memory occupied by the group at run time exceeds the memory of the processor running the neural network comprises:
    splitting group data according to a preset splitting rule to obtain split data;
    judging whether the split data can support the computation of the layers in the group, and if not, determining the group to be an invalid group.
  6. The method according to claim 5, wherein, if the split data can support the computation of the layers in the group, the method further comprises:
    determining the input data format and output data format of each layer of each group according to the split data of the group;
    determining again, according to the input data formats and output data formats of the layers, whether the group is an invalid group.
  7. The method according to claim 6, wherein the determining the input data format and output data format of each layer of each group according to the split data of the group comprises:
    determining the order of the layers of the group according to the layer order of the neural network;
    determining producer-layer and consumer-layer pairs according to the order of the layers;
    determining the input data format of the consumer layer according to the output data format of the consumer layer, and determining the output data format of the producer layer according to the input data format of the consumer layer.
  8. The method according to claim 6 or 7, wherein the determining again, according to the input data formats and output data formats of the layers, whether the group is an invalid group comprises:
    if the formats corresponding to the same data differ, determining the group to be an invalid group.
  9. The method according to claim 6, wherein, if the group is determined not to be an invalid group according to the input data formats and output data formats, the method further comprises:
    determining, according to a preset schedule, the time period during which to-be-stored data produced between layers in the group occupies space in the processor memory;
    determining existing data already allocated in the processor memory, and determining, among the existing data, conflicting data whose time period conflicts with that of the to-be-stored data;
    determining allocatable space in the processor memory according to the conflicting data, and judging, according to the allocatable space, whether storage space can be allocated for the to-be-stored data;
    if not, continuing to perform the step of splitting the group data according to the preset rule to obtain split data.
  10. The method according to claim 9, wherein the determining, according to a preset schedule, the time period during which to-be-stored data produced between layers in the group occupies space in the processor memory comprises:
    determining, according to the preset schedule, the time steps at which each layer of the group produces the to-be-stored data and consumes the to-be-stored data;
    determining, according to the time steps, the time period during which the to-be-stored data occupies space in the processor memory.
  11. The method according to claim 9, wherein the determining allocatable space in the processor memory according to the conflicting data comprises:
    obtaining the positions occupied by the conflicting data in the processor memory, and determining the allocatable space in the processor memory according to the occupied positions.
  12. The method according to claim 9, wherein the judging, according to the allocatable space, whether storage space can be allocated for the to-be-stored data comprises:
    determining, according to the operating rules of the processor, the first banks occupied by moving in the group's input data while the group runs;
    determining remaining space according to the first banks and the allocatable space, and determining, according to the remaining space, whether storage space can be allocated for the to-be-stored data.
  13. The method according to claim 9, wherein, if it is judged that storage space can be allocated for the to-be-stored data, the group is determined to be a valid group.
  14. The method according to any one of claims 1-3, 5-7, and 9-13, wherein the valid group set includes a plurality of valid groups, the valid group set includes all layers of the neural network, and the layers are not repeated.
  15. The method according to claim 11, wherein the scoring each of the valid group sets according to a preset rule comprises:
    determining the computation time, corresponding to the valid group set, spent computing on the group data;
    determining, for each group in the valid group set, the transfer time spent moving the input data of the group;
    determining the score of the valid group set according to the computation time and the transfer time.
  16. The method according to claim 15, wherein the determining the score of the valid group set according to the computation time and the transfer time comprises:
    determining the sum of the computation time and the transfer time as the score;
    and the determining a target group set among the valid group sets according to the scores comprises:
    determining the valid group set with the lowest score as the target group set.
  17. A layer grouping device for a neural network, comprising:
    a first grouping module, configured to group the layers of a neural network according to a first grouping rule to obtain a plurality of first groups;
    a screening module, configured to determine, among the first groups, invalid groups according to a preset validity rule;
    a second grouping module, configured to split the invalid groups a second time according to a second grouping rule to obtain second groups;
    wherein the screening module is further configured to determine, among the second groups, invalid groups according to the preset validity rule, and the second grouping module continues to perform the step of splitting the invalid groups a second time according to the second grouping rule;
    a set determination module, configured to determine a plurality of valid group sets according to the first groups and the second groups; and
    a target set determination module, configured to score each of the valid group sets according to a preset rule and determine a target group set among the valid group sets according to the scores.
  18. The device according to claim 17, wherein the first grouping module is specifically configured to:
    traverse the layers of the neural network, determine target layers belonging to a preset type, and determine each of the target layers as one first group;
    group the layers other than the target layers according to the first groups containing the target layers and the order of the layers of the neural network, to obtain the other first groups.
  19. The device according to claim 18, wherein the first grouping module is specifically configured to:
    place all layers before the first target layer in the neural network into one first group;
    place all layers between every two adjacent target layers into one first group;
    place all layers after the last target layer into one first group.
  20. The device according to claim 17 or 18, wherein the screening module is specifically configured to:
    determine whether the inter-layer data flow within a group forms a closed loop;
    determine whether the memory occupied by the group at run time exceeds the memory of the processor running the neural network;
    if the inter-layer data flow within the group forms a closed loop, or the memory occupied by the group at run time is greater than or equal to the processor memory, determine the group to be an invalid group.
  21. The device according to claim 20, wherein the screening module is specifically configured to:
    split group data according to a preset splitting rule to obtain split data;
    judge whether the split data can support the computation of the layers in the group, and if not, determine the group to be an invalid group.
  22. The device according to claim 21, wherein, if the split data can support the computation of the layers in the group, the screening module is further configured to:
    determine the input data format and output data format of each layer of each group according to the split data of the group;
    determine again, according to the input data formats and output data formats of the layers, whether the group is an invalid group.
  23. The device according to claim 22, wherein the screening module is specifically configured to:
    determine the order of the layers of the group according to the layer order of the neural network;
    determine producer-layer and consumer-layer pairs according to the order of the layers;
    determine the input data format of the consumer layer according to the output data format of the consumer layer, and determine the output data format of the producer layer according to the input data format of the consumer layer.
  24. The device according to claim 22 or 23, wherein the screening module is specifically configured to:
    determine the group to be an invalid group if the formats corresponding to the same data differ.
  25. The device according to claim 22, wherein, if the group is determined not to be an invalid group according to the input data formats and output data formats, the screening module is further configured to:
    determine, according to a preset schedule, the time period during which to-be-stored data produced between layers in the group occupies space in the processor memory;
    determine existing data already allocated in the processor memory, and determine, among the existing data, conflicting data whose time period conflicts with that of the to-be-stored data;
    determine allocatable space in the processor memory according to the conflicting data, and judge, according to the allocatable space, whether storage space can be allocated for the to-be-stored data;
    if not, the screening module continues to perform the step of splitting the group data according to the preset rule to obtain split data.
  26. The device according to claim 25, wherein the screening module is specifically configured to:
    determine, according to the preset schedule, the time steps at which each layer of the group produces the to-be-stored data and consumes the to-be-stored data;
    determine, according to the time steps, the time period during which the to-be-stored data occupies space in the processor memory.
  27. The device according to claim 25, wherein the screening module is configured to:
    obtain the positions occupied by the conflicting data in the processor memory, and determine the allocatable space in the processor memory according to the occupied positions.
  28. The device according to claim 25, wherein the screening module is specifically configured to:
    determine, according to the operating rules of the processor, the first banks occupied by moving in the group's input data while the group runs;
    determine remaining space according to the first banks and the allocatable space, and determine, according to the remaining space, whether storage space can be allocated for the to-be-stored data.
  29. The device according to claim 25, further comprising a valid group determination module, wherein, if the screening module judges that storage space can be allocated for the to-be-stored data, the valid group determination module determines the group to be a valid group.
  30. The device according to any one of claims 17-19, 21-23, and 25-29, wherein the valid group set includes a plurality of valid groups, the valid group set includes all layers of the neural network, and the layers are not repeated.
  31. The device according to claim 27, wherein the target set determination module is specifically configured to:
    determine the computation time, corresponding to the valid group set, spent computing on the group data;
    determine, for each group in the valid group set, the transfer time spent moving the input data of the group;
    determine the score of the valid group set according to the computation time and the transfer time.
  32. The device according to claim 31, wherein the target set determination module is specifically configured to:
    determine the sum of the computation time and the transfer time as the score;
    and the determining a target group set among the valid group sets according to the scores comprises:
    determining the valid group set with the lowest score as the target group set.
  33. A computer, comprising the device according to any one of claims 17-32.
  34. An electronic device, comprising:
    at least one processor; and
    a memory communicatively connected to the at least one processor; wherein
    the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, cause the at least one processor to perform the method according to any one of claims 1-16.
  35. A computer-readable storage medium storing computer-executable instructions, the computer-executable instructions being configured to perform the method according to any one of claims 1-16.
  36. A computer program product, comprising a computer program stored on a computer-readable storage medium, the computer program comprising program instructions that, when executed by a computer, cause the computer to perform the method according to any one of claims 1-16.
PCT/CN2018/114549 2018-11-08 2018-11-08 Neural network layer grouping method, apparatus, device, storage medium and program product WO2020093306A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201880098346.6A 2018-11-08 2018-11-08 Neural network layer grouping method, apparatus, device, storage medium and program product
PCT/CN2018/114549 2018-11-08 2018-11-08 Neural network layer grouping method, apparatus, device, storage medium and program product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/114549 WO2020093306A1 (zh) 2018-11-08 2018-11-08 神经网络层分组方法、装置、设备、存储介质及程序产品

Publications (1)

Publication Number Publication Date
WO2020093306A1 true WO2020093306A1 (zh) 2020-05-14

Family

ID=70612430

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/114549 WO2020093306A1 (zh) 2018-11-08 2018-11-08 神经网络层分组方法、装置、设备、存储介质及程序产品

Country Status (2)

Country Link
CN (1) CN112955906B (zh)
WO (1) WO2020093306A1 (zh)


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106227851B (zh) * 2016-07-29 2019-10-01 汤一平 Image retrieval method based on hierarchical deep search with a deep convolutional neural network

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2003079286A1 (en) * 2002-03-15 2003-09-25 Pacific Edge Biotechnology Limited Medical applications of adaptive learning systems using gene expression data
US20150161522A1 (en) * 2013-12-06 2015-06-11 International Business Machines Corporation Method and system for joint training of hybrid neural networks for acoustic modeling in automatic speech recognition
CN105095833A (zh) * 2014-05-08 2015-11-25 中国科学院声学研究所 Network construction method, recognition method and system for face recognition
CN105550744A (zh) * 2015-12-06 2016-05-04 北京工业大学 Iteration-based neural network clustering method
CN106355244A (zh) * 2016-08-30 2017-01-25 深圳市诺比邻科技有限公司 Construction method and system of a convolutional neural network

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111915017A (zh) * 2020-07-29 2020-11-10 北京灵汐科技有限公司 Calibration method and apparatus, terminal device and storage medium
WO2022022417A1 (zh) * 2020-07-29 2022-02-03 北京灵汐科技有限公司 Calibration method and apparatus, terminal device and storage medium
US11816547B2 (en) 2020-07-29 2023-11-14 Lynxi Technologies Co., Ltd. Calibration method and apparatus, terminal device, and storage medium
CN111915017B (zh) * 2020-07-29 2023-11-24 北京灵汐科技有限公司 Calibration method and apparatus, terminal device and storage medium

Also Published As

Publication number Publication date
CN112955906A (zh) 2021-06-11
CN112955906B (zh) 2024-03-12


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18939745

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 18.10.2021)

122 Ep: pct application non-entry in european phase

Ref document number: 18939745

Country of ref document: EP

Kind code of ref document: A1