WO2020164644A2 - Neural network model splitting method and apparatus, computer device, and storage medium - Google Patents

Neural network model splitting method and apparatus, computer device, and storage medium

Info

Publication number
WO2020164644A2
Authority
WO
WIPO (PCT)
Prior art keywords: state, split, path, operator, tensor data
Application number
PCT/CN2020/084416
Other languages
English (en)
French (fr)
Other versions
WO2020164644A3 (zh)
Inventor
周玉松
张潇
吴林阳
俞烨昊
许云龙
Original Assignee
上海寒武纪信息科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Priority claimed from CN201910115162.2A (external priority, CN111563587B)
Priority claimed from CN201910115130.2A (external priority, CN111563586B)
Priority claimed from CN201910114927.0A (external priority, CN111563584B)
Priority claimed from CN201910114967.5A (external priority, CN111563585B)
Application filed by 上海寒武纪信息科技有限公司
Priority to EP20756078.0A (EP3926546A4)
Priority to US17/419,290 (US20220092386A1)
Publication of WO2020164644A2
Publication of WO2020164644A3

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means

Definitions

  • This application relates to the field of artificial intelligence technology, in particular to a method for splitting a neural network model and related products.
  • Deep learning accelerators are continuously being proposed and, like general-purpose processors, are expanding from single-core to multi-core architectures.
  • This expanded multi-core structure can support data parallelism in the training phase to increase data throughput and speed up training.
  • In inference scenarios, deep neural networks place higher demands on end-to-end latency than on throughput, and latency often determines whether an accelerator is usable in a given scenario.
  • Traditional data-parallel schemes cannot meet the small-batch, low-latency requirements placed on accelerators in inference scenarios.
  • the present disclosure provides a neural network model splitting method, wherein the method includes:
  • according to the operator of the target layer in the neural network model, determining the split state set of the tensor data associated with the operator of the target layer, wherein the target layer is at least one layer in the neural network model;
  • traversing the split state sets according to the directed acyclic graph of the neural network model, and determining the state paths between adjacent split state sets and the weights of the state paths, wherein the state path represents the splitting method of the operator, each state in the split state set represents a set of sub-tensor data, and the union of all sub-tensor data of a state is the tensor data;
  • determining the target split path of the target layer according to the weights of the state paths;
  • using the target split path to split the operator of the target layer of the neural network model.
  • a neural network model splitting device which includes:
  • the split state set module is used to determine the split state set of the tensor data associated with the operator of the target layer according to the operator of the target layer in the neural network model; wherein the target layer is at least one layer in the neural network model;
  • the state path module is used to traverse the split state sets according to the directed acyclic graph of the neural network model, and determine the state paths between adjacent split state sets and the weights of the state paths; wherein the state path represents the splitting mode of the operator, each state in the split state set represents a set of sub-tensor data, and the union of all sub-tensor data of a state is the tensor data;
  • the target split path module is configured to determine the target split path of the target layer according to the weight of the state path;
  • the splitting module is used to split the operator of the target layer of the neural network model by using the target splitting path.
  • In order to achieve the above objective, the present disclosure also provides a neural network model splitting method, the method including:
  • according to the operator of the target layer in the neural network model, determining the split state set of the tensor data associated with the operator of the target layer, wherein the target layer is at least one layer in the neural network model;
  • inserting a glue operator between the operator of the target layer and the associated split state set to adjust the states in the split state set of the tensor data of the operator, wherein the glue operator is used to adjust a state of the tensor data obtained according to one splitting method into a state obtained according to any splitting method;
  • traversing the split state sets according to the directed acyclic graph of the neural network model, and determining the state paths between adjacent split state sets and the weights of the state paths, wherein the state path represents the splitting method of the operator, each state in the split state set represents a set of sub-tensor data, and the union of all sub-tensor data of a state is the tensor data;
  • determining the target split path of the target layer according to the weights of the state paths;
  • using the target split path to split the operator of the target layer of the neural network model.
  • a neural network model splitting device which includes:
  • the split state set determination module is used to determine the split state set of the tensor data associated with the operator of the target layer according to the operator of the target layer in the neural network model; wherein the target layer is at least one layer in the neural network model;
  • the glue operator insertion module is used to insert a glue operator between the operator of the target layer and the associated split state set to adjust the states in the split state set of the tensor data of the operator; wherein the glue operator is used to adjust a state of the tensor data obtained according to one splitting method into a state obtained according to any splitting method;
  • the state path determination module is configured to traverse the split state sets according to the directed acyclic graph of the neural network model, and determine the state paths between adjacent split state sets and the weights of the state paths; wherein the state path represents the splitting mode of the operator, each state in the split state set represents a set of sub-tensor data, and the union of all sub-tensor data of a state is the tensor data;
  • the target split path determination module is configured to determine the target split path of the target layer according to the weight of the state path;
  • the splitting module is used to split the operator of the target layer of the neural network model by using the target splitting path.
  • the present disclosure also provides a neural network model splitting method, wherein the method includes:
  • according to the operator of the target layer in the neural network model, determining the split state set of the tensor data associated with the operator of the target layer, wherein the target layer is at least one layer in the neural network model;
  • inserting a compensation operator between the operator of the target layer and the associated split state set to adjust the states in the split state set of the input tensor data of the operator, wherein the compensation operator is used to acquire target data from the sub-tensor data adjacent to any sub-tensor data of a state, and to merge the target data with that sub-tensor data;
  • traversing the split state sets according to the directed acyclic graph of the neural network model, and determining the state paths between adjacent split state sets and the weights of the state paths, wherein the state path represents the splitting method of the operator, each state in the split state set represents a set of sub-tensor data, and the union of all sub-tensor data of a state is the tensor data;
  • determining the target split path of the target layer according to the weights of the state paths;
  • using the target split path to split the operator of the target layer of the neural network model.
  • a neural network model splitting device which includes:
  • the split state set module is used to determine the split state set of the tensor data associated with the operator of the target layer according to the operator of the target layer in the neural network model; wherein the target layer is at least one layer in the neural network model;
  • the compensation operator insertion module is used to insert a compensation operator between the operator of the target layer and the associated split state set, and to adjust the states in the split state set of the input tensor data of the operator; wherein the compensation operator is used to obtain target data from the sub-tensor data adjacent to any sub-tensor data of a state, and to merge the target data with that sub-tensor data;
  • the state path module is used to traverse the split state sets according to the directed acyclic graph of the neural network model, and determine the state paths between adjacent split state sets and the weights of the state paths; wherein the state path represents the splitting mode of the operator, each state in the split state set represents a set of sub-tensor data, and the union of all sub-tensor data of a state is the tensor data;
  • the target split path module is configured to determine the target split path of the target layer according to the weight of the state path;
  • the splitting module is used to split the operator of the target layer of the neural network model by using the target splitting path.
  • the present disclosure also provides a neural network model splitting method, the method includes:
  • according to the operator of the target layer in the neural network model, determining the split state set of the tensor data associated with the operator of the target layer, wherein the target layer is at least one layer in the neural network model;
  • inserting a glue operator between the operator of the target layer and the associated split state set to adjust the states in the split state set of the tensor data of the operator, wherein the glue operator is used to adjust a state of the tensor data obtained according to one splitting method into a state obtained according to any splitting method;
  • inserting a compensation operator between the operator of the target layer and the associated split state set to adjust the states in the split state set of the input tensor data of the operator, wherein the compensation operator is used to acquire target data from the sub-tensor data adjacent to any sub-tensor data of a state, and to merge the target data with that sub-tensor data;
  • traversing the split state sets according to the directed acyclic graph of the neural network model, and determining the state paths between adjacent split state sets and the weights of the state paths, wherein the state path represents the splitting method of the operator, each state in the split state set represents a set of sub-tensor data, and the union of all sub-tensor data of a state is the tensor data;
  • determining the target split path of the target layer according to the weights of the state paths;
  • using the target split path to split the operator of the target layer of the neural network model.
  • a neural network model splitting device including:
  • the split state set module is used to determine the split state set of the tensor data associated with the operator of the target layer according to the operator of the target layer in the neural network model; wherein the target layer is at least one layer in the neural network model;
  • the glue operator insertion module is used to insert a glue operator between the operator of the target layer and the associated split state set to adjust the states in the split state set of the tensor data of the operator; wherein the glue operator is used to adjust a state of the tensor data obtained according to one splitting method into a state obtained according to any splitting method;
  • the compensation operator insertion module is used to insert a compensation operator between the operator of the target layer and the associated split state set, and to adjust the states in the split state set of the input tensor data of the operator; wherein the compensation operator is used to obtain target data from the sub-tensor data adjacent to any sub-tensor data of a state, and to merge the target data with that sub-tensor data;
  • the state path module is used to traverse the split state sets according to the directed acyclic graph of the neural network model, and determine the state paths between adjacent split state sets and the weights of the state paths; wherein the state path represents the splitting mode of the operator, each state in the split state set represents a set of sub-tensor data, and the union of all sub-tensor data of a state is the tensor data;
  • the target split path module is configured to determine the target split path of the target layer according to the weight of the state path;
  • the splitting module is used to split the operator of the target layer of the neural network model by using the target splitting path.
  • The technical solution of the present disclosure can expand a deep learning accelerator from a single core to a multi-core structure with small overhead, and can provide an efficient splitting scheme tailored to the characteristics of a given network and the underlying accelerator, effectively reducing the end-to-end latency of various networks on multi-core accelerators.
  • Figure 1 is a schematic diagram of a shared storage multi-core structure
  • FIG. 2 is the first flowchart of a neural network model splitting method provided by an embodiment of the disclosure
  • Figure 3 is a schematic diagram of serial neural network model splitting
  • FIG. 4 is the second flowchart of a neural network model splitting method provided by an embodiment of the disclosure
  • Figure 5 is a schematic diagram of a glue operator inserted between an operator and its input tensor data
  • Figure 6 is a schematic diagram of compensation
  • FIG. 7 is the third flowchart of a neural network model splitting method provided by an embodiment of the disclosure
  • Figure 8 is a schematic diagram of pyramid splitting
  • FIG. 9 is the fourth flowchart of a neural network model splitting method provided by an embodiment of the disclosure
  • FIG. 10 is a schematic diagram of a neural network model splitting hardware device provided by an embodiment of the disclosure.
  • deep learning accelerators have become a rapidly developing field. These newly emerging accelerators often have greater advantages over GPUs in terms of performance and power consumption.
  • Data parallelism refers to speeding up training by dividing the training data set into several parts and using multiple processing cores to process part of the sub-data sets separately.
  • each core processes different data sets in the training data in parallel, thereby increasing the throughput of the entire system and speeding up training. Therefore, the multi-core accelerator architecture can easily expand the computing throughput of the entire system during the training phase while maintaining the good performance-to-power ratio of each core.
  • this shared memory multi-core structure is a classic multi-core structure.
  • This structure is very suitable for data parallel neural network training methods.
  • Each core can be used as a processor in data parallelism, read different data separately and then complete the forward and backward calculations of the neural network model in parallel.
  • Each core can still maintain its good performance-to-power ratio under the previous single-core architecture in the calculation phase, and at the same time, the throughput of the entire system can also increase with the expansion of the number of cores.
  • The problem with data parallelism is that its scalability depends on the batch size of the processed data. Although this is usually not a problem in the training phase, this premise is hard to guarantee in the inference phase.
  • For neural network models used in real-time service fields (including video surveillance, autonomous driving, etc.), the data to be processed usually arrives serially as a stream, so each processing step handles only a small amount of data, often even a single picture.
  • In this case, data parallelism provides no parallelism at all: all work is concentrated on a single core, so the computing resources of the additional cores cannot be converted into faster task processing.
  • After the training of the neural network model is completed offline using the data set, the model is deployed to a cloud server to process data sent from the outside world.
  • At this point, the application scenario changes from offline training to online inference.
  • In online inference, a very important indicator is latency, that is, the time from when the server receives the data to be processed until the processed result is returned, and more specifically, the time taken to process the data with the neural network model.
  • Low latency ensures that the cloud server can respond to the client's data in the shortest possible time, and in more sensitive scenarios it directly determines whether a solution is viable. The requirement on accelerators in the online inference stage has therefore shifted from processing large batches of data with high throughput to processing small batches of data with low latency.
  • One method is to split the computation of each operator in the neural network across multiple cores. This ensures that multiple cores participate in the computation at every moment, even when processing an inference task for a single picture, thereby using multi-core resources to reduce latency.
  • the neural network model can be regarded as a complex calculation graph composed of hundreds or even thousands of operators.
  • the algorithm logic in different types of operators is different, which leads to different methods of splitting these operators.
  • The splitting of each operator must not only balance its own computational efficiency and degree of parallelism, but also consider its combination with the preceding and following operators, and even its overall impact on the network.
  • The rapid development of deep learning has brought increasingly large-scale and complex networks, for which finding a good parallelization manually is unrealistic. An automated method is therefore needed that can give a good splitting and parallelization strategy for different networks.
  • an end-to-end splitting plan is automatically given for large-scale neural network models.
  • This plan splits an operator into multiple smaller sub-operators, so that the computation library developed for the single-core architecture can be called directly, avoiding the extra workload of re-implementation.
  • For example, an activation operator, after splitting, yields many smaller activation operators, which means each subtask can be completed by calling the original single-core activation function on multiple cores, without modifying or re-implementing a multi-core version of the activation function.
  • The ultimate goal is to obtain a splitting and parallel execution scheme that effectively reduces the end-to-end inference latency of the entire neural network model.
  • the vehicle needs to analyze and process external information such as images, videos, and voices from the on-board sensors during the automatic driving process.
  • In order to ensure safety, the vehicle must process this information in the shortest possible time to make a decision.
  • A vehicle equipped with a multi-core processor chip can use this solution to evenly distribute the computational load of the neural network model for processing small batches of external information across multiple processor cores, complete the processing within the specified response time, and return the result to assist automatic driving.
  • the technical solution can realize the expansion of a deep learning accelerator from a single core to a multi-core structure with a small overhead, and the solution can effectively reduce the end-to-end delay of various networks on the multi-core accelerator.
  • the multi-core processor structure chip is set on the vehicle.
  • Alternatively, the multi-core processor chip can be set up on a cloud server, and the vehicle can transmit images, video, voice and other external information from the on-board sensors to the cloud server through 3G/4G, WiFi and other networks.
  • The cloud server uses this solution to evenly distribute the computational load of the neural network model for processing small batches of external information across multiple processing cores, and within the vehicle's specified response time feeds the processing result back to the vehicle via 3G/4G, WiFi and other networks.
  • The scale of the external information collected by on-board sensors varies. Before deployment, the on-board processor uses this scheme to determine a corresponding operator splitting path for each scale of external information.
  • When the multi-core processor chip obtains external information, the corresponding operator splitting path is invoked to split the operators in the neural network model, and the computational load of processing the information is evenly distributed across multiple processor cores.
  • the upper framework needs to call the calculation library to get the instruction implementation of each operator in the neural network model on the processor.
  • the framework informs the calculation library of the type and parameters of each operator, and the calculation library returns the machine instructions required for each operator to execute on the processor.
  • the framework loads the data and the machine instructions onto the processor through the driver, starts the processor and completes the calculation of the operator.
  • To adapt the computing library so that it can generate machine instructions running on multiple cores, it would need to be redesigned: since multiple cores must read different parts of the same input tensor data and also write their outputs back to different parts of the same output tensor data, the library would have to modify the load and store instructions in the computation instructions of every operator.
  • The neural network splitting method provided by the embodiments of the present disclosure avoids modifying the single-core processor computing library as far as possible, while still realizing parallel execution of the neural network model on the multi-core processor.
  • Specifically, the upper framework divides each operator in the neural network model into several sub-operators that can be executed in parallel; for each sub-operator, the framework calls the computing library to generate the machine instructions that execute on a single core, and by loading these machine instructions onto different cores, parallel computation of the operator on the multi-core processor is realized.
  • the framework uses a single-core processor computing library to generate calculation instructions for sub-operators
  • The input and output tensor data of the operators in the neural network model are likewise divided into corresponding sub-tensor data as the operators are split into sub-operators.
  • As shown in FIG. 2, a neural network model splitting method provided by an embodiment of the present disclosure includes:
  • Step 201) Determine the split state set of tensor data associated with the operator of the target layer according to the operator of the target layer in the neural network model.
  • the neural network model can usually be regarded as a directed acyclic graph composed of operators and multi-dimensional tensor data. Operators and tensor data are connected to each other through directed edges. The direction of the edge indicates that the data is the input or output of the operator. Use op to represent operators and tensor to represent tensor data.
  • the framework uniformly chooses to use the splitting methods of tensor data associated with the operators to illustrate the splitting methods of different operators. Assume that all tensor data in the network are 4-dimensional.
  • Even if the actual dimensionality of a tensor is less than 4, it is still expressed as a 4-dimensional tensor.
  • N represents the size of the batch
  • C represents the number of feature images
  • H represents the height of the feature image
  • W represents the width of the feature image.
  • this technical solution adopts the split state of the input tensor data and output tensor data of the operator to represent the splitting of the calculation logic of the operator itself.
  • any kind of splitting of tensor data is called a state s of the tensor data.
  • a sub-tensor data set is obtained.
  • the state s is represented by the corresponding sub-tensor data set.
  • All possible split states {s0, s1, s2, ...} constitute the split state set S of the tensor data.
  • This is a very large state space, which means that the space of possible splitting modes of the operator, represented by the split states of its tensor data, is also very large.
  • the state set of tensor data can be pruned.
  • The latency for a multi-core accelerator to complete the computation of an operator is determined by the core that takes the longest to execute its subtask. Since the cores in a multi-core architecture are equal to each other in hardware structure, the time consumed by each core depends on the task load assigned to it. A reasonable assumption is therefore that the sizes of the sub-operators after splitting should be roughly balanced, so unbalanced split states can be omitted from the state set S of the tensor data.
  • the number of cores in a multi-core architecture is usually an integer power of 2, such as 1, 2, 4, 8, 16, and so on.
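To make the pruning concrete, the following is a minimal sketch (illustrative Python, not part of the disclosure) of enumerating only balanced split states of a 4-dimensional (N, C, H, W) tensor under the power-of-2 core-count assumption above; every retained state splits the tensor into exactly one equal sub-tensor per core.

```python
from itertools import product

def enumerate_split_states(shape, num_cores):
    """Enumerate roughly balanced split states of a 4-D (N, C, H, W) tensor.

    A state is a tuple (pn, pc, ph, pw) giving the number of equal parts
    along each dimension; only states whose total part count equals the
    core count are kept, pruning the unbalanced states from the set S.
    """
    assert num_cores & (num_cores - 1) == 0, "core count is a power of 2"

    def parts(dim):
        # candidate part counts per dimension: powers of 2 up to the size
        p, out = 1, []
        while p <= dim and p <= num_cores:
            out.append(p)
            p *= 2
        return out

    return [s for s in product(*(parts(d) for d in shape))
            if s[0] * s[1] * s[2] * s[3] == num_cores]

# enumerate_split_states((1, 64, 224, 224), 4) keeps e.g. (1, 4, 1, 1) and
# (1, 1, 2, 2) but drops every state that does not yield 4 equal parts
```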
  • On the other hand, not every split of the tensor data associated with an operator represents an effective splitting method of that operator.
  • the dimension of the split of tensor data should be supported by the operator.
  • the input data of the normalized exponential regression operator (Softmax) should not be split in the dimension to be normalized.
  • the splitting of the input tensor and output tensor of the operator should satisfy the calculation logic of the operator.
  • For example, the start and end points of each sub-block of the convolution operator's output data split in the H/W dimensions should map, according to the convolution kernel size and stride, onto the sub-blocks of the corresponding input data split in the H/W dimensions; the split of the convolution operator's input data in the C dimension should exactly match the split of the weight data in the C dimension, and the split of the output data in the C dimension should exactly match the split of the weight data in the N dimension.
  • In practice, the input state of an operator is inferred backward from its output state according to the specific logic of each operator, or the output state is inferred forward from the input state. This ensures that the states of related data always represent an effective operator splitting method.
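As an illustration of this back-inference for the convolution constraint above, the sketch below (a hypothetical helper, with padding ignored for brevity) derives the input H/W range a convolution sub-operator must read from the output sub-block assigned to it, using only the kernel size and stride:

```python
def conv_input_range(out_start, out_end, kernel, stride):
    """Back-infer the half-open input range [in_start, in_end) in the H or W
    dimension that a convolution sub-operator must read to produce the
    half-open output range [out_start, out_end); padding is ignored.
    """
    in_start = out_start * stride
    in_end = (out_end - 1) * stride + kernel
    return in_start, in_end

# a 3x3 convolution with stride 1 producing output rows [0, 112) must read
# input rows [0, 114): conv_input_range(0, 112, kernel=3, stride=1)
```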
  • Step 202) Traverse the split state set according to the directed acyclic graph of the neural network model, and determine the state path between adjacent split state sets and the weight of the state path.
  • The splitting scheme P of the entire neural network model can be regarded as, for each operator, a jump from one split state in the split state set of its input tensor data to one split state in the split state set of its output tensor data.
  • the split state of the output tensor of the previous operator is the split state of the input tensor of the next operator.
  • Each possible jump through the operator corresponds to an effective way of splitting on the operator. Therefore, the state path represents how the operator is split.
  • the tensor data is decomposed according to a decomposition method to obtain a sub-tensor set.
  • the sub-tensor set corresponds to a split state.
  • Different decomposition methods can obtain multiple split states.
  • the split state obtained in this way constitutes a set of split states. It can be seen that each split state corresponds to a sub-tensor set, and the sub-tensor set contains all the elements in the tensor data. Moreover, in a set of sub-tensors, the elements of each sub-tensor may or may not overlap.
  • the state path represents the split mode of the operator, and the calculation logic of the operator is split through the split mode corresponding to the state path to obtain the corresponding set of sub-operators.
  • the state of the input tensor data and the state of the corresponding output tensor data are connected by a state path, and the sub-tensor data set representing a split state of the input tensor data is processed by the sub-operator in the sub-operator set to obtain the Output tensor data corresponding to the sub-tensor data collection of the split state.
  • The weight of a state path is characterized by the time it takes the operator to execute in parallel on the multi-core accelerator under the corresponding splitting, and the time for the multi-core accelerator to complete the computation of an operator depends on the core that takes the longest to execute its subtask.
  • The inner maximum operation is based on the fact that the computation part and the memory-access part of the operator's implementation can hide each other; that is, computation and memory access are executed concurrently as far as possible.
  • As the split becomes finer, the computational throughput of each core decreases, and the factor α can be further adjusted to make the estimate more accurate.
  • the outer maximum operation means that the time for the multi-core accelerator to complete an operator's calculation depends on the time of the core that takes the longest time to execute the subtask.
  • the weight of the state path can be not only the time it takes to execute the subtask, but also the throughput of the execution of the subtask.
  • the weight of the state path can also be determined by actually measuring the time for executing all the subtasks in the operator split mode corresponding to the state path on the multi-core processor.
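The estimated-time variant of the weight can be sketched as follows; peak_flops, mem_bandwidth and alpha are assumed hardware parameters of the underlying accelerator rather than values given in the disclosure:

```python
def state_path_weight(subtasks, peak_flops, mem_bandwidth, alpha=1.0):
    """Estimate a state path's weight: the parallel execution time of an
    operator under one splitting.

    Each subtask is a (flops, bytes_accessed) pair.  Computation and memory
    access are assumed to hide each other (inner max); the operator finishes
    when the slowest core finishes (outer max).  alpha discounts per-core
    throughput as the split becomes finer.
    """
    def core_time(flops, bytes_accessed):
        compute = flops / (peak_flops * alpha)
        memory = bytes_accessed / mem_bandwidth
        return max(compute, memory)      # compute and memory access overlap

    return max(core_time(f, b) for f, b in subtasks)  # slowest core dominates
```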
  • Step 203) Determine the target split path of the target layer according to the weight of the state path.
  • In step 203, the weight of the state path can be used to determine the target split path of the target layer in two ways.
  • the first method is forward traversal to determine the split path.
  • the steps include:
  • the second method is reverse traversal to determine the split path.
  • the steps include:
  • The following example illustrates how to traverse all the split state sets of the target layer to obtain the target split path between the split state set of the input tensor data of the target layer and the split state set of the output tensor data of the target layer.
  • the neural network model shown in Figure 3 is serial, and the input tensor data and output tensor data of the entire neural network model are not split.
  • Splitting the operators of the entire neural network model in Figure 3, a serial neural network model containing n operators is described as a sequence of operators (op_1, op_2, ..., op_n), assuming that every operator has only one input and one output.
  • The output of the previous operator is the input of the next operator, so the tensor data includes the input and output tensor data of the entire neural network model as well as all intermediate results between operators.
  • All tensor data form a set (tensor_0, tensor_1, ..., tensor_n), where the input of op_i is tensor_{i-1} and its output is tensor_i.
  • Each tensor data tensor_i has a split state set S_i.
  • The goal of the search strategy is to find, for each tensor, a mapping tensor_i → s_i between the tensor itself and one state s_i in its state set S_i; by determining a specific split state for every tensor data in the neural network model, the splitting method of all operators is determined.
  • A mapping of all tensor data in a neural network model to their split states is called a splitting scheme P of the network model.
  • Suppose the i-th operator op_i uses input data in split state s to compute output tensor data in split state r.
  • The specific parallel computation method is determined by the states of the input tensor data and the output tensor data.
  • The computation time of this operator is recorded as t_{s→r}, and its value depends on the corresponding splitting method and the hardware characteristics of the underlying accelerator.
  • The calculation formula for the delay T of the entire network is T = t_1 + t_2 + ... + t_n, where t_i = t_{s_{i-1}→s_i} is the time of the i-th operator under the chosen states.
  • Here t_i can be regarded as the weight of the directed edge from the state of the operator's input tensor data to the state of its output tensor data.
  • In addition, the input tensor data and output tensor data of the entire neural network model each have only one state, the non-split state, which keeps the whole data block continuous and complete. This means the splitting scheme P of the neural network model starts with complete input data and ends with complete output data, so external users always see a complete input and output.
  • searching for a good splitting plan P for a given neural network model is to find the shortest path from the unsplit state of the input tensor data to the unsplit state of the output tensor data.
  • The path must pass through one state selected from the effective state space of each intermediate-result tensor; Equations 3 and 4 give this abstraction formally.
  • The non-split state of the input tensor data of the entire neural network model is the initial state s_root.
  • Initially, the weight of the split path corresponding to s_root is 0, and the weight of the split path corresponding to every state of all other tensor data is ∞.
  • Any state s of any tensor data in the neural network model has a corresponding split path from s_root to s whose weight is l_s. The split state sets are visited from front to back; within each split state set, each state s is traversed in turn.
  • After the traversal completes, the target split path from the non-split state s_root of the input tensor data of the entire neural network model to the non-split state s_end of its output tensor data is obtained.
  • The above describes a path from the non-split state s_root to the non-split state s_end that passes through one state in each split state set; such a path is a split path of the neural network model. The split path with the smallest weight is selected as the target split path of the neural network model.
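A minimal sketch of this Viterbi-like forward search for a serial network follows; the data structures and the path_weight callback are illustrative assumptions, not the disclosure's implementation:

```python
def search_target_split_path(state_sets, path_weight):
    """Forward search over split state sets of a serial network.

    state_sets[0] holds only the non-split input state s_root and the last
    set holds the output states; path_weight(i, s, r) returns the weight of
    the state path by which operator i maps input state s to output state r
    (float('inf') if that pair is not a valid splitting of operator i).
    """
    INF = float("inf")
    best = {s: 0.0 for s in state_sets[0]}   # accumulated weight l_s
    back = [{} for _ in state_sets]          # back-pointers for retracing

    for i in range(1, len(state_sets)):
        nxt = {}
        for r in state_sets[i]:
            nxt[r] = INF
            for s, ls in best.items():
                w = ls + path_weight(i - 1, s, r)
                if w < nxt[r]:
                    nxt[r], back[i][r] = w, s
        best = nxt

    # retrace the minimum-weight path from the best final state
    r = min(best, key=best.get)
    path = [r]
    for i in range(len(state_sets) - 1, 0, -1):
        r = back[i][r]
        path.append(r)
    return list(reversed(path)), min(best.values())
```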
  • The neural network model shown in Figure 3 is serial, and for ease of description its input tensor data and output tensor data each correspond to a split state set containing a single non-split state.
  • When the split state set of the output tensor data of the neural network model is not the single non-split state s_end but a set of multiple split states, the minimum is selected among the weights of the split paths reaching each split state in that set, and the corresponding path is taken as the target split path from the split state set of the input tensor data of the entire neural network model to the split state set of its output tensor data.
  • In addition, the entire scheme can be converted to search for a split path from the non-split state s_end back to the non-split state s_root; the two are equivalent.
  • Similarly, when the split state set of the input tensor data of the neural network model is not a single non-split state s_root but a set of multiple split states, the minimum is selected among the weights of the split paths of each split state in that set, and the corresponding path is taken as the target split path from the split state set of the input tensor data of the entire neural network model to the split state set of its output tensor data.
  • the neural network model shown in Fig. 3 is a serial neural network model, and the state in the split state set of the input tensor data and the output tensor data of the model is not split.
  • An operator located at a branch junction has more than one input tensor data, such as the element-wise addition operator (Add), the element-wise multiplication operator (Mult), and the concatenation operator (Concat).
  • Its two input tensor data, tensor_left and tensor_right, have corresponding split state sets S_left and S_right, respectively.
  • In one case, the two branches extend separately until the traversal ends, which means that the entire network has more than one input; this is relatively uncommon in inference tasks.
  • two branches merge together at a certain operator.
  • In this case, during backtracking, split states that do not match each other may be selected.
  • Suppose operator A is a binary element-wise addition operator.
  • During backtracking, the state selected in the split state set of tensor_left may be one that is split only in the C dimension, while the state selected in the split state set of tensor_right may be one that is split only in the H dimension.
  • The splitting methods of the addition operator represented by these two split states are inconsistent, which renders the entire splitting scheme P invalid.
  • To avoid this, the split state sets corresponding to tensor_left and tensor_right each retain only one split state, which guarantees that the states selected from the two sets during backtracking are deterministic. Therefore, in the forward traversal phase, when the output tensor data of an operator is used as input tensor data by at least two operators, or the operator has at least two output tensor data, only one split state is retained in the split state set of the output tensor data, and that split state is determined via the same state path of the operator.
  • The weight of each split path from the split state set of the input tensor data of the neural network model to the split state set of the output tensor data of the neural network model is determined by the sum of the weights of the corresponding state paths.
  • Alternatively, a threshold can be set based on experience, and any split path whose weight is less than the set threshold can be used as the target split path for splitting the neural network model.
  • Step 204) Use the target splitting path to split the operator of the target layer of the neural network model.
  • accelerators usually further adjust the order of data in storage to improve the efficiency of memory access during calculation, which makes the work of modifying the output logic of the operator more difficult and cumbersome.
  • If the calculation logic or splitting logic of the subsequent operator does not require its input data to be stored contiguously in a certain dimension, the data output by the previous layer in a discretely stored state in that dimension can be used directly in the next layer's computation, without ensuring the continuity of the output data.
  • For this reason, the framework strips the task of adjusting the split form of tensor data out of the operator's computation task and abstracts it into a new operator, called the glue operator. This stripping avoids modifying the output logic of each operator and enhances the portability of the framework to different underlying accelerators.
  • The glue operator is used to adjust sub-blocks of a tensor divided in one way into sub-blocks formed in another way. As shown in Table 1, different types of operators allow different splitting methods for their input tensor data and output tensor data. When the splitting method of the output tensor data of the previous operator is not allowed by the next operator, a glue operator must be used to adjust the splitting of the tensor data so that the two operators can be glued together. In addition, even if the splitting of the previous layer's output is supported by the next layer, the glue operator can still adjust the splitting of the tensor data into a form more conducive to the next layer's computation.
  • this embodiment also proposes another neural network model splitting method. As shown in Figure 4, on the basis of Figure 2, it also includes:
  • Step 201') Insert a glue operator between the operator of the target layer and the associated split state set, and adjust the states in the split state set of the tensor data of the operator; wherein the glue operator is used to adjust a state of the tensor data obtained according to one splitting method into a state obtained according to any splitting method.
  • the glue operator is used to express the behavior of adjusting the split state of tensor data.
  • The computation scale of each layer of the neural network model keeps changing as the network extends, and as the splitting tendency of the model changes, the way operators are split, that is, the states of the intermediate results, needs to be adjusted accordingly.
  • a glue operator is added between Op_2 and its input Tensor1 in Figure 3, which can convert any split state of tensor data into another split state.
  • its input tensor data and output tensor data have the same shape and the same state space.
  • Figure 5 shows a glue operator inserted between an operator and its corresponding input tensor data; a glue operator can also be inserted between an operator and its corresponding output tensor data, or even between both the input tensor data and the output tensor data of the operator.
  • Specifically, a glue operator is inserted between the operator of the target layer of the neural network model and the associated split state sets to obtain a directed acyclic graph of the neural network model that includes the glue operators; the split state sets corresponding to all tensor data of the target layer are traversed according to this directed acyclic graph, and the state paths between adjacent split state sets and their weights are determined; the split path of the target layer of the neural network model including the glue operators is determined according to the weights of the state paths; finally, this split path is used to examine each inserted glue operator, deleting the glue operators that are not needed and retaining those that are.
  • The glue operator adopts one of four methods: splitting then splicing, splicing then splitting, splicing only, or splitting only.
  • In the splicing stage, sub-data blocks that are adjacent in any dimension can be spliced together.
  • any sub-data block can be split into two smaller sub-data blocks during the splitting stage.
  • Any kind of split can be transformed into another split form through such a two-stage process.
  • If only splitting is needed, the splicing phase is skipped and the corresponding split is performed in the splitting phase.
  • Alternatively, all data can first be combined into one complete block in the splicing stage, and the corresponding split is then performed in the splitting stage.
  • Taking splitting-then-splicing or splicing-then-splitting as an example, suppose the total size of the tensor data to be adjusted is M, that neither stage can be skipped, and that each stage must splice or split in all 4 dimensions.
  • Splicing and splitting are usually implemented using the concatenation operator (Concat) and the splitting operator (Slice) that come with the neural network algorithm itself. Because these two operators can handle only one dimension at a time, the entire glue operator brings about 8M of storage read and write overhead in the worst case. Therefore, an optimal balance must be found between adjusting the split state and the additional overhead introduced; then, while introducing as few glue operators as possible, the splitting of operators can be adjusted at reasonable places consistent with the structure of the network.
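A sketch of the splice-then-split variant, using NumPy's concatenate and array_split as stand-ins for the Concat and Slice operators (illustrative only; the two stages together touch the whole tensor, which is the source of the glue operator's extra memory traffic):

```python
import numpy as np

def glue(sub_tensors, splice_axis, split_axis, num_parts):
    """Adjust sub-tensors produced by one splitting into another splitting:
    splice them back together along splice_axis (the Concat stage), then
    re-split the result into num_parts along split_axis (the Slice stage).
    """
    full = np.concatenate(sub_tensors, axis=splice_axis)      # splicing stage
    return np.array_split(full, num_parts, axis=split_axis)   # splitting stage

# e.g. convert a C-dimension split into an H-dimension split (NCHW layout):
# h_parts = glue(c_parts, splice_axis=1, split_axis=2, num_parts=4)
```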
  • Glue operators are treated in the same way as ordinary neural network operators.
  • Each glue operator's adjustment of the split state of tensor data takes a corresponding time t, which is used as the weight of the corresponding state path.
  • When selecting glue operators, the split path is checked at, for each glue operator, the split state corresponding to its input tensor data and the split state corresponding to its output tensor data.
  • For example, the split state status_1 in the split state set of Tensor_1 in Figure 5 is connected through a state path to the split state status_1 in the split state set of Tensor_1', and the two split states are the same. This shows that the split path P of the target layer of the neural network model does not need to adjust the split state of the input tensor data of operator Op_2, a result based on consideration of the preceding and following operators and overall performance.
  • In this case, the glue operator inserted between operator Op_2 and its corresponding input tensor data is removed from the network; otherwise, the inserted glue operator is retained.
  • the implementation of the glue operator uses the original operator in the neural network model.
  • the splicing phase corresponds to the Concat operator in the neural network model
  • the split phase corresponds to the Slice operator in the neural network model. Any accelerator that already supports these two operators can quickly implement the glue operator.
  • The above method of obtaining the target split path is similar to the Viterbi algorithm. The examples given here are illustrative rather than exhaustive; those skilled in the art should understand the essence of the technical solution of this application.
  • the convolution operator is a special operator. In some cases, additional auxiliary operators are required to complete the splitting task.
  • When a convolution operator split in the H/W dimensions is computed, the sliding window may move to the boundary and exceed the boundary of the sub-tensor data, the missing part of the data being located in the adjacent sub-tensor data.
  • The compensation operator is used to obtain target data from the sub-tensor data adjacent to a given sub-tensor data, and to merge the target data with that sub-tensor data to form a larger block of data.
  • the moving range of the window in the calculation phase will not exceed the boundary of this compensated data block.
  • The same situation applies to pooling operators and to the currently less common local response normalization operator (Local Response Normalization, LRN for short).
  • According to the LRN calculation logic, to compute one point of the output tensor data in the C dimension, the value of the corresponding point of the input tensor data in the C dimension and the values of k/2 adjacent points on each side are needed. Therefore, if the LRN computation is split into multiple LRN operators along the C dimension, each new operator also needs element data from adjacent sub-tensor data to compute values at its C-dimension boundary.
  • Operators can be summarized into three categories according to the range of input tensor data required, in a given dimension, to compute one point of the output tensor data.
  • One type is point-to-point operators: to compute one data point of the output tensor data, only the value of the corresponding single data point of the input tensor is needed.
  • This type includes activation operators (Relu, pRelu), the batch normalization operator (BatchNorm), and the basic element-wise addition, subtraction, multiplication and division operators (Add, Sub, Mult, Div). Such operators can be split in any dimension, and the resulting sub-operators need only the corresponding sub-tensor data as input in the computation phase.
  • Another type is fully dependent operators: to compute one data point of the output tensor data, all of the input tensor data in that dimension is needed. For example, to compute a point in the C dimension of the output tensor data, the convolution operator and the fully connected operator need all the points in the C dimension of the input tensor data. Although splitting the convolution operator in the input C dimension can still be realized by accumulating partial sums afterwards, this is difficult when the operator's calculation logic in the C dimension is more complicated, as with the normalized exponential regression operator (Softmax).
  • For Softmax, I is the vector of the input tensor data in the normalized dimension, and O is the vector of the output tensor data in the normalized dimension.
  • The compensation operator is actually used to handle the third case, lying between the point-to-point operators and the fully dependent operators: computing a point of the output tensor data requires the input tensor data within a region near the corresponding location, and that region is determined according to the compensation parameters.
  • Such operators can in fact be split in their calculation logic, although the sub-operators will depend on data outside their own sub-tensor data; using compensation operators solves this problem uniformly.
  • FIG. 7 is the third flowchart of a neural network model splitting method provided by an embodiment of the present disclosure. As shown in FIG. 7, on the basis of Figure 2, the method also includes:
  • Step 201") Insert a compensation operator between the operator of the target layer and the associated split state set, and adjust the states in the split state set of the input tensor data of the operator; wherein the compensation operator is used to obtain target data from the sub-tensor data adjacent to any sub-tensor data of a state, and to merge the target data with that sub-tensor data.
  • To this end, the framework introduces the compensation operator: before the computation starts, for a sub-tensor data set, elements of adjacent sub-tensor data are added around each sub-tensor data. This avoids modifying the calculation logic of the split convolution operator or pooling operator itself, so that the dependence on adjacent sub-tensor data is invisible to the convolution and pooling operators themselves; this is conducive to rapid implementation of the system and to consistent behaviour on accelerators of different structures. However, the compensation operator itself brings additional overhead.
  • the compensation operator introduces 2M memory access overhead without considering the overlap between the sub-tensor data after compensation.
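The compensation behaviour can be sketched as follows for a split along one dimension; the halo width plays the role of the compensation parameter (for a stride-1 convolution with kernel size k it would be k // 2 per side), and the function is an illustrative assumption:

```python
import numpy as np

def compensate(sub_tensors, halo, axis):
    """Widen each sub-tensor with `halo` elements copied from its neighbours
    along `axis`, so the sliding window of a split convolution or pooling
    operator never reads past the boundary of its own sub-tensor data.
    """
    out = []
    for i, sub in enumerate(sub_tensors):
        pieces = []
        if i > 0:                          # borrow from the left neighbour
            left = sub_tensors[i - 1]
            n = left.shape[axis]
            pieces.append(np.take(left, range(n - halo, n), axis=axis))
        pieces.append(sub)
        if i < len(sub_tensors) - 1:       # borrow from the right neighbour
            pieces.append(np.take(sub_tensors[i + 1], range(halo), axis=axis))
        out.append(np.concatenate(pieces, axis=axis))
    return out
```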
  • Convolution operators and pooling operators are the main operators that make up the neural network, especially the image classification neural network.
  • To reduce this overhead, the inserted compensation operators can be merged in a pyramid structure.
  • Suppose the neural network model is a serial sequence of two convolution operators, and each convolution operator is divided into four smaller tasks along the H/W dimensions; the N and C dimensions of the data are omitted in the figure.
  • The convolution kernel sizes of the two convolution operators are k_1 and k_2, and both strides are 1.
  • Without merging, the data width that convolution operator Conv1 compensates at the periphery of each sub-tensor before computation is k_1/2, which ensures that the convolution kernel does not exceed the boundary of the input sub-tensor data during the split convolution computation.
  • With pyramid merging, the data width that convolution operator Conv1 compensates at the periphery of each sub-tensor before computation is k_1/2 + k_2/2, which makes the sub-tensor data of its output Tensor1 overlap each other by a width of k_2, so convolution operator Conv2 does not need to perform data compensation on its input sub-tensor data before its computation starts.
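The arithmetic of the pyramid merge can be sketched in one line, assuming odd kernel sizes and stride 1 as in the example above:

```python
def merged_halo(kernels):
    """Merged compensation width for a serial chain of convolutions with odd
    kernel sizes and stride 1: compensating the first input once by the sum
    of the half-widths lets every later operator skip its own compensation.
    """
    return sum(k // 2 for k in kernels)

# merged_halo([3, 5]) == 3: compensate Conv1's input by k1/2 + k2/2 = 1 + 2
# per side, instead of compensating before Conv1 and again before Conv2
```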
  • For the reverse traversal, the non-split state of the output tensor data of the entire neural network model is the end state s_end, and any state s of any tensor data in the neural network model has a corresponding split path from s to s_end whose weight is l_s.
  • Initially, the weight corresponding to s_end is 0, and the weight corresponding to every state of all the remaining tensor data is ∞.
  • each operator is traversed backwards.
  • If multiple compensation operators inserted in the neural network model are merged using the pyramid structure, either one merged compensation operator or several merged compensation operators may be obtained; in either case, the number of compensation operators after merging is less than the number before merging.
  • The weight of each split path from the split state set of the input tensor data of the neural network model to the split state set of the output tensor data of the neural network model is determined by the sum of the weights of the corresponding state paths.
  • Alternatively, a threshold can be set based on experience, and any split path whose weight is less than the set threshold can be used as the target split path for splitting the neural network model.
  • FIG. 9 is the fourth flowchart of a neural network model splitting method provided by an embodiment of the present disclosure, in which both the glue operator and the compensation operator are introduced into the operator splitting scheme. In this case, the splitting method includes:
  • Step a) Determine the split state set of the tensor data associated with the operator of the target layer according to the operator of the target layer in the neural network model;
  • Step b) Insert a glue operator between the operator of the target layer and the associated split state set to adjust the states in the split state set of the tensor data of the operator;
  • Step c) Insert a compensation operator between the operator of the target layer and the associated split state set to adjust the states in the split state set of the input tensor data of the operator;
  • Step d) Traverse the split state sets according to the directed acyclic graph of the neural network model, and determine the state paths between adjacent split state sets and the weights of the state paths;
  • Step e) Determine the target split path of the target layer according to the weights of the state paths;
  • Step f) Use the target split path to split the operator of the target layer of the neural network model.
  • Specifically, a glue operator is inserted between each operator of the neural network model and its input tensor data, and a glue operator is also inserted between the output tensor data of the neural network model and the operator that generates that output tensor data.
  • For each state in a state set, the state itself and the time of the split path from that state to the non-split state s_root of the final output data are stored; this pair of values is represented as (s, t).
  • Initially, the state set S_root corresponding to the output tensor data of the entire neural network model contains the non-split state of that data and the corresponding minimum time, i.e. (s_root, 0), and all other sets are empty.
  • All operators in the neural network model are given a topological order λ according to their mutual dependencies.
  • The topological order must satisfy: for any operator A, all operators that depend on A are ranked after A in the topological order, and all operators that A depends on are ranked before A. A sketch of computing such an order follows below.
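A sketch of computing such an order with Kahn's algorithm (a standard technique; the disclosure does not specify which algorithm is used):

```python
from collections import deque

def topological_order(ops, deps):
    """Return a topological order of `ops`, where deps maps each operator to
    the operators it depends on: every operator is placed after everything
    it depends on and before everything that depends on it.
    """
    indegree = {op: len(deps.get(op, ())) for op in ops}
    users = {op: [] for op in ops}
    for op, ds in deps.items():
        for d in ds:
            users[d].append(op)            # d must come before op

    ready = deque(op for op in ops if indegree[op] == 0)
    order = []
    while ready:
        op = ready.popleft()
        order.append(op)
        for u in users[op]:
            indegree[u] -= 1
            if indegree[u] == 0:
                ready.append(u)
    return order                           # reversed(order) gives inverse λ
```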
  • The operators in the neural network model are then traversed one by one in the reverse of the topological order λ, and the split state set of each operator is processed in turn.
  • Consider an operator A with m inputs and n outputs, whose input tensor data are u_1, ..., u_m and whose output tensor data are v_1, ..., v_n.
  • The time complexity of the reverse traversal is O(NM²), where N is the number of operators in the neural network model and M is the number of states in the largest split state set among the split state sets of all tensor data.
  • It should be emphasized that the content of the operator-splitting technical solution shown in Figure 2 is applicable to the technical solution shown in Figure 9, the content concerning the glue operator in the glue-operator-based splitting solution shown in Figure 4 is applicable to the technical solution shown in Figure 9, and the content concerning the compensation operator in the compensation-operator-based splitting solution shown in Figure 7 is applicable to the technical solution shown in Figure 9; none of this is repeated here.
  • The way each operator is split can be reasonably adjusted according to the type and scale of the operator together with the computational throughput rate and memory-access bandwidth of the underlying hardware.
  • This method of splitting achieves a good balance between the computing efficiency of the hardware cores and the splitting degree of the operator itself; at the same time, it takes into account the coordination between context operators in the splitting and plans the split selection of multiple operators as a whole.
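  • This hardware-aware adjustment follows formula (1) of the description, t = max_i(max(c_i/α, d_i/β)) with β = B/n; a sketch with illustrative (assumed) load numbers:

```python
def state_path_weight(compute_loads, memory_footprints, alpha, total_bandwidth):
    """t = max_i( max(c_i / alpha, d_i / beta) ), beta = B / n:
    each core's time is its compute time or memory time, whichever
    dominates; the slowest core determines the operator's time."""
    n = len(compute_loads)
    beta = total_bandwidth / n              # the n cores share bandwidth B
    return max(max(c / alpha, d / beta)
               for c, d in zip(compute_loads, memory_footprints))

# 4 sub-operators after splitting: compute loads c_i and memory traffic d_i
print(state_path_weight([100, 100, 120, 100], [40, 40, 40, 40],
                        alpha=50.0, total_bandwidth=80.0))   # 2.4
```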
  • the present disclosure provides a neural network model splitting device, which includes:
  • the split state set module is used to determine the split state set of the tensor data associated with the operators of the target layer according to the operators of the target layer in the neural network model; wherein the target layer is at least one layer in the neural network model;
  • the state path module is used to traverse the split state sets according to the directed acyclic graph of the neural network model and determine the state paths between adjacent split state sets and the weights of the state paths; wherein a state path represents a split mode of the operator, each state in a split state set represents a sub-tensor data set, and the union of all the sub-tensor data of the state is the tensor data;
  • the target split path module is configured to determine the target split path of the target layer according to the weight of the state path;
  • the splitting module is used to split the operator of the target layer of the neural network model by using the target splitting path.
  • the target split path module includes:
  • the first traversal unit is used to traverse all the split state sets of the target layer and, for the current split state set, traverse each state, obtaining all state paths pointing to the current state as well as the split paths from the starting states of those state paths to the initial state of the input tensor data of the target layer;
  • the first split path determining unit is configured to determine the split path from the current state to the initial state of the input tensor data of the target layer according to the weights of the state paths and the weights of the split paths; wherein the weight of a split path is determined according to the weights of all the state paths corresponding to that split path;
  • the first selection target split path unit is used to obtain, after traversing all the split state sets of the target layer, the target split path between the split state set of the input tensor data of the target layer and the split state set of the output tensor data of the target layer.
  • the target split path module includes:
  • the second traversal unit is used to traverse all the split state sets of the target layer and, for the current split state set, traverse each state, obtaining all state paths starting from the current state as well as the split paths from the end states of those state paths to the end state of the output tensor data of the target layer;
  • the second split path determination unit is configured to determine the split path from the current state to the end state of the output tensor data of the target layer according to the weights of the state paths and the weights of the split paths; wherein the weight of a split path is determined according to the weights of all the state paths corresponding to that split path;
  • the second selection target split path unit is used to obtain, after traversing all the split state sets of the target layer, the target split path between the split state set of the input tensor data of the target layer and the split state set of the output tensor data of the target layer.
  • it also includes:
  • the first split state set optimization module is used, in the forward traversal phase, when the output tensor data of an operator is used as input tensor data by at least two operators, or when the operator has at least two output tensor data, to reserve one split state in the split state set of the operator's output tensor data, the split state being determined via the same state path of the operator.
  • it also includes:
  • the second split state set optimization module is used, in the reverse traversal phase, when an operator has at least two input tensor data, to reserve one split state in the split state set of the operator's input tensor data, the split state being determined via the same state path of the operator.
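  • The two optimization modules above resolve the consistency problem at branches by pruning a multi-consumer tensor's split state set down to a single state; a minimal sketch of such pruning (keeping the state with the smallest accumulated weight; the data layout is assumed for illustration):

```python
def prune_to_single_state(state_set):
    """Keep only the split state with the smallest accumulated path
    weight so that every branch reading this tensor later selects the
    same, matching split state."""
    best_state = min(state_set, key=state_set.get)
    return {best_state: state_set[best_state]}

# A tensor consumed by two branches: prune before leaving the operator.
print(prune_to_single_state({"split_C": 3.5, "split_H": 2.1, "unsplit": 4.0}))
# {'split_H': 2.1}
```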
  • FIG. 10 is a schematic diagram of a neural network model splitting hardware device proposed in an embodiment of this application. The device includes a memory, a processor, and a computer program that is stored on the memory and can run on the processor; the processor implements the neural network model splitting method described above when it executes the computer program.
  • In view of the above technical problems, a neural network model splitting method and related products are also proposed. Apart from the description of the neural network model splitting device, the descriptions of the related splitting method and related products are consistent with the descriptions of the above embodiments and are not repeated here.
  • The relevant description of the neural network model splitting device is as follows.
  • the present disclosure provides a neural network model splitting device, which includes:
  • the split state set determination module is used to determine the split state set of the tensor data associated with the operators of the target layer according to the operators of the target layer in the neural network model; wherein the target layer is at least one layer in the neural network model;
  • the insert glue operator module is used to insert a glue operator between an operator of the target layer and the associated split state set to adjust the states in the split state set of the operator's tensor data; wherein the glue operator is used to adjust a state of the tensor data obtained by one splitting method into a state obtained by any other splitting method;
  • the state path determination module is configured to traverse the split state sets according to the directed acyclic graph of the neural network model and determine the state paths between adjacent split state sets and the weights of the state paths; wherein a state path represents a split mode of the operator, each state in a split state set represents a sub-tensor data set, and the union of all the sub-tensor data of the state is the tensor data;
  • the target split path determination module is configured to determine the target split path of the target layer according to the weight of the state path;
  • the splitting module is used to split the operator of the target layer of the neural network model by using the target splitting path.
  • the target split path determination module includes:
  • the first traversal unit is used to traverse all the split state sets of the target layer and, for the current split state set, traverse each state, obtaining all state paths pointing to the current state as well as the split paths from the starting states of those state paths to the initial state of the input tensor data of the target layer;
  • the first split path determining unit is configured to determine the split path from the current state to the initial state of the input tensor data of the target layer according to the weights of the state paths and the weights of the split paths; wherein the weight of a split path is determined according to the weights of all the state paths corresponding to that split path;
  • the first selection target split path unit is used to obtain, after traversing all the split state sets of the target layer, the target split path between the split state set of the input tensor data of the target layer and the split state set of the output tensor data of the target layer.
  • the target split path determination module includes:
  • the second traversal unit is used to traverse all the split state sets of the target layer and, for the current split state set, traverse each state, obtaining all state paths starting from the current state as well as the split paths from the end states of those state paths to the end state of the output tensor data of the target layer;
  • the second split path determination unit is configured to determine the split path from the current state to the end state of the output tensor data of the target layer according to the weights of the state paths and the weights of the split paths; wherein the weight of a split path is determined according to the weights of all the state paths corresponding to that split path;
  • the second selection target split path unit is used to obtain, after traversing all the split state sets of the target layer, the target split path between the split state set of the input tensor data of the target layer and the split state set of the output tensor data of the target layer.
  • the insert glue operator module includes:
  • An inserting unit for inserting a glue operator between the operator of the target layer and the associated set of split states to obtain the directed acyclic graph of the neural network model including the glue operator;
  • the state path unit is configured to traverse the split state sets corresponding to all tensor data of the target layer according to the directed acyclic graph, and determine the state paths between adjacent split state sets and the weight of the state paths;
  • the target split path unit is used to determine the target split path of the target layer of the neural network model including the glue operator according to the weight of the state path;
  • the selection unit is configured to use the target split path of the target layer of the neural network model including the glue operators to select among the inserted glue operators, deleting the glue operators that do not need to be inserted and retaining the glue operators that do need to be inserted.
  • the glue operator inserted by the insert glue operator module is used to splice the states in the split state set of the glue operator's input tensor data.
  • the glue operator inserted by the insert glue operator module is used to split the states in the split state set of the glue operator's input tensor data.
  • the glue operator inserted by the insert glue operator module is used to splice the states in the split state set of the glue operator's input tensor data, and then to split the states in the split state set obtained after the splicing process (a schematic sketch of this splice-then-split mode follows this module description).
  • the glue operator inserted by the insert glue operator module is used to split the states in the split state set of the glue operator's input tensor data, and then to splice the states in the split state set obtained after the splitting process.
  • it also includes:
  • the first split state set optimization module is used, in the forward traversal phase, when the output tensor data of an operator is used as input tensor data by at least two operators, or when the operator has at least two output tensor data, to reserve one split state in the split state set of the operator's output tensor data, the split state being determined via the same state path of the operator.
  • it also includes:
  • the second split state set optimization module is used, in the reverse traversal phase, when an operator has at least two input tensor data, to reserve one split state in the split state set of the operator's input tensor data, the split state being determined via the same state path of the operator.
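  • The splice-then-split working mode of the glue operator described above can be sketched in one dimension with plain Python stand-ins for Concat and Slice (illustrative only, not a real kernel):

```python
def glue_splice_then_split(sub_blocks, new_boundaries):
    """Splice stage: concatenate the sub-blocks of the old split state.
    Split stage: re-slice the full data at the new boundaries.
    Mirrors Concat followed by Slice along one dimension."""
    full = [x for block in sub_blocks for x in block]       # splice (Concat)
    bounds = [0] + list(new_boundaries) + [len(full)]
    return [full[a:b] for a, b in zip(bounds, bounds[1:])]  # split (Slice)

old_state = [[0, 1, 2, 3], [4, 5, 6, 7]]        # old split: boundary at 4
print(glue_splice_then_split(old_state, [2, 6]))
# [[0, 1], [2, 3, 4, 5], [6, 7]]  -> new split: boundaries at 2 and 6
```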
  • In view of the above technical problems, a neural network model splitting method and related products are also proposed. Apart from the description of the neural network model splitting device, the descriptions of the related splitting method and related products are consistent with the descriptions of the above embodiments and are not repeated here.
  • The relevant description of the neural network model splitting device is as follows.
  • the present disclosure provides a neural network model splitting device, which includes:
  • the split state set module is used to determine the split state set of the tensor data associated with the operators of the target layer according to the operators of the target layer in the neural network model; wherein the target layer is at least one layer in the neural network model;
  • the insert compensation operator module is used to insert a compensation operator between an operator of the target layer and the associated split state set and to adjust the states in the split state set of the operator's input tensor data; wherein the compensation operator is used to obtain target data from the sub-tensor data adjacent to any sub-tensor data of a state and to merge the target data with that sub-tensor data;
  • the state path module is used to traverse the split state sets according to the directed acyclic graph of the neural network model and determine the state paths between adjacent split state sets and the weights of the state paths; wherein a state path represents a split mode of the operator, each state in a split state set represents a sub-tensor data set, and the union of all the sub-tensor data of the state is the tensor data;
  • the target split path module is configured to determine the target split path of the target layer according to the weight of the state path;
  • the splitting module is used to split the operator of the target layer of the neural network model by using the target splitting path.
  • the insertion compensation operator module includes:
  • the inserting unit is used to insert a compensation operator between a specific type of operator in the target layer and the associated split state set of its input tensor data; wherein the characteristic of the specific type of operator is that an element of the input tensor data used to calculate an element of this type of operator's output tensor data is also used to calculate the adjacent elements of that element of the output tensor data (a schematic sketch of this compensation behaviour follows this module description).
  • the specific types of operators to which the compensation operators inserted by the inserting unit are applicable are convolution operators, pooling operators, and local response normalization operators.
  • the insertion compensation operator module further includes:
  • the merging unit is used for merging multiple compensation operators in the target layer in a pyramid structure.
  • the target split path determination module includes:
  • the traversal unit is used to traverse all the split state sets of the target layer and, for the current split state set, traverse each state, obtaining all state paths starting from the current state as well as the split paths from the end states of those state paths to the end state of the output tensor data of the target layer;
  • the split path determining unit is configured to determine the split path from the current state to the end state of the output tensor data of the target layer according to the weights of the state paths and the weights of the split paths; wherein the weight of a split path is determined according to the weights of all the state paths corresponding to that split path;
  • the selection target split path unit is used to obtain, after traversing all the split state sets of the target layer, the target split path between the split state set of the input tensor data of the target layer and the split state set of the output tensor data of the target layer.
  • the neural network model splitting device further includes:
  • the first split state set optimization module is used, in the forward traversal phase, when the output tensor data of an operator is used as input tensor data by at least two operators, or when the operator has at least two output tensor data, to reserve one split state in the split state set of the operator's output tensor data, the split state being determined via the same state path of the operator.
  • the neural network model splitting device further includes:
  • the second split state set optimization module is used, in the reverse traversal phase, when an operator has at least two input tensor data, to reserve one split state in the split state set of the operator's input tensor data, the split state being determined via the same state path of the operator.
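  • The compensation behaviour described above, fetching target data from the neighbouring sub-tensor data and merging it with each sub-tensor before a convolution/pooling/LRN sub-task runs, can be sketched in one dimension as follows (illustrative assumptions only):

```python
def compensate_1d(data, boundaries, halo):
    """Split `data` at `boundaries`, then widen each sub-block by `halo`
    elements taken from its neighbours, so that a kernel window sliding
    over a sub-block never reads past the block it was given."""
    bounds = [0] + list(boundaries) + [len(data)]
    out = []
    for a, b in zip(bounds, bounds[1:]):
        lo = max(0, a - halo)           # target data from the left neighbour
        hi = min(len(data), b + halo)   # target data from the right neighbour
        out.append(data[lo:hi])
    return out

print(compensate_1d(list(range(8)), [4], halo=1))
# [[0, 1, 2, 3, 4], [3, 4, 5, 6, 7]]
```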
  • In view of the above technical problems, a neural network model splitting method and related products are also proposed. Apart from the description of the neural network model splitting device, the descriptions of the related splitting method and related products are consistent with the descriptions of the above embodiments and are not repeated here.
  • The relevant description of the neural network model splitting device is as follows.
  • the present disclosure proposes a neural network model splitting device, which includes:
  • the split state set module is used to determine the split state set of the tensor data associated with the operators of the target layer according to the operators of the target layer in the neural network model; wherein the target layer is at least one layer in the neural network model;
  • the insert glue operator module is used to insert a glue operator between an operator of the target layer and the associated split state set to adjust the states in the split state set of the operator's tensor data; wherein the glue operator is used to adjust a state of the tensor data obtained by one splitting method into a state obtained by any other splitting method;
  • the insert compensation operator module is used to insert a compensation operator between an operator of the target layer and the associated split state set and to adjust the states in the split state set of the operator's input tensor data; wherein the compensation operator is used to obtain target data from the sub-tensor data adjacent to any sub-tensor data of a state and to merge the target data with that sub-tensor data;
  • the state path module is used to traverse the split state sets according to the directed acyclic graph of the neural network model and determine the state paths between adjacent split state sets and the weights of the state paths; wherein a state path represents a split mode of the operator, each state in a split state set represents a sub-tensor data set, and the union of all the sub-tensor data of the state is the tensor data;
  • the target split path module is configured to determine the target split path of the target layer according to the weight of the state path;
  • the splitting module is used to split the operator of the target layer of the neural network model by using the target splitting path.
  • the insert glue operator module includes:
  • the first insertion unit is configured to insert a glue operator between the operator of the target layer and the associated set of split states to obtain a directed acyclic graph of a neural network model including the glue operator;
  • the state path unit is configured to traverse the split state sets corresponding to all tensor data of the target layer according to the directed acyclic graph, and determine the state paths between adjacent split state sets and the weight of the state paths;
  • a first determining target split path unit configured to determine the target split path of the target layer of the neural network model including the glue operator according to the weight of the state path;
  • the selection unit is configured to use the target split path of the target layer of the neural network model including the glue operators to select among the inserted glue operators, deleting the glue operators that do not need to be inserted and retaining the glue operators that do need to be inserted.
  • the glue operator inserted by the insert glue operator module is used to splice the states in the split state set of the glue operator's input tensor data.
  • the glue operator inserted by the insert glue operator module is used to split the states in the split state set of the glue operator's input tensor data.
  • the glue operator inserted by the insert glue operator module is used to splice the states in the split state set of the glue operator's input tensor data, and then to split the states in the split state set obtained after the splicing process.
  • the glue operator inserted by the insert glue operator module is used to split the states in the split state set of the glue operator's input tensor data, and then to splice the states in the split state set obtained after the splitting process.
  • the insertion compensation operator module includes:
  • the second insertion unit is used to insert a compensation operator between a specific type of operator in the target layer and the associated split state set of its input tensor data; wherein the characteristic of the specific type of operator is that an element of the input tensor data used to calculate an element of this type of operator's output tensor data is also used to calculate the adjacent elements of that element of the output tensor data.
  • the specific types of operators to which the compensation operators inserted by the second insertion unit are applicable are convolution operators, pooling operators, and local response normalization operators.
  • the insertion compensation operator module further includes:
  • the merging unit is used for merging multiple compensation operators in the target layer in a pyramid structure.
  • the target split path determination module includes:
  • the traversal unit is used to traverse all the split state sets of the target layer and, for the current split state set, traverse each state, obtaining all state paths starting from the current state as well as the split paths from the end states of those state paths to the end state of the output tensor data of the target layer;
  • the split path determining unit is configured to determine the split path from the current state to the end state of the output tensor data of the target layer according to the weights of the state paths and the weights of the split paths; wherein the weight of a split path is determined according to the weights of all the state paths corresponding to that split path;
  • the second determining target split path unit is used to obtain, after traversing all the split state sets of the target layer, the target split path between the split state set of the input tensor data of the target layer and the split state set of the output tensor data of the target layer.
  • it also includes:
  • the first split state set optimization module is used, in the forward traversal phase, when the output tensor data of an operator is used as input tensor data by at least two operators, or when the operator has at least two output tensor data, to reserve one split state in the split state set of the operator's output tensor data, the split state being determined via the same state path of the operator.
  • it also includes:
  • the second split state set optimization module is used, in the reverse traversal phase, when an operator has at least two input tensor data, to reserve one split state in the split state set of the operator's input tensor data, the split state being determined via the same state path of the operator.
  • the memory may include a physical device for storing information, which is usually digitized and then stored in a medium using electrical, magnetic, or optical methods.
  • the memory described in this embodiment may also include: devices that use electric energy to store information, such as RAM and ROM; devices that use magnetic energy to store information, such as hard disks, floppy disks, magnetic tapes, magnetic core memories, bubble memories, and USB flash drives; and devices that use optical means to store information, such as CDs or DVDs.
  • Of course, there are also other types of memory, such as quantum memory, graphene memory, and so on.
  • the processor can be implemented in any suitable manner.
  • the processor may take the form of, for example, a microprocessor, or a processor together with a computer-readable medium storing computer-readable program code (for example, software or firmware) executable by the (micro)processor, logic gates, switches, an application-specific integrated circuit (ASIC), a programmable logic controller, an embedded microcontroller, and so on.
  • the embodiment of the present application also provides a readable storage medium on which a computer program is stored, and when the computer program is executed, the neural network model splitting method described above is realized.
  • program modules include routines, programs, objects, components, data structures, etc. that perform specific tasks or implement specific abstract data types.
  • This application can also be practiced in distributed computing environments. In these distributed computing environments, remote processing devices connected through a communication network perform tasks.
  • program modules can be located in local and remote computer storage media including storage devices.


Abstract

本公开披露一种神经网络模型的拆分方法及相关产品,本方案把一个算子拆分成多个规模更小的子算子,这样可以直接调用单核架构下的计算库,避免了重新实现的额外工作量。

Description

神经网络模型拆分方法、装置、计算机设备和存储介质
相关申请
本申请要求2019年02月14日申请的申请号为201910114927.0,名称为“一种神经网络模型的拆分方法及相关产品”的中国专利申请的优先权,2019年02月14日申请的申请号为201910114967.5,名称为“一种神经网络模型的拆分方法及相关产品”的中国专利申请的优先权,2019年02月14日申请的申请号为201910115130.2,名称为“一种神经网络模型的拆分方法及相关产品”的中国专利申请的优先权,以及2019年02月14日申请的申请号为201910115162.2,名称为“一种神经网络模型的拆分方法及相关产品”的中国专利申请的优先权,在此将其全文引入作为参考。
技术领域
本申请涉及人工智能技术领域,特别是涉及一种神经网络模型的拆分方法及相关产品。
背景技术
近年来,深度学习加速器被不断提出,并如同通用处理器一样,正在由单核向多核扩展。这种扩展后的多核结构可以在训练阶段支持数据并行的方式来提高数据吞吐量,加快训练速度。然而,在推理阶段,相比吞吐量深度神经网络对端到端的时延有着更高的要求,这往往决定了加速器在某个场景下的可用性。传统的数据并行方案不能满足推理场景下对加速器小数据、低延迟的要求。
发明内容
基于此,有必要针对上述技术问题,提出一种神经网络模型的拆分方法及相关产品。
为实现上述目的,本公开提供一种神经网络模型拆分方法,其中,所述方法包括:
根据所述神经网络模型中目标层的算子,确定与所述目标层的算子关联的张量数据的拆分状态集合;其中,所述目标层为所述神经网络模型中的至少一层;
根据所述神经网络模型的有向无环图遍历所述拆分状态集合,确定相邻拆分状态集合之间的状态路径及状态路径的权重;其中,所述状态路径表示所述算子的拆分方式;所述拆分状态集合中的每个状态表示一个子张量数据集合,所述状态的所有子张量数据的并集结果为所述张量数据;
根据所述状态路径的权重,确定所述目标层的目标拆分路径;
利用所述目标拆分路径对所述神经网络模型的目标层的算子进行拆分。
为实现上述目的,本公开提供一种神经网络模型拆分装置,其中,包括:
拆分状态集合模块,用于根据所述神经网络模型中目标层的算子,确定与所述目标层的算子关联的张量数据的拆分状态集合;其中,所述目标层为所述神经网络模型中的至少一层;
状态路径模块,用于根据所述神经网络模型的有向无环图遍历所述拆分状态集合,确定相邻拆分状态集合之间的状态路径及状态路径的权重;其中,所述状态路径表示所述算子的拆分方式;所述拆分状态集合中的每个状态表示一个子张量数据集合,所述状态的所有子张量数据的并集结果为所述张量数据;
目标拆分路径模块,用于根据所述状态路径的权重,确定所述目标层的目标拆分路径;
拆分模块,用于利用所述目标拆分路径对所述神经网络模型的目标层的算子进行拆分。
为实现上述目的,本公开还提供为实现上述目的,本公开提供一种神经网络模型拆分方法,所述方法包括:
根据所述神经网络模型中目标层的算子,确定与所述目标层的算子关联的张量数据的拆分状态集合;其中,所述目标层为所述神经网络模型中的至少一层;
在所述目标层的算子与关联的拆分状态集合之间插入胶水算子,调整所述算子的张量数据的拆分状态集合中的状态;其中,所述胶水算子用于将所述张量数据按照一拆分方式获得的状态调整成按照任一种拆分方式获得的状态;
根据所述神经网络模型的有向无环图遍历所述拆分状态集合,确定相邻拆分状态集合之间的状态路径及状态路径的权重;其中,所述状态路径表示所述算子的拆分方式;所述拆分状态集合中的每个状态表示一个子张量数据集合,所述状态的所有子张量数据的并集结果为所述张量数据;
根据所述状态路径的权重,确定所述目标层的目标拆分路径;
利用所述目标拆分路径对所述神经网络模型的目标层的算子进行拆分。
为实现上述目的,本公开提供一种神经网络模型拆分装置,所述装置包括:
拆分状态集合确定模块,用于根据所述神经网络模型中目标层的算子,确定与所述目标层的算子关联的张量数据的拆分状态集合;其 中,所述目标层为所述神经网络模型中的至少一层;
插入胶水算子模块,用于在所述目标层的算子与关联的拆分状态集合之间插入胶水算子,调整所述算子的张量数据的拆分状态集合中的状态;其中,所述胶水算子用于将所述张量数据按照一拆分方式获得的状态调整成按照任一种拆分方式获得的状态;
状态路径确定模块,用于根据所述神经网络模型的有向无环图遍历所述拆分状态集合,确定相邻拆分状态集合之间的状态路径及状态路径的权重;其中,所述状态路径表示所述算子的拆分方式;所述拆分状态集合中的每个状态表示一个子张量数据集合,所述状态的所有子张量数据的并集结果为所述张量数据;
目标拆分路径确定模块,用于根据所述状态路径的权重,确定所述目标层的目标拆分路径;
拆分模块,用于利用所述目标拆分路径对所述神经网络模型的目标层的算子进行拆分。
为实现上述目的,本公开还提供一种神经网络模型拆分方法,其中,所述方法包括:
根据所述神经网络模型中目标层的算子,确定与所述目标层的算子关联的张量数据的拆分状态集合;其中,所述目标层为所述神经网络模型中的至少一层;
在所述目标层的算子与关联的拆分状态集合之间插入补偿算子,调整所述算子的输入张量数据的拆分状态集合中的状态;其中,所述补偿算子用于从所述状态的任一子张量数据的相邻子张量数据中获取目标数据,将所述目标数据与所述子张量数据合并;
根据所述神经网络模型的有向无环图遍历所述拆分状态集合,确定相邻拆分状态集合之间的状态路径及状态路径的权重;其中,所述 状态路径表示所述算子的拆分方式;所述拆分状态集合中的每个状态表示一个子张量数据集合,所述状态的所有子张量数据的并集结果为所述张量数据;
根据所述状态路径的权重,确定所述目标层的目标拆分路径;
利用所述目标拆分路径对所述神经网络模型的目标层的算子进行拆分。
为实现上述目的,本公开提供一种神经网络模型拆分装置,所述装置包括:
拆分状态集合模块,用于根据所述神经网络模型中目标层的算子,确定与所述目标层的算子关联的张量数据的拆分状态集合;其中,所述目标层为所述神经网络模型中的至少一层;
插入补偿算子模块,用于在所述目标层的算子与关联的拆分状态集合之间插入补偿算子,调整所述算子的输入张量数据的拆分状态集合中的状态;其中,所述补偿算子用于从所述状态的任一子张量数据的相邻子张量数据中获取目标数据,将所述目标数据与所述子张量数据合并;
状态路径模块,用于根据所述神经网络模型的有向无环图遍历所述拆分状态集合,确定相邻拆分状态集合之间的状态路径及状态路径的权重;其中,所述状态路径表示所述算子的拆分方式;所述拆分状态集合中的每个状态表示一个子张量数据集合,所述状态的所有子张量数据的并集结果为所述张量数据;
目标拆分路径模块,用于根据所述状态路径的权重,确定所述目标层的目标拆分路径;
拆分模块,用于利用所述目标拆分路径对所述神经网络模型的目标层的算子进行拆分。
为实现上述目的,本公开还提供一种神经网络模型拆分方法,所述方法包括:
根据所述神经网络模型中目标层的算子,确定与所述目标层的算子关联的张量数据的拆分状态集合;其中,所述目标层为所述神经网络模型中的至少一层;
在所述目标层的算子与关联的拆分状态集合之间插入胶水算子,调整所述算子的张量数据的拆分状态集合中的状态;其中,所述胶水算子用于将所述张量数据按照一拆分方式获得的状态调整成按照任一种拆分方式获得的状态;
在所述目标层的算子与关联的拆分状态集合之间插入补偿算子,调整所述算子的输入张量数据的拆分状态集合中的状态;其中,所述补偿算子用于从所述状态的任一子张量数据的相邻子张量数据中获取目标数据,将所述目标数据与所述子张量数据合并;
根据所述神经网络模型的有向无环图遍历所述拆分状态集合,确定相邻拆分状态集合之间的状态路径及状态路径的权重;其中,所述状态路径表示所述算子的拆分方式;所述拆分状态集合中的每个状态表示一个子张量数据集合,所述状态的所有子张量数据的并集结果为所述张量数据;
根据所述状态路径的权重,确定所述目标层的目标拆分路径;
利用所述目标拆分路径对所述神经网络模型的目标层的算子进行拆分。
为实现上述目的,本公开提供一种神经网络模型拆分装置,包括:
拆分状态集合模块,用于根据所述神经网络模型中目标层的算子,确定与所述目标层的算子关联的张量数据的拆分状态集合;其中,所述目标层为所述神经网络模型中的至少一层;
插入胶水算子模块,用于在所述目标层的算子与关联的拆分状态 集合之间插入胶水算子,调整所述算子的张量数据的拆分状态集合中的状态;其中,所述胶水算子用于将所述张量数据按照一拆分方式获得的状态调整成按照任一种拆分方式获得的状态;
插入补偿算子模块,用于在所述目标层的算子与关联的拆分状态集合之间插入补偿算子,调整所述算子的输入张量数据的拆分状态集合中的状态;其中,所述补偿算子用于从所述状态的任一子张量数据的相邻子张量数据中获取目标数据,将所述目标数据与所述子张量数据合并;
状态路径模块,用于根据所述神经网络模型的有向无环图遍历所述拆分状态集合,确定相邻拆分状态集合之间的状态路径及状态路径的权重;其中,所述状态路径表示所述算子的拆分方式;所述拆分状态集合中的每个状态表示一个子张量数据集合,所述状态的所有子张量数据的并集结果为所述张量数据;
目标拆分路径模块,用于根据所述状态路径的权重,确定所述目标层的目标拆分路径;
拆分模块,用于利用所述目标拆分路径对所述神经网络模型的目标层的算子进行拆分。
本公开的技术方案可以用较小地开销实现深度学习加速器由单核向多核结构上的扩展,并且能够针对给定的网络和底层加速器特点给出一种高效的拆分方案,该方案能有效降低各种网络在多核加速器上的端到端时延。
附图说明
图1为共享存储多核结构的示意图;
图2为本公开实施例提供的一种神经网络模型拆分方法流程图之一;
图3为串行神经网络模型拆分示意图;
图4为本公开实施例提供的一种神经网络模型拆分方法流程图之二;
图5为算子与输入张量数据之间引入胶水算子的拆分示意图;
图6为补偿示意图;
图7为本公开实施例提供的一种神经网络模型拆分方法流程图之三;
图8为金字塔拆分示意图;
图9为本公开实施例提供的一种神经网络模型拆分方法流程图之四;
图10为本公开实施例提供的一种神经网络模型拆分硬件设备示意图。
具体实施方式
下面将结合附图,对本公开实施例中的技术方案进行清楚、完整地描述,更加全面地说明本公开的示例实施例和它们的多种特征及有利细节。应注意的是,图中示出的特征不是必须按照比例绘制。本公开省略了已知材料、组件和工艺技术的描述,从而不使本公开的示例实施例模糊。所给出的示例仅旨在有利于理解本公开示例实施例的实施,以及进一步使本领域技术人员能够实施示例实施例。因而,这些示例不应被理解为对本公开的实施例的范围的限制。
除非另外特别定义,本公开使用的技术术语或者科学术语应当为本公开所属领域内具有一般技能的人士所理解的通常意义。本公开中使用的“第一”、“第二”以及类似的词语并不表示任何顺序、数量或者重要性,而只是用来区分不同的组成部分。此外,在本公开各个实施例中,相同或类似的参考标号表示相同或类似的构件。
下面结合附图,对本公开实施例提供的一种神经网络模型拆分方法及相关产品的具体实施方式进行详细说明。
近年来,得益于深度学习本身在多个领域取得的巨大成功,深度学习加速器成为了快速发展的一个领域。这些新出现的加速器往往在性能功耗比上对比GPU有较大的优势。如同通用处理器的发展一样,深度学习加速器同样可以由单核向多核架构扩展,并且这种扩展非常适合深度学习中数据并行的训练方式。数据并行指的是通过把训练的数据集划分成若干份,使用多个处理核分别处理一部分子数据集来加快训练。在多核结构中采用这种方式,每个核并行地处理训练数据中的不同数据集,从而提高整个系统的吞吐量,加快训练速度。因此,多核的加速器架构在保持了每个核良好的性能功耗比的前提下,又可以方便地扩展整个系统的在训练阶段的计算吞吐量。
针对多核处理器结构芯片来说,如图1所示,这种共享存储多核结构是一种经典的多核结构。这种结构非常适合数据并行的神经网络训练方法。每个核可以作为数据并行中的一个处理器,分别读取不同的数据然后并行完成神经网络模型的正反向计算。每个核在计算阶段仍能够保持其在之前单核架构下良好的性能功耗比,与此同时整个系统的吞吐量也可以随着核数的扩展而增加。数据并行的问题在于,其扩展性依赖于处理的数据批量的大小。尽管在训练阶段这通常不会是一个问题,但是对于推理阶段这个前提则难以保证。一般来说,用于实时服务领域(包括视频监控,自动驾驶等)的神经网络模型,处理的数据通常是以流的方式串行输入,导致了每次处理的数据规模很小甚至往往是单张图片。在这种情况下,数据并行不能提供任何并行度,所有的工作任务会集中在单个核上,这使得多核带来的计算资源不能转化成处理任务的速度。
当在线下使用数据集完成了神经网络模型的训练后,就会把模型部署到云端的服务器上来处理外界发来的数据,此时的应用场景就由离线训练变成了在线推理。在在线推理阶段,一个非常重要的指标是时延,也就是从服务器收到待处理数据到返回处理后的结果的时间,进一步来说,是使用神经网络模型处理数据的时间。低时延保证云端服务器能够对客户端发来的数据在最短的时间内做出响应,在一些更 加敏感的场景下,直接决定了方案是否可用。因此,在线推理阶段对于加速器的要求就由处理大批量数据、高吞吐量转变为处理小批量数据、低时延。
在这种情况下,传统的数据并行或者模型并行难以有效降低推理任务的时延。对于数据并行来说,大批量数据是前提,这本身与在线推理小批量数据的特点矛盾。对于模型并行来说,它通常是为了解决一个规模很大的神经网络模型超过了单个设备的内存限制而采用的方法,把算子分配到不同的核上并不能降低网络的时延。为了真正能够在多核加速器上降低推理任务的时延,必须寻找一种方法,能够把对小批量数据甚至单个数据的推理计算任务合理地分配到多核架构的各个核上,保证每一时刻都有尽可能多的核参与计算,才能充分利用多核架构的资源。一种方法是把神经网络中的每个算子的计算任务都拆分到多个核上计算,这种方法即使在处理单张图片的推理任务时也能保证每一时刻都有多个核参与计算,从而达到了利用多核资源降低时延的目的。
但是,对于多核加速器来说,还有很多要解决的问题。首先,深度学习加速器通过定制化自身的硬件设计来适配深度学习算法本身的数据并行特征,提高计算吞吐量,加速器往往需要足够的数据规模才能达到较高的计算效率,而算子内的进一步拆分会减小每个核上的计算规模。当拆分达到一定粒度,每个核上计算效率的损失会超过拆分增加并行度所带来的收益。因此,必须在拆分并行和计算效率之间,在保证足够计算效率的同时提供足够的并行度。
另一方面,神经网络模型可以看作是一个由通常数以百计甚至数以千计的算子所构成的复杂计算图。不同种类的算子内的算法逻辑各不相同,这就导致对这些算子进行拆分的方法也不一样。每个算子的拆分,除了平衡自身的计算效率和并行度,还要考虑和前后算子的搭配,甚至于对全局的影响。深度学习的快速发展带来的是越来越多的大规模复杂网络,通过手动方式寻找一种好的并行方法是不现实的,因此需要一种自动化的方法来保证对于不同的网络都能够给出一种较好的拆分并行策略。
此外,还需要考虑的是对于底层加速器的可移植性。对于没有足够良好的可编程性的加速器来说,由单核扩展到多核,并且实现算子内部的拆分并行所带来的修改软件栈的工作量是非常大的。传统的数据并行和模型并行的实现仍然是基于一个处理核完成一个算子的计算任务,所以并不会带来很多额外的工作,而单个算子的跨核并行需要对算子本身实现进行修改,这种修改的难易程度依赖于加速器的可编程性和原有算子实现逻辑的复杂程度。如何减小在多核架构上实现低时延推理过程中的额外开销,缓解实现过程中工作量对于加速器本身可编程性的依赖,使得方法能够在未来对于不同的多核加速器都有一定的通用性也是一个需要考虑的问题。
基于上述分析描述,针对大规模的神经网络模型自动给出一套端到端的拆分方案,本方案把一个算子拆分成多个规模更小的子算子,这样可以直接调用单核架构下的计算库,避免了重新实现的额外工作量。比如:一个激活算子在经过拆分后可以得到许多更小的激活算子,这意味着只需要在多个核上调用原有的单核激活函数完成每个子任务,而不需要修改或者重新实现一个多核版本的激活函数。在这个过程中,既需要兼顾每个算子本身的拆分后的计算效率和并行度,也要考虑上下文算子彼此之间在拆分上的相互配合。最终目标是得到一个能够有效降低整个神经网络模型端到端的推理时延的拆分并行方案。
以自动驾驶应用为例,车辆在自动行驶过程中需要对车载传感器传来的图像、视频、语音等外部信息进行分析处理。为了保证安全性,车辆必须在最短的时间内得到处理的结果,从而做出决策。采用了多核处理器结构芯片的车辆,可以使用本方案把神经网络模型处理小批量外部信息的计算负载均衡地分配到多个处理器核上,在规定的响应时间内完成对信息的处理,返回处理结果辅助车辆自动行驶。本技术方案可以用较小地开销实现深度学习加速器由单核向多核结构上的扩展,并且该方案能有效降低各种网络在多核加速器上的端到端时延。
在上述应用场景中,多核处理器结构芯片设置在车辆上。实际中,多核处理器结构芯片可以设置在云端服务器上,车辆可以通过3G/4G、WIFI等网络将车载传感器传来的图像、视频、语音等外部信息发生至云端服务器。云端服务器使用本方案把神经网络模型处理小批量外部信息的计算负载均衡地分配到多个处理核上。在车辆行驶规定的响应时间内,云端服务器将处理结果通过3G/4G、WIFI等网络反馈至车辆。在实际中,车载传感器采集到的外部信息的规模不同。在应用之前,根据不同规模的外部信息,车载处理器利用本方案确定相应的算子拆分路径。将不同规模的外部信息对应的算子拆分方案存储对应区域,多核处理器结构芯片获取外部信息后调出对应的算子拆分路径来对神经网络模型中的算子进行拆分,把外部信息的计算负载均衡地分配到多个处理器核上。
通常,上层框架需要调用计算库来得到神经网络模型中每个算子在处理器上的指令实现。具体地,框架把每个算子的类型、参数告知计算库,计算库返回每个算子在处理器上执行所需要的机器指令。框架通过驱动把数据和所述机器指令加载到处理器上,启动处理器并完成所述算子的计算。
如果把所述算子的计算平台由单核加速器变成有着相似甚至相同核结构的多核加速器,相应地,需要对计算库进行重新设计,使其能够产生运行在多个核上的机器指令。具体地,由于多个核需要读取同一输入张量数据的不同部分,同时也需要把各自的输出写回到同一输出张量数据的不同部分,计算库需要修改每个算子的计算指令中所有关于读取和存储部分的指令。
本公开实施例所提供的神经网络拆分方法能够尽量避免对单核处理器计算库进行修改,同时也能够实现神经网络模型在多核处理器上的并行执行。具体地,上层框架通过把神经网络模型中的算子拆分成若干个可以并行执行子算子,对每个子算子,框架调用计算库生成所述子算子在单个核上执行的机器指令,通过把所述子算子的机器指令加载到不同核上,实现算子在多核处理器上的并行计算。具体地, 因为框架使用单核处理器计算库生成子算子的计算指令,神经网络模型中所述算子的输入和输出张量数据随着所述算子被拆分成子算子同样被拆分成相应的子张量数据。
基于上述描述,如图2所示,为本公开实施例提供一种神经网络模型拆分方法流程图。包括:
步骤201):根据所述神经网络模型中目标层的算子,确定与所述目标层的算子关联的张量数据的拆分状态集合。
在本实施例中,神经网络模型通常可以看作是一个由算子和多维张量数据所构成的有向无环图,算子和张量数据彼此之间通过有向边相互连接,有向边的指向表明了数据是算子的输入或者是输出。使用op表示算子,tensor表示张量数据。同时,为了统一不同算子的拆分方式的表达,框架统一选择使用与算子相关联的张量数据的拆分方式来说明不同算子的拆分方式。不妨假设网络中所有张量数据都是4维的,对于图像分类网络最后的全连接层和归一化指数回归层的输入数据或输出数据来说,实际维度不到4,仍然将其表示为4维张量。用符号N,C,H,W分别表示这4个维度。其中,N表示批量的大小,C表示特征图像的个数,H表示特征图像的高,W表示特征图像的宽。这种假设是仅仅是出于说明的便捷性,对于框架本身来说,可以支持含有任意维度数量的张量数据的神经网络模型的处理。尽管如此,4维对于相当大一部分的神经网络结构都足够使用。
本技术方案对神经网络模型中的算子进行拆分时,算子的种类不同,该算子支持的计算逻辑也不相同,同样有不同的拆分策略。为了能够统一地表达不同算子的拆分策略,本技术方案采用算子的输入张量数据、输出张量数据的拆分状态来表示算子本身计算逻辑的拆分。
对于本技术方案来说,可以对整个神经网络模型中的所有算子进行拆分,也可以对神经网络模型中部分算子进行拆分。且,目前深度学习领域新出现的网络结构和算法已经逐渐模糊各个数据维度的物理意义和彼此之间的界限,本技术方案可以扩展应用到更多维度下的算子拆分。
把张量数据的任意一种拆分称为该张量数据的一种状态s,将张量数据拆分后,获得子张量数据集合。状态s通过对应的子张量数据集合表征。所有可能的拆分{s 0,s 1,s 2,…}组成了该张量数据的拆分状态集合S,一般来说,这是一个非常大的状态空间,这意味着由张量数据的拆分状态所表示的算子的可能的拆分方式的空间也非常巨大。
通过一些合理的假设,可以对张量数据的状态集合进行剪枝。首先,多核加速器完成一个算子的计算的时延取决于执行子任务耗时最长的那个核的时间,而多核架构中各个核在硬件结构上彼此是相互对等的,因此每个核的时间耗费的长短取决于分配给该核的任务负载的多少。所以一个合理的假设是应该保证拆分后的子算子的规模大致均衡,为此,可以从张量数据的状态集合S中省去那些拆分上不均衡的拆分状态。此外,多核架构中核数通常是2的整数次幂,如1,2,4,8,16等等,一个并行度不是2的整数次幂的任务往往会导致核的调度上产生“碎片”,因此拆分后的子算子数量应当保证是2的整数次幂。通过这两个假设,可以极大地缩小算子拆分策略的搜索空间。
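基于上述两个假设(拆分尽量均衡、拆分后的子算子数量为2的整数次幂),某一维度上经过剪枝的候选拆分状态可以用如下示意性Python代码枚举(仅为示意,函数名为假设,并非本方案的实现):

```python
def candidate_splits(dim_size, max_cores):
    """枚举某一维度上满足剪枝假设的拆分状态:
    子块数量为2的整数次幂,且各子块大小尽量均衡(相差至多1)。"""
    states = []
    n = 1
    while n <= max_cores and n <= dim_size:
        base, extra = divmod(dim_size, n)
        # 前 extra 个子块多分 1 个元素,保证负载大致均衡
        sizes = [base + 1] * extra + [base] * (n - extra)
        states.append(tuple(sizes))
        n *= 2
    return states

print(candidate_splits(10, 8))
# [(10,), (5, 5), (3, 3, 2, 2), (2, 2, 1, 1, 1, 1, 1, 1)]
```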
需要说明,并非选择任意与算子关联的张量数据的拆分状态都能表示该算子的一种有效的拆分方式。张量数据的拆分的维度应该被算子所支持,例如归一化指数回归算子(Softmax)的输入数据不应该在待归一化的维度上存在拆分。此外,算子的输入张量和输出张量的拆分应该满足算子的计算逻辑,例如,卷积算子的输出数据在H/W维度上拆分的每一个子块的起止点都应该确实是其对应的输入数据在H/W维度上拆分的子块根据卷积算子的卷积核和位移步长计算出来的;卷积算子的输入数据在C维度上的拆分应该和权值数据在C维度上的拆分完全一致,输出数据在C维度上的拆分应该和权值数据在N维度上的拆分完全一致。在框架中,使用输出状态根据每个算子的具体逻辑来反向推出算子的输入状态,或者使用输入状态根据每个算子的具体逻辑来正向推出算子的输出状态。这保证了相关数据的状态始终能够表示一个有效的算子拆分方式。
步骤202):根据所述神经网络模型的有向无环图遍历所述拆分 状态集合,确定相邻拆分状态集合之间的状态路径及状态路径的权重。
如图3所示,整个神经网络模型的拆分方案P可以看作是由每个算子的输入张量数据的拆分状态集合中的一种拆分状态向输出张量中的一种拆分状态的跳转。前一个算子的输出张量的拆分状态即是后一个算子输入张量的拆分状态。经过算子的每一种可能的跳转对应了在该算子上的一种有效的拆分方式。因此,状态路径表示算子的拆分方式。
在本技术方案中,按照一种分解方式对张量数据进行分解获取子张量集合,该子张量集合对应的一种拆分状态,不同的分解方式能够获得多种拆分状态,所有分解方式获得的拆分状态组成拆分状态集合。由此可知,每种拆分状态对应一子张量集合,子张量集合包含了张量数据中的全部元素。并且,在一个子张量集合中,每个子张量的元素可以重叠,也可以不重叠。
上文已描述,状态路径表示算子的拆分方式,经所述状态路径对应的拆分方式对算子的计算逻辑进行拆分,获得对应的子算子集合。输入张量数据的状态和对应输出张量数据的状态通过状态路径连接,表示该输入张量数据的一拆分状态的子张量数据集合经过子算子集合中的子算子处理,得到该输出张量数据的对应拆分状态的子张量数据集合。
在本技术方案中,状态路径的权重为算子在某种拆分状态下在多核加速器上并行执行所用的时间进行表征,且多核加速器完成一个算子的计算的时间取决于执行子任务耗时最长的那个核的时间。计算状态路径的权重时使用参数进行估计:
1)拆分后的n个子算子的计算负载c 1,c 2,…,c n。其中,c i根据拆分后第i个子算子的类型和规模计算得到;
2)n个子算子的访存数据量d 1,d 2,…,d n。其中,d i根据拆分后第i个子算子的类型和规模计算得到;
3)每个加速器核的计算吞吐速率α。α由加速器本身的性能参数 所决定;
4)每个核的访存带宽β。通常来说,多个核共享有限的访存带宽,因此β=B/n。其中, B是多核加速器的总带宽。
状态路径的权重的计算公式为:
t=max i=1,...,n(max(c i/α,d i/β))     式(1)
其中,内侧的取最大值操作是基于算子实现的计算部分和访存部分之间能够相互隐藏,即计算和访存可以做到尽量并发执行。对于一些加速器来说,当子算子的规模过小时会导致每个核的计算吞吐量降低,可以对α进行进一步修正使估值更加准确。外侧的取最大值操作就是多核加速器完成一个算子的计算的时间取决于执行子任务耗时最长的那个核的时间。
需要说明的是,上述获取状态路径的权重的方式仅仅是例举的部分情况,而不是穷举,本领域技术人员在理解本申请技术方案的精髓的情况下,可能会在本申请技术方案的基础上产生其它的变形或者变换,比如:衡量状态路径的权重不仅仅可以是执行子任务的所花费的时间,也可以是执行子任务的吞吐量。或也可以通过实际测量在多核处理器上执行所述状态路径对应的算子拆分方式下的所有子任务的时间来确定状态路径的权重。但只要其实现的功能以及达到的技术效果与本申请类似,那么均应当属于本申请的保护范围。
步骤203):根据所述状态路径的权重,确定所述目标层的目标拆分路径。
在步骤203中,利用状态路径的权重,分两种方式来确定目标层的拆分路径。第一种方式为正向遍历来确定拆分路径,步骤包括:
遍历所述目标层的所有拆分状态集合,对当前拆分状态集合,遍历每一状态,获得所有指向当前状态的状态路径以及所述状态路径的起始状态到所述目标层的输入张量数据的起始状态的拆分路径;
根据所述状态路径和所述拆分路径确定所述当前状态到所述目标层的输入张量数据的起始状态的拆分路径;
根据所述状态路径的权重和所述拆分路径的权重确定所述当前状态到所述目标层的输入张量数据的起始状态的拆分路径;其中,所述拆分路径的权重根据所述拆分路径对应的所有状态路径的权重确定;
遍历完所述目标层的所有拆分状态集合后,获得所述目标层的输入张量数据的拆分状态集合与所述目标层的输出张量数据的拆分状态集合之间的目标拆分路径。
第二种方式为反向遍历来确定拆分路径,步骤包括:
遍历所述目标层的所有拆分状态集合,对当前拆分状态集合,遍历每一状态,获得所有以当前状态为起点的状态路径以及所述状态路径的结束状态到所述目标层的输出张量数据的终止状态的拆分路径;
根据所述状态路径的权重和所述拆分路径的权重确定所述当前状态到所述目标层的输入张量数据的起始状态的拆分路径;其中,所述拆分路径的权重根据所述拆分路径对应的所有状态路径的权重确定;
遍历完所述目标层的所有拆分状态集合后,获得所述目标层的输入张量数据的拆分状态集合与所述目标层的输出张量数据的拆分状态集合之间的目标拆分路径。
下面举例详细说明如何遍历完所述目标层的所有拆分状态集合后获得所述目标层的输入张量数据的拆分状态集合与所述目标层的输出张量数据的拆分状态集合之间的目标拆分路径。
图3所示的神经网络模型为串行的,且整个神经网络模型的输入张量数据和输出张量数据均是不拆分状态。将图3整个神经网络模型的算子进行拆分,把一个包含n个算子的串行神经网络模型描述成一个算子序列(op 1,op 2,...,op n),假设每个算子只有一个输入和一个输出,前一个算子的输入是后一个算子的输出,那么包括整个神经网络模型的输入张量数据和输出张量数据以及所有的算子之间的中间结果张量在内,所有的张量数据构成集合(tensor 0,tensor 1,...,tensor n),op i的输入是tensor i-1,输出是tensor i。对每个数据张量tensor i,有与之对应的状态集合 S i,搜索策略的目标是寻找一种张量本身和其状态集合的某个状态之间的映射关系tensor i→s i,通过给神经网络模型中每个张量数据确定一个具体的拆分状态,从而可以确定所有算子的拆分方式,因此,把一个神经网络模型中所有张量数据到其拆分状态的一种映射关系称为该网络模型的一种拆分方案P。在计算阶段,第i个算子op i以处于拆分状态s的输入数据计算出处于拆分状态r的输出张量数据,具体的并行计算方式由输入张量数据和输出张量数据的状态所决定,同时,该算子的计算时间记为t s→r,其值的大小取决于相应的拆分方式和底层加速器的硬件特性,则整个网络的时延T的计算公式为:
$$T = \sum_{i=1}^{n} t_{s_{i-1} \to s_i},\quad 其中\ s_{i-1} \in S_{i-1},\ s_i \in S_i \qquad (2)$$
同样与之对应的还有使用该拆分方式在多核加速器上并行执行该算子所使用的时间t i,因此,t i可以被看作是由算子的输入张量数据的状态指向输出张量数据的状态的有向边的权重。同时,作为整个神经网络模型的输入张量数据和输出张量数据,它们对应的拆分状态空间中只有一种不拆分的保持整个数据块连续完整的状态,这使得神经网络模型的拆分方案P由完整的输入数据开始,到完整的输出数据结束,外部的用户始终看到一个完整的输入和输出。此时,对于给定的神经网络模型搜索一个好的拆分方案P,即是寻找一条由输入张量数据的不拆分状态到输出张量数据的不拆分状态的最短路径,该路径在每个中间结果张量的有效状态空间中都要选择一个状态经过。式3、式4给出了这种抽象的公式表示。
$$P^{*} = \operatorname*{argmin}_{P}\ T(P) = \operatorname*{argmin}_{P} \sum_{i=1}^{n} t_{s_{i-1} \to s_i} \qquad (3)$$
$$s_0 = s_{root},\quad s_n = s_{end} \qquad (4)$$
同样注意到图3中,存在输入张量数据的一个拆分状态指向输出 张量数据的多个拆分状态的情况,这种情况是存在的,进一步说,正是这种情况才导致了神经网络模型拆分空间的巨大。
在本技术方案中,设整个神经网络模型的输入张量数据的不拆分状态为起始状态 $s_{root}$。在初始阶段,起始状态 $s_{root}$ 对应的拆分路径的权重为0,其余所有张量数据的所有状态对应的拆分路径的权重为 $\infty$。神经网络模型中任一张量数据的任一状态 $s$ 都有与之对应的由 $s_{root}$ 开始到 $s$ 的拆分路径,其权重为 $l_s$。由前往后访问每一个拆分状态集合,在每个拆分状态集合中,依次遍历其中的每一个状态 $s$。对每个状态 $s$,考察由其指向后一拆分状态集合中若干拆分状态的每一条有向边 $e_1, \ldots, e_{k_s}$。以后一拆分状态集合中的拆分状态 $v$ 为例,使用式(1)获得状态 $s$ 到状态 $v$ 之间的权重 $t_{sv}$,利用式(5)来更新该状态路径指向的下一拆分状态集合中的状态 $v$ 对应的由 $s_{root}$ 开始到状态 $v$ 的拆分路径的权重 $l_v$:
$$l_v = \min(l_v,\ l_s + t_{sv}) \qquad (5)$$
在依据神经网络模型的拓扑关系正向遍历完成所有拆分状态集合的访问后,获得整个神经网络模型的输入张量数据的不拆分状态 $s_{root}$ 到神经网络模型的输出张量数据的不拆分状态 $s_{end}$ 的目标拆分路径。
上述由不拆分状态 $s_{root}$ 到不拆分状态 $s_{end}$、经过每个拆分状态集合中的一个状态所构成的路径,即为神经网络模型的一条拆分路径。从神经网络模型的拆分路径中选取权重最小的作为神经网络模型的目标拆分路径。
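上述由 $s_{root}$ 开始、按式(5)正向松弛各拆分路径权重的过程,可以用如下示意性Python代码表示(数据结构与函数名均为示意性假设):

```python
INF = float("inf")

def forward_traversal(state_sets_in_order, edges, edge_weight):
    """按拓扑顺序正向访问各拆分状态集合,对每条有向边 (s, v) 按
    式(5) 松弛:l_v = min(l_v, l_s + t_sv)。"""
    l = {s: INF for states in state_sets_in_order for s in states}
    l[state_sets_in_order[0][0]] = 0.0          # s_root 对应权重为 0
    for states in state_sets_in_order:
        for s in states:
            for v in edges.get(s, []):
                l[v] = min(l[v], l[s] + edge_weight(s, v))
    return l

sets_ = [["s_root"], ["s1a", "s1b"], ["s_end"]]
edges = {"s_root": ["s1a", "s1b"], "s1a": ["s_end"], "s1b": ["s_end"]}
w = lambda s, v: {"s1a": 1.0, "s1b": 2.0, "s_end": 0.5}[v]
print(forward_traversal(sets_, edges, w))
# {'s_root': 0.0, 's1a': 1.0, 's1b': 2.0, 's_end': 1.5}
```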
需要说明的是,图3所示的神经网络模型为串行神经网络模型,且为了便于说明本技术方案,神经网络模型的输入张量数据和输出张量数据对应的拆分状态集合均为不拆分状态。在神经网络模型的输出张量数据的拆分状态集合不是不拆分状态s end,而是多个拆分状态构成集合时,神经网络模型的输出张量数据的拆分状态集合中的每个拆分状态的拆分路径的权重中选出最小值作为整个神经网络模型的输入张量数据的拆分状态集合到神经网络模型的输出张量数据的拆分 状态集合之间的目标拆分路径。
注意到整个方案也可以转换为搜索由不拆分状态s end到不拆分状态s root的拆分路径,二者是等价的。同理,在神经网络模型的输入张量数据的拆分状态集合不是不拆分状态s end,而是多个拆分状态构成集合时,从神经网络模型的输入张量数据的拆分状态集合中的每个拆分状态的拆分路径的权重中选出最小值作为整个神经网络模型的输入张量数据的拆分状态集合到神经网络模型的输出张量数据的拆分状态集合之间的目标拆分路径。
图3所示的神经网络模型为串行的神经网络模型,且模型的输入张量数据和输出张量数据的拆分状态集合中的状态为不拆分情况。在实际应用中,针对图2、图4、图7和图9所示的技术方案,均需要解决多分支神经网络模型中不同分支拆分方式一致性的问题。位于分支交汇处的算子具有1个以上的输入张量数据,例如对位加法算子(Add),对位乘法算子(Mult),拼接算子(Concat)。对一个有2个输入的算子A,在访问该算子,即根据输入张量数据的拆分状态集合枚举输入张量数据的拆分状态集合结束后,两个输入张量数据tensor left、tensor right分别有对应的拆分状态集合S left和S right。分别沿tensor left、tensor right开始的两条之路继续向前遍历,一种情况下,两条支路会直接延伸直至遍历结束,代表整个网络有不止一个输入数据,这通常在推理任务中并不常见,另一种情况下,两条支路在某算子处合到一起。无论哪种情况,当确定拆分方案P时,在算子A的两个输入张量数据tensor left、tensor right上,可能会选中相互不匹配的拆分状态。具体来说,假设算子A是二元对位加法算子,回溯过程在tensor left的拆分状态集合中选中的可能是一个仅在C维度上有拆分的状态,而在tensor right的拆分状态集合中选中的可能是一个仅在H维度上有拆分的状态,这两个拆分状态所表示的加法算子本身的拆分方式是不一致的,因此会导致整个拆分方案P无效。为了解决这个问题,在遍历算子A结束前保证tensor left、tensor right对应的拆分状态集合中都只含有一个拆分状态,这确保回溯过程中在两状态集合中选择的状 态的确定性。因此,在正向遍历阶段,当所述算子的输出张量数据被至少两个算子作为输入张量数据,或者所述算子具有至少两个输出张量数据时,所述算子的输出张量数据的拆分状态集合中保留一个拆分状态,且所述拆分状态经由所述算子的同一状态路径确定。在反向遍历阶段,当所述算子具有至少两个输入张量数据时,所述算子的输入张量数据的拆分状态集合中保留一个拆分状态,且所述拆分状态经由所述算子的同一状态路径确定。这样,在遍历分支算子结束前,将从多个输入数据的拆分状态集合中选择出对应累计权重最小的状态保留下来,移除拆分状态集合中其他的拆分状态。
需要说明的是,上述获取目标拆分路径方式类似于viterbi算法,此次仅仅是例举的部分情况,而不是穷举,本领域技术人员在理解本申请技术方案的精髓的情况下,可能会在本申请技术方案的基础上产生其它的变形或者变换,比如:神经网络模型的输入张量数据的拆分状态集合到神经网络模型的输出张量数据的拆分状态集合之间的每一条拆分路径的权重由对应的状态路径的权重之和确定。根据经验设置一阈值,拆分路径的权重小于设定的阈值,就可以作为目标拆分路径对神经网络模型进行拆分。但只要其实现的功能以及达到的技术效果与本申请类似,那么均应当属于本申请的保护范围。
步骤204):利用所述目标拆分路径对所述神经网络模型的目标层的算子进行拆分。
由上述描述可知,通过把神经网络中的算子的计算逻辑拆分成更小的子任务分配到多个核上并行执行来充分利用多核处理器结构芯片的硬件资源。
对于图2所示的技术方案来说,最理想的条件下,希望拆分后的子算子可以把它们的输出张量数据写到一个存放完整输出张量数据的存储空间中的对应位置上,这样一个算子的所有子算子执行完毕后得到的始终是一个完整连续的数据块。但对于部分加速器来说,这一点并不容易做到。首先,由于拆分后的算子的输出张量数据在整个输出中的存储位置可能是不连续的,使得需要重写算子输出部分的代 码,使其能够把输出结果写回到一个子张量在存储中对应的离散位置上。与此同时,加速器通常会进一步调整数据在存储中的顺序来提升计算时的访存效率,这使得修改算子输出逻辑的工作更加困难繁琐。同时,如果后续算子的计算逻辑或拆分逻辑不需要输入数据在某个维度上保证存储连续性,可以把上一层输出的在该维度上处于离散存储状态的数据直接用于下一层的计算,而不需要保证输出数据的连续性。
因此,框架把调整张量数据拆分形式的任务从算子的计算任务中剥离出来,抽象为一个新的算子,称为胶水算子,这种剥离避免了对每个算子输出逻辑的修改,增强了框架对于底层不同加速器的可移植性。胶水算子用来把张量按照某种方式拆分的子数据块调整成按照另一种拆分方式形成的子数据块。如表1所示,不同种类的算子所允许的拆分方式表现到它们的输入张量数据、输出张量数据上是不同的。当上一层算子的输出张量数据的拆分方式并不被下一层算子所允许,就需要使用胶水算子对该张量数据的拆分方式进行调整,从而把前后算子“粘接”起来。此外,即使上一层输出的拆分方式被下一层所支持,也可以通过胶水算子把张量数据的拆分进行调整成更利于与下一层计算的形式。
表1
Operation 允许拆分的输入维度
Convolution N,C,H,W(H和W不应小于kernel)
FC N,C
Relu N,C,H,W
Scale N,C,H,W
BatchNorm N,C,H,W
Softmax 无法拆分要规范化的维度
Pooling N,C,H,W(H和W不应小于kernel)
基于上述描述,在图2的基础上,本实施例还提出另一种神经网 络模型的拆分方法。如图4所示,在图2的基础上,还包括:
步骤201’):在所述目标层的算子与关联的拆分状态集合之间插入胶水算子,调整所述算子的张量数据的拆分状态集合中的状态;其中,所述胶水算子用于将所述张量数据按照一拆分方式获得的状态调整成按照任一种拆分方式获得的状态。
在本步骤中,通过胶水算子来表示调整张量数据的拆分状态的行为方式,神经网络模型的每一层的计算规模随着网络的延伸不断变化,随着神经网络模型拆分趋势的变化,需要对算子的拆分方式做出相应的调整,也就是对中间结果的状态进行调整。如图5所示,图3中的Op_2和其输入Tensor1之间加入了胶水算子,可以把张量数据的任意一种拆分状态转换成另一种拆分状态。对胶水算子而言,其输入张量数据和输出张量数据有着相同的形状和相同的状态空间,由输入张量数据的任一拆分状态,存在指向输出张量数据所有拆分状态的有向边,因此在输入张量数据的拆分状态集合和输出张量数据的拆分状态集合之间形成了全连接的网状结构。这使得任意一种输入张量数据的拆分状态可以在算子Op_2前转换成另一种拆分状态,给拆分方案P的搜索空间中引入了在每个算子计算开始前调整其输入张量数据的拆分状态,也即是调整算子本身拆分方式的可能性。
需要说明的是,图5示出了算子与对应的输入张量数据之间插入胶水算子,也可以在算子与对应的输出张量数据之间插入胶水算子,更可以在算子与对应的输入张量数据、输出张量数据之间均插入胶水算子,此次仅仅是例举的部分情况,而不是穷举,本领域技术人员在理解本申请技术方案的精髓的情况下,可能会在本申请技术方案的基础上产生其它的变形或者变换,但只要其实现的功能以及达到的技术效果与本申请类似,那么均应当属于本申请的保护范围。
在神经网络模型的目标层的算子与关联的拆分状态集合之间插入胶水算子,虽然对算子的拆分方式作出相应的调整,但是这种调整又会带来额外的开销,如何在整个神经网络模型中恰当地插入胶水算子来提高神经网络模型的性能。为了解决这个问题,在神经网络模型 的目标层的算子与关联的拆分状态集合之间插入胶水算子,获取包含所述胶水算子在内的神经网络模型的有向无环图;根据所述有向无环图遍历所述目标层的所有张量数据对应的拆分状态集合,确定相邻拆分状态集合之间的状态路径及状态路径的权重;根据所述状态路径的权重,确定包含所述胶水算子在内的神经网络模型的目标层的拆分路径;利用包含所述胶水算子在内的神经网络模型的目标层的拆分路径对插入的每个胶水算子进行选择,对不需要插入的胶水算子删除,对需要插入的胶水算子保留。
对于胶水算子来说,胶水算子内部采用拆分-拼接、拼接-拆分、拼接、拆分这四种方式中的其中一种实现方式,在拼接阶段可以把在任意维度上相邻的子数据块拼接成一个新的数据块,在拆分阶段可以把任意一个子数据块拆分成两个更小的子数据块。任意一种拆分可以通过这样一个两阶段的过程转换成另外一种拆分形式。为了说明这一点,不妨假设数据本身是一维的,调整前的拆分形式表示为 $\{(0,p_1),(p_1,p_2),\ldots,(p_{n-1},end)\}$,每一段代表一维数据拆分后的一个子段,调整后的拆分形式是 $\{(0,q_1),(q_1,q_2),\ldots,(q_{m-1},end)\}$。如果调整前的某相邻两段 $(p_{i-1},p_i)$、$(p_i,p_{i+1})$ 是调整后的某一段 $(q_j,q_{j+1})$,即 $p_{i-1}=q_j$,$p_{i+1}=q_{j+1}$,在调整这一部分时只需要在拼接阶段把 $(p_{i-1},p_i)$、$(p_i,p_{i+1})$ 拼接在一起,跳过拆分阶段。同样,另一种情况下,如果调整前的某一子段是调整后的若干子段的集合,则跳过拼接阶段,在拆分阶段执行相应的拆分。最坏情况下,可以在拼接阶段把所有数据组合成一个完整的一维数据,再在拆分阶段进行对应的拆分。
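以一维数据为例,上述"按需跳过拼接或拆分阶段"的判断逻辑可以用如下示意性代码表达(仅为示意,名称为假设):

```python
def adjust_plan(old_cuts, new_cuts, end):
    """给出把旧的一维拆分 {(0,p1),...,(p_{n-1},end)} 调整为新拆分
    所需的两阶段操作:新子段边界与旧边界对齐时仅拼接(或无需调整),
    新子段完全落在某一旧子段内部时仅拆分,否则先拼接再拆分。"""
    old_b = [0] + list(old_cuts) + [end]
    new_b = [0] + list(new_cuts) + [end]
    plan = []
    for a, b in zip(new_b, new_b[1:]):
        inner = [p for p in old_b if a < p < b]   # 落在新子段内部的旧边界
        if a in old_b and b in old_b:
            op = "无需调整" if not inner else "仅拼接"
        elif not inner:
            op = "仅拆分"
        else:
            op = "拼接后拆分"
        plan.append(((a, b), op))
    return plan

print(adjust_plan([2, 4, 6], [4, 5], end=8))
# [((0, 4), '仅拼接'), ((4, 5), '仅拆分'), ((5, 8), '拼接后拆分')]
```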
以胶水算子采用拆分-拼接或拼接-拆分为例,假设待调整张量数据的总大小为M,且两个阶段均不能跳过,且每个阶段都要针对4个维度进行拼接或者拆分。为了便于移植,拼接和拆分通常会使用神经网络算法中自带的拼接算子(Concat)和拆分算子(Slice)实现,由于这两个算子每次只能处理一个维度,整个胶水在最差情况下会带来8M的存储读写开销。所以,必须在调整拆分状态和引入的额外开销之间寻找一个最佳的平衡点,再在引入尽量少的胶水算子的情况下 又能符合网络结构的规律在合理的地方对算子的拆分方式进行调整。
进一步详说,胶水算子和普通的神经网络算子做相同处理,每个胶水算子对张量数据拆分状态的调整都有与之对应的时间t,该时间作为相应的状态路径的权重。仍然利用公式(5)获得包括胶水算子在内的神经网络模型的目标层的输入张量数据的不拆分状态s root到神经网络模型的输出张量数据的不拆状态s end的目标拆分路径。选择胶水算子时,在拆分路径中,检查每个胶水算子的输入张量数据对应的一拆分状态及输出张量数据对应的一拆分状态,如果该两个拆分状态相同,即图5中的拆分状态集合Tensor_1中的拆分状态status_1通过状态路径与拆分状态集合Tensor_1’中的拆分状态status_1相连,这两个拆分状态相同。说明神经网络模型的目标层的拆分路径P并不需要调整算子Op_2的输入张量数据的拆分状态,并且这是一个基于前后算子和整体性能统筹考虑的结果。算子Op_2与对应输入张量数据之间插入的胶水算子会从网络中移除。否则,需要对插入的胶水算子保留。
需要说明的是,胶水算子的实现使用神经网络模型中原有的算子。拼接阶段对应的是神经网络模型中的Concat算子,而拆分阶段对应的是神经网络模型中的Slice算子,任何已经支持这两种算子的加速器都可以很快地实现胶水算子。并且,在本实施例中,上述获取目标拆分路径方式类似于viterbi算法,此次仅仅是例举的部分情况,而不是穷举,本领域技术人员在理解本申请技术方案的精髓的情况下,可能会在本申请技术方案的基础上产生其它的变形或者变换,比如:神经网络模型的输入张量数据的拆分状态集合到神经网络模型的输出张量数据的拆分状态集合之间的每一条拆分路径的权重由对应的状态路径的权重之和确定。根据经验设置一阈值,拆分路径的权重小于设定的阈值,就可以作为目标拆分路径对神经网络模型进行拆分。但只要其实现的功能以及达到的技术效果与本申请类似,那么均应当属于本申请的保护范围。
需要强调的是,图2所示的关于算子拆分的技术方案内容适用于 图4所示的技术方案,这里不再赘述。
对于神经网络模型来说,卷积算子是比较特殊的算子,在一些情况下需要额外的辅助算子完成拆分任务。当把计算按照输入张量数据的H/W维度进行划分时,如果卷积核窗口的尺寸超其每次移动的步长,即kernel>stride,拆分后的卷积算子在计算时会出现框移动到边界处超过子张量数据的边界的情况,而缺少的这一部分数据位于相邻子张量数据上。为了处理这种子任务之间的输入张量数据相互重叠的情况,同时保证可移植性,同样把这种需要访问相邻子张量数据的边界数据的行为剥离出来,形成一个新的辅助算子,称为补偿算子。
如图6所示,补偿算子用来把一个子张量数据之外的相邻子张量数据中获取目标数据,将目标数据和子张量数据合并在一起形成一块更大的数据,此时计算阶段窗口的移动范围不会超过这个经过补偿的数据块的边界。除了卷积算子之外,池化算子以及目前不是很常见的局部响应归一化算子(Local Response Normalization,简称LRN),同样有这种拆分后的子任务依赖相邻数据块上数据的问题,池化算子与卷积算子类似,主要是由于池化窗口大于窗口本身的移动步长所引起的,而局部响应归一化算子不同,其计算逻辑是,即为了计算输出张量数据在C维度上的一个点的结果,需要输入张量数据在C维度上与之对应的一个点以及左右相邻的k/2个点的数值。因此,如果把局部响应归一化算子的计算按照C维度拆分成多个LRN算子,每个新算子同样需要相邻子张量数据中的元素数据来计算位于C维度边界上的值。
根据算子计算输出张量数据在某个维度上的一个数据所需要的在输入张量数据在该维度上的数据的范围,可以把算子归纳为三类。一类是点对点的算子,即计算输出张量数据的一个数据点只需要输入张量上与之对应的一个数据点的值,这一类算子包括激活算子(Relu,pRelu),批标准化算子(BatchNorm),以及对位加减乘除的基本算子(Add,Sub,Mult,Div)。这类算子在任何维度上可以进行任务拆分,得到的子算子在计算阶段只需要对应的子张量数据作为输入。另 一类算子是全依赖类型的算子,即计算输出张量数据的一个数据点只需要输入张量数据上在该维度上的所有数据,例如卷积算子和全连接算子在计算输出张量数据C维度上的一个点需要输入张量数据C维度上的所有点,尽管可以通过事后累加部分和的方式实现卷积算子在输入C维度上的拆分,当算子在C维度上的计算逻辑更加复杂时,例如归一化指数回归算子(Softmax),式(6)给出在其归一化维度上的计算公式。
$$O_i = \frac{e^{I_i}}{\sum_{j} e^{I_j}} \qquad 式(6)$$
其中,I是输入张量数据在归一化维度上的向量,O是输出张量数据在归一化维度上的向量。不同于卷积的部分和累加,这里的计算逻辑较为复杂,很难实现拆分。从这个角度看来,补偿算子实际是用来处理位于点对点类型的算子和全依赖类型的算子之间的第三种情况,即计算输出张量数据上的一个点需要输入张量数据在对应位置附近区域内的数据。对应位置附近区域根据补偿参数确定。这种情况下,算子在计算逻辑上实际上仍然是可以拆分的,尽管它们会依赖与子张量数据之外的数据,使用补偿算子可以统一地解决这个问题。
基于此,如图7所示,为本公开实施例提供的一种神经网络模型拆分方法流程图之三。在图2的基础上,还包括:
步骤201”):在所述目标层的算子与关联的拆分状态集合之间插入补偿算子,调整所述算子的输入张量数据的拆分状态集合中的状态;其中,所述补偿算子用于从所述状态的任一子张量数据的相邻子张量数据中获取目标数据,将所述目标数据与所述子张量数据合并。
在本技术方案中,为了解决卷积算子、池化算子由于窗口小于位移步长导致在沿H/W维度进行任务拆分的情况下,窗口超过输入子张量数据的边界的问题,框架引入了补偿算子在计算开始前,对一子张量数据集合来说,在每个子张量数据周围补上位于相邻的子张量数 据的元素。这种方式避免了修改拆分后的卷积算子或者池化算子本身的计算逻辑,使得对相邻子张量数据的依赖行为对卷积算子、池化算子本身不可见,有利于这个系统的快速实现,便于在不同结构的加速器上一致。但是,补偿算子本身会带来额外的开销,假设原始数据块的大小为M,在不考虑补偿后子张量数据彼此之间重叠部分的情况下,补偿算子引入了2M的访存开销。卷积算子和池化算子是组成神经网络尤其是图像分类神经网络的主要算子,为了减轻补偿行为带来的开销,将插入的补偿算子采用一种金字塔形式的结构来合并。如图8所示,该神经网络模型是由两个卷积算子构成的串行序列,两个卷积算子均按照H/W维度进行任务拆分,分别拆成了4个更小的卷积算子,图中省略了数据的N维度和C维度。假设两个卷积算子的卷积核尺寸分别是k 1、k 2,为了简化计算,位移步长均为1。正常情况下,卷积算子Conv1在计算前在子张量数据外围补偿的数据宽度为k 1/2,这保证拆分后的卷积任务在计算时卷积核不会超过输入子张量数据的边界。但在这里,卷积算子Conv1在计算前在子张量数据外围补偿的数据宽度为k 1/2+k 2/2,这使得其输出数据Tensor1的子张量数据彼此之间有k 2宽度的相互重叠,所以卷积算子Conv2在计算开始前不需要对其输入子张量数据进行数据补偿。
通过这种方法,可以把串行算子序列中使用的多个补偿算子合并成顶端的一个,尽管这使得第一次补偿的访存开销变大,但在补偿宽度远小于子数据块自身的尺寸的情况下,可以有效减少模型拆分后补偿算子的访存开销。但另一方面,这种方法会带来重复计算,图8中卷积算子Conv1的输出张量数据Tensor1的子张量数据的重叠部分的结果在多个拆分后的卷积算子中都被计算了。此外,对输入的特征图像尺寸较小的卷积网络,由于不再满足补偿宽度远小于子张量数据自身尺寸的条件,需要更加谨慎地评估合并多个补偿算子前后总的访存开销的变化情况。为了解决这个问题,同样把补偿算子的合并加入到拆分方案的搜索空间中。整个遍历过程由正向遍历转变为反向遍历,这两种遍历在原理上是相通的,但后者更加适合引入了补偿算子合并以后的搜索策略。设整个神经网络模型的输出张量数据的不拆分状态为终点状态 $s_{end}$,神经网络模型中任一张量数据的任一状态 $s$ 都有与之对应的由 $s$ 开始到 $s_{end}$ 的拆分路径,其权重为 $l_s$。在遍历开始前,$s_{end}$ 对应的权重为0,其余所有张量数据的所有状态的对应权重为 $\infty$。依据神经网络模型的拓扑关系反向遍历每个算子,在由输出张量数据的拆分状态枚举可能的输入张量数据的拆分状态的过程中,除了枚举正常情况下子张量数据彼此没有重叠、需要引入补偿过程的拆分状态,同样枚举彼此有重叠、不需要补偿的输入拆分状态。后者对应的状态路径的权重在计算上考虑了冗余计算的部分所增加的时间。仍然利用公式(5)获得包括补偿算子在内的神经网络模型的目标层的输入张量数据的不拆分状态 $s_{root}$ 到神经网络模型的输出张量数据的不拆分状态 $s_{end}$ 的目标拆分路径。
对于补偿算子来说,将神经网络模型中插入的多个补偿算子采用金字塔结构进行合并,可以获得一个合并后的补偿算子,也可以获得多个合并后的补偿算子。这种情况下,合并后的补偿算子的个数少于合并之前的补偿算子的个数。
需要说明的是,上述获取目标拆分路径方式类似于viterbi算法,此次仅仅是例举的部分情况,而不是穷举,本领域技术人员在理解本申请技术方案的精髓的情况下,可能会在本申请技术方案的基础上产生其它的变形或者变换,比如:神经网络模型的输入张量数据的拆分状态集合到神经网络模型的输出张量数据的拆分状态集合之间的每一条拆分路径的权重由对应的状态路径的权重之和确定。根据经验设置一阈值,拆分路径的权重小于设定的阈值,就可以作为目标拆分路径对神经网络模型进行拆分。但只要其实现的功能以及达到的技术效果与本申请类似,那么均应当属于本申请的保护范围。
需要强调的是,图2所示的关于算子拆分的技术方案内容适用于图7所示的技术方案,这里不再赘述。
如图9所示,为本公开实施例提供的一种神经网络模型拆分方法流程图之四。将胶水算子和补偿算子均引入算子拆分方案中,此时拆 分方法包括:
步骤a):根据所述神经网络模型中目标层的算子,确定与所述目标层的算子关联的张量数据的拆分状态集合;其中,所述目标层为所述神经网络模型中的至少一层;
步骤b):在所述目标层的算子与关联的拆分状态集合之间插入胶水算子,调整所述算子的张量数据的拆分状态集合中的状态;其中,所述胶水算子用于将所述张量数据的拆分状态集合中的状态调整成所述张量数据的任一拆分状态;
步骤c):在所述目标层的算子与关联的拆分状态集合之间插入补偿算子,调整所述算子的输入张量数据的拆分状态集合中的状态;其中,所述补偿算子用于从所述状态的任一子张量数据的相邻子张量数据中获取目标数据,将所述目标数据与所述子张量数据合并;
步骤d):根据所述神经网络模型的有向无环图遍历所述拆分状态集合,确定相邻拆分状态集合之间的状态路径及状态路径的权重;其中,所述状态路径表示所述算子的拆分方式;所述拆分状态集合中的每个状态表示一个子张量数据集合,所述状态的所有子张量数据的并集结果为所述张量数据;
步骤e):根据所述状态路径的权重,确定所述目标层的目标拆分路径;
步骤f):利用所述目标拆分路径对所述神经网络模型的目标层的算子进行拆分。
在神经网络模型的每个算子和其输入张量数据之间插入胶水算子,在神经网络模型的输出张量数据和产生该输出张量数据的算子之间拆入胶水算子。为神经网络模型中的每个张量数据tensor i初始化其状态集合S i,状态集合中存储状态本身以及由该数据的该拆分状态开始执行到网络最后的输出数据的输出状态s root的最短时间,该数值对用(s,t)表示。整个神经网络模型的输出张量数据对应的状态集合S root中有该数据的不拆分状态以及对应的最短时间(s root,0),其余所有集合为空。对于给定神经网络模型,对神经网络模型中的所有算子按照彼 此之间的依赖关系给出拓扑序λ,该拓扑序需满足:对任意算子A,所有依赖于A的算子必须在拓扑序上排在A之后,而所有A依赖的算子必须在拓扑序上排在A之前。
考虑到补偿算子的插入,反向遍历神经网络模型的每个算子的拆分状态集合。在反向遍历阶段,按照逆λ的顺序逐个遍历神经网络模型中的算子,对m输入n输出的算子A,有输入张量数据u 1,…,u m,输出张量数据v 1,…,v n,图2所示的在神经网络模型中关于算子拆分的技术方案内容适用于图9所示的技术方案,图4所示的在插入胶水算子的神经网络模型中关于胶水算子的技术方案内容适用于图9所示的技术方案,图7所示的在插入补偿算子的神经网络模型中关于补偿算子的技术方案内容适用于图9所示的技术方案,这里均不再赘述。反向遍历的时间复杂度为O(NM 2),其中,N是神经网络模型中算子的个数,M表示所有张量数据的拆分状态集合中最大拆分状态集合中的状态个数。
需要强调的是,图2所示的关于算子拆分技术方案内容适用于图9所示的技术方案,图4所示的基于胶水算子的算子拆分技术方案中关于胶水算子的内容适用于图9所示的技术方案,图7所示的基于补偿算子的算子拆分技术方案中关于补偿算子的内容适用于图9所示的技术方案,这里均不再赘述。
图2、图4、图7和图9所示的技术方案通过把神经网络模型中的目标层的每个算子拆分成更小的子任务,并分配到多个核上并行执行来充分利用多核系统的硬件资源。在图4、图7和图9所示的技术方案中,通过引入胶水算子或补偿算子,保证了神经网络模型在拆分后的计算图仍然可以使用单核上的算子核函数来实现,避免了框架在移植过程底层加速器需要修改或者重新实现大量算子的软件栈工作,使其对不具有良好可编程性的加速器更加友好。框架能够自动生成一套对于给定神经网络和多核加速器的高效拆分方案,在方案的生成过程中,能够根据算子的类型和规模结合底层硬件的计算吞吐速率和访存带宽合理调整算子的拆分方式,在硬件核的计算效率和算子本身的 拆分度之间取得较好的平衡,同时也会兼顾上下文算子之间在拆分上的相互配合,统筹规划多个算子的拆分选择。
本公开提供一种神经网络模型拆分装置,其中,包括:
拆分状态集合模块,用于根据所述神经网络模型中目标层的算子,确定与所述目标层的算子关联的张量数据的拆分状态集合;其中,所述目标层为所述神经网络模型中的至少一层;
状态路径模块,用于根据所述神经网络模型的有向无环图遍历所述拆分状态集合,确定相邻拆分状态集合之间的状态路径及状态路径的权重;其中,所述状态路径表示所述算子的拆分方式;所述拆分状态集合中的每个状态表示一个子张量数据集合,所述状态的所有子张量数据的并集结果为所述张量数据;
目标拆分路径模块,用于根据所述状态路径的权重,确定所述目标层的目标拆分路径;
拆分模块,用于利用所述目标拆分路径对所述神经网络模型的目标层的算子进行拆分。
优选地,所述目标拆分路径模块包括:
第一遍历单元,用于遍历所述目标层的所有拆分状态集合,对当前拆分状态集合,遍历每一状态,获得所有指向当前状态的状态路径以及所述状态路径的起始状态到所述目标层的输入张量数据的起始状态的拆分路径;
第一拆分路径确定单元,用于根据所述状态路径的权重和所述拆分路径的权重确定所述当前状态到所述目标层的输入张量数据的起始状态的拆分路径;其中,所述拆分路径的权重根据所述拆分路径对应的所有状态路径的权重确定;
第一选择目标拆分路径单元,用于遍历完所述目标层的所有拆分状态集合后,获得所述目标层的输入张量数据的拆分状态集合与所述目标层的输出张量数据的拆分状态集合之间的目标拆分路径。
优选地,所述目标拆分路径模块包括:
第二遍历单元,用于遍历所述目标层的所有拆分状态集合,对当 前拆分状态集合,遍历每一状态,获得所有以当前状态为起点的状态路径以及所述状态路径的结束状态到所述目标层的输出张量数据的终止状态的拆分路径;
第二拆分路径确定单元,用于根据所述状态路径的权重和所述拆分路径的权重确定所述当前状态到所述目标层的输出张量数据的终止状态的拆分路径;其中,所述拆分路径的权重根据所述拆分路径对应的所有状态路径的权重确定;
第二选择目标拆分路径单元,用于遍历完所述目标层的所有拆分状态集合后,获得所述目标层的输入张量数据的拆分状态集合与所述目标层的输出张量数据的拆分状态集合之间的目标拆分路径。
优选地,还包括:
第一拆分状态集合优化模块,用于在正向遍历阶段,当所述算子的输出张量数据被至少两个算子作为输入张量数据,或者所述算子具有至少两个输出张量数据时,所述算子的输出张量数据的拆分状态集合中保留一个拆分状态,且所述拆分状态经由所述算子的同一状态路径确定。
优选地,还包括:
第二拆分状态集合优化模块,用于在反向遍历阶段,当所述算子具有至少两个输入张量数据时,所述算子的输入张量数据的拆分状态集合中保留一个拆分状态,且所述拆分状态经由所述算子的同一状态路径确定。
如图10所示,为本申请实施例提出的一种神经网络模型拆分硬件设备示意图。包括存储器、处理器以及存储在所述存储器上并可在所述处理器上运行的计算机程序,所述处理器执行所述计算机程序时实现上述所述的神经网络模型拆分方法。
本说明书实施方式提供的一种神经网络模型拆分硬件设备,其存储器和处理器实现的具体功能,可以与本说明书中的前述实施方式相对照解释,并能够达到前述实施方式的技术效果,这里便不再赘述。
针对上述技术问题,还提出了一种神经网络模型的拆分方法及相 关产品,除了神经网络模型拆分装置的相关描述,相关的拆分方法及相关产品的描述和上述各实施例描述的一致,此处不再加以赘述。关于神经网络模型拆分装置的相关描述如下。
本公开提供一种神经网络模型拆分装置,所述装置包括:
拆分状态集合确定模块,用于根据所述神经网络模型中目标层的算子,确定与所述目标层的算子关联的张量数据的拆分状态集合;其中,所述目标层为所述神经网络模型中的至少一层;
插入胶水算子模块,用于在所述目标层的算子与关联的拆分状态集合之间插入胶水算子,调整所述算子的张量数据的拆分状态集合中的状态;其中,所述胶水算子用于将所述张量数据按照一拆分方式获得的状态调整成按照任一种拆分方式获得的状态;
状态路径确定模块,用于根据所述神经网络模型的有向无环图遍历所述拆分状态集合,确定相邻拆分状态集合之间的状态路径及状态路径的权重;其中,所述状态路径表示所述算子的拆分方式;所述拆分状态集合中的每个状态表示一个子张量数据集合,所述状态的所有子张量数据的并集结果为所述张量数据;
目标拆分路径确定模块,用于根据所述状态路径的权重,确定所述目标层的目标拆分路径;
拆分模块,用于利用所述目标拆分路径对所述神经网络模型的目标层的算子进行拆分。
优选地,所述目标拆分路径确定模块包括:
第一遍历单元,用于遍历所述目标层的所有拆分状态集合,对当前拆分状态集合,遍历每一状态,获得所有指向当前状态的状态路径以及所述状态路径的起始状态到所述目标层的输入张量数据的起始状态的拆分路径;
第一拆分路径确定单元,用于根据所述状态路径的权重和所述拆分路径的权重确定所述当前状态到所述目标层的输入张量数据的起始状态的拆分路径;其中,所述拆分路径的权重根据所述拆分路径对应的所有状态路径的权重确定;
第一选择目标拆分路径单元,用于遍历完所述目标层的所有拆分状态集合后,获得所述目标层的输入张量数据的拆分状态集合与所述目标层的输出张量数据的拆分状态集合之间的目标拆分路径。
优选地,所述目标拆分路径确定模块包括:
第二遍历单元,用于遍历所述目标层的所有拆分状态集合,对当前拆分状态集合,遍历每一状态,获得所有以当前状态为起点的状态路径以及所述状态路径的结束状态到所述目标层的输出张量数据的终止状态的拆分路径;
第二拆分路径确定单元,用于根据所述状态路径的权重和所述拆分路径的权重确定所述当前状态到所述目标层的输出张量数据的终止状态的拆分路径;其中,所述拆分路径的权重根据所述拆分路径对应的所有状态路径的权重确定;
第二选择目标拆分路径单元,用于遍历完所述目标层的所有拆分状态集合后,获得所述目标层的输入张量数据的拆分状态集合与所述目标层的输出张量数据的拆分状态集合之间的目标拆分路径。
优选地,所述插入胶水算子模块包括:
插入单元,用于在所述目标层的算子与关联的拆分状态集合之间插入胶水算子,获取包含所述胶水算子在内的神经网络模型的有向无环图;
状态路径单元,用于根据所述有向无环图遍历所述目标层的所有张量数据对应的拆分状态集合,确定相邻拆分状态集合之间的状态路径及状态路径的权重;
确定目标拆分路径单元,用于根据所述状态路径的权重,确定包含所述胶水算子在内的神经网络模型的目标层的目标拆分路径;
选择单元,用于利用包含所述胶水算子在内的神经网络模型的目标层的目标拆分路径对插入的每个胶水算子进行选择,对不需要插入的胶水算子删除,对需要插入的胶水算子保留。
优选地,所述插入胶水算子模块插入的胶水算子用于将胶水算子的输入张量数据的拆分状态集合中的状态进行拼接。
优选地,所述插入胶水算子模块插入的胶水算子用于将胶水算子的输入张量数据的拆分状态集合中的状态进行拆分。
优选地,所述插入胶水算子模块插入的胶水算子用于将胶水算子的输入张量数据的拆分状态集合中的状态进行拼接,再对经拼接处理后的拆分状态集合中的状态进行拆分。
优选地,所述插入胶水算子模块插入的胶水算子用于将胶水算子的输入张量数据的拆分状态集合中的状态进行拆分,再对经拆分处理后的拆分状态集合中的状态进行拼接。
优选地,还包括:
第一拆分状态集合优化模块,用于在正向遍历阶段,当所述算子的输出张量数据被至少两个算子作为输入张量数据,或者所述算子具有至少两个输出张量数据时,所述算子的输出张量数据的拆分状态集合中保留一个拆分状态,且所述拆分状态经由所述算子的同一状态路径确定。
优选地,还包括:
第二拆分状态集合优化模块,用于在反向遍历阶段,当所述算子具有至少两个输入张量数据时,所述算子的输入张量数据的拆分状态集合中保留一个拆分状态,且所述拆分状态经由所述算子的同一状态路径确定。
针对上述技术问题,还提出了一种神经网络模型的拆分方法及相关产品,除了神经网络模型拆分装置的相关描述,相关的拆分方法及相关产品的描述和上述各实施例描述的一致,此处不再加以赘述。关于神经网络模型拆分装置的相关描述如下。
本公开提供一种神经网络模型拆分装置,所述装置包括:
拆分状态集合模块,用于根据所述神经网络模型中目标层的算子,确定与所述目标层的算子关联的张量数据的拆分状态集合;其中,所述目标层为所述神经网络模型中的至少一层;
插入补偿算子模块,用于在所述目标层的算子与关联的拆分状态集合之间插入补偿算子,调整所述算子的输入张量数据的拆分状态集 合中的状态;其中,所述补偿算子用于从所述状态的任一子张量数据的相邻子张量数据中获取目标数据,将所述目标数据与所述子张量数据合并;
状态路径模块,用于根据所述神经网络模型的有向无环图遍历所述拆分状态集合,确定相邻拆分状态集合之间的状态路径及状态路径的权重;其中,所述状态路径表示所述算子的拆分方式;所述拆分状态集合中的每个状态表示一个子张量数据集合,所述状态的所有子张量数据的并集结果为所述张量数据;
目标拆分路径模块,用于根据所述状态路径的权重,确定所述目标层的目标拆分路径;
拆分模块,用于利用所述目标拆分路径对所述神经网络模型的目标层的算子进行拆分。
优选地,所述插入补偿算子模块包括:
插入单元,用于在所述目标层中的特定类型算子与关联的输入张量数据的拆分状态集合之间插入补偿算子;其中,所述特定类型算子的特征为:被用于计算该类算子的输出张量数据的元素对应的输入张量数据的元素,同样被用于计算所述输出张量数据的元素的相邻元素。
优选地,所述插入单元插入的补偿算子适用的特定类型算子为卷积算子、池化算子、局部响应归一化算子。
优选地,所述插入补偿算子模块还包括:
合并单元,用于采用金字塔形式的结构来合并所述目标层中的多个补偿算子。
优选地,所述目标拆分路径确定模块包括:
遍历单元,用于遍历所述目标层的所有拆分状态集合,对当前拆分状态集合,遍历每一状态,获得所有以当前状态为起点的状态路径以及所述状态路径的结束状态到所述目标层的输出张量数据的终止状态的拆分路径;
拆分路径确定单元,用于根据所述状态路径的权重和所述拆分路 径的权重确定所述当前状态到所述目标层的输出张量数据的终止状态的拆分路径;其中,所述拆分路径的权重根据所述拆分路径对应的所有状态路径的权重确定;
选择目标拆分路径单元,用于遍历完所述目标层的所有拆分状态集合后,获得所述目标层的输入张量数据的拆分状态集合与所述目标层的输出张量数据的拆分状态集合之间的目标拆分路径。
优选地,所述神经网络模型拆分装置还包括:
第一拆分状态集合优化模块,用于在正向遍历阶段,当所述算子的输出张量数据被至少两个算子作为输入张量数据,或者所述算子具有至少两个输出张量数据时,所述算子的输出张量数据的拆分状态集合中保留一个拆分状态,且所述拆分状态经由所述算子的同一状态路径确定。
优选地,所述神经网络模型拆分装置还包括:
第二拆分状态集合优化模块,用于在反向遍历阶段,当所述算子具有至少两个输入张量数据时,所述算子的输入张量数据的拆分状态集合中保留一个拆分状态,且所述拆分状态经由所述算子的同一状态路径确定。
针对上述技术问题,还提出了一种神经网络模型的拆分方法及相关产品,除了神经网络模型拆分装置的相关描述,相关的拆分方法及相关产品的描述和上述各实施例描述的一致,此处不再加以赘述。关于神经网络模型拆分装置的相关描述如下。
本公开提出一种神经网络模型拆分装置,其中,包括:
拆分状态集合模块,用于根据所述神经网络模型中目标层的算子,确定与所述目标层的算子关联的张量数据的拆分状态集合;其中,所述目标层为所述神经网络模型中的至少一层;
插入胶水算子模块,用于在所述目标层的算子与关联的拆分状态集合之间插入胶水算子,调整所述算子的张量数据的拆分状态集合中的状态;其中,所述胶水算子用于将所述张量数据按照一拆分方式获得的状态调整成按照任一种拆分方式获得的状态;
插入补偿算子模块,用于在所述目标层的算子与关联的拆分状态集合之间插入补偿算子,调整所述算子的输入张量数据的拆分状态集合中的状态;其中,所述补偿算子用于从所述状态的任一子张量数据的相邻子张量数据中获取目标数据,将所述目标数据与所述子张量数据合并;
状态路径模块,用于根据所述神经网络模型的有向无环图遍历所述拆分状态集合,确定相邻拆分状态集合之间的状态路径及状态路径的权重;其中,所述状态路径表示所述算子的拆分方式;所述拆分状态集合中的每个状态表示一个子张量数据集合,所述状态的所有子张量数据的并集结果为所述张量数据;
目标拆分路径模块,用于根据所述状态路径的权重,确定所述目标层的目标拆分路径;
拆分模块,用于利用所述目标拆分路径对所述神经网络模型的目标层的算子进行拆分。
优选地,所述插入胶水算子模块包括:
第一插入单元,用于在所述目标层的算子与关联的拆分状态集合之间插入胶水算子,获取包含所述胶水算子在内的神经网络模型的有向无环图;
状态路径单元,用于根据所述有向无环图遍历所述目标层的所有张量数据对应的拆分状态集合,确定相邻拆分状态集合之间的状态路径及状态路径的权重;
第一确定目标拆分路径单元,用于根据所述状态路径的权重,确定包含所述胶水算子在内的神经网络模型的目标层的目标拆分路径;
选择单元,用于利用包含所述胶水算子在内的神经网络模型的目标层的目标拆分路径对插入的每个胶水算子进行选择,对不需要插入的胶水算子删除,对需要插入的胶水算子保留。
优选地,所述插入胶水算子模块插入的胶水算子用于将胶水算子的输入张量数据的拆分状态集合中的状态进行拼接。
优选地,所述插入胶水算子模块插入的胶水算子用于将胶水算子 的输入张量数据的拆分状态集合中的状态进行拆分。
优选地,所述插入胶水算子模块插入的胶水算子用于将胶水算子的输入张量数据的拆分状态集合中的状态进行拼接,再对经拼接处理后的拆分状态集合中的状态进行拆分。
优选地,所述插入胶水算子模块插入的胶水算子用于将胶水算子的输入张量数据的拆分状态集合中的状态进行拆分,再对经拆分处理后的拆分状态集合中的状态进行拼接。
优选地,所述插入补偿算子模块包括:
第二插入单元,用于在所述目标层中的特定类型算子与关联的输入张量数据的拆分状态集合之间插入补偿算子;其中,所述特定类型算子的特征为:被用于计算该类算子的输出张量数据的元素对应的输入张量数据的元素,同样被用于计算所述输出张量数据的元素的相邻元素。
优选地,所述第二插入单元插入的补偿算子适用的特定类型算子为卷积算子、池化算子、局部响应归一化算子。
优选地,所述插入补偿算子模块还包括:
合并单元,用于采用金字塔形式的结构来合并所述目标层中的多个补偿算子。
优选地,所述目标拆分路径确定模块包括:
遍历单元,用于遍历所述目标层的所有拆分状态集合,对当前拆分状态集合,遍历每一状态,获得所有以当前状态为起点的状态路径以及所述状态路径的结束状态到所述目标层的输出张量数据的终止状态的拆分路径;
拆分路径确定单元,用于根据所述状态路径的权重和所述拆分路径的权重确定所述当前状态到所述目标层的输出张量数据的终止状态的拆分路径;其中,所述拆分路径的权重根据所述拆分路径对应的所有状态路径的权重确定;
第二确定目标拆分路径单元,用于遍历完所述目标层的所有拆分状态集合后,获得所述目标层的输入张量数据的拆分状态集合与所述 目标层的输出张量数据的拆分状态集合之间的目标拆分路径。
优选地,还包括:
第一拆分状态集合优化模块,用于在正向遍历阶段,当所述算子的输出张量数据被至少两个算子作为输入张量数据,或者所述算子具有至少两个输出张量数据时,所述算子的输出张量数据的拆分状态集合中保留一个拆分状态,且所述拆分状态经由所述算子的同一状态路径确定。
优选地,还包括:
第二拆分状态集合优化模块,用于在反向遍历阶段,当所述算子具有至少两个输入张量数据时,所述算子的输入张量数据的拆分状态集合中保留一个拆分状态,且所述拆分状态经由所述算子的同一状态路径确定。
In this embodiment, the memory may include a physical device for storing information; the information is typically digitized and then stored in a medium that works electrically, magnetically, optically, or in another manner. The memory of this embodiment may include: devices that store information electrically, such as RAM and ROM; devices that store information magnetically, such as hard disks, floppy disks, magnetic tapes, magnetic core memories, bubble memories, and USB flash drives; and devices that store information optically, such as CDs and DVDs. Of course, other kinds of memory exist as well, such as quantum memories and graphene memories.
In this embodiment, the processor may be implemented in any suitable manner. For example, the processor may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (for example, software or firmware) executable by the (micro)processor, logic gates, switches, an application-specific integrated circuit (ASIC), a programmable logic controller, an embedded microcontroller, and the like.
In this embodiment, an embodiment of the present application further provides a readable storage medium on which a computer program is stored, where the computer program, when executed, implements the neural network model splitting method described above.
As can be seen from the above, the technical solution of the present disclosure extends a deep learning accelerator from a single-core to a multi-core structure at comparatively small cost, and derives an efficient splitting scheme tailored to a given network and the characteristics of the underlying accelerator. Experimental results show that the scheme effectively reduces the end-to-end latency of various networks on multi-core accelerators.
Those skilled in the art also know that, besides implementing the client and the server purely as computer-readable program code, the method steps can be logically programmed so that the client and the server realize the same functions in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a client and server may therefore be regarded as hardware components, and the means included therein for realizing the various functions may be regarded as structures within the hardware components. Indeed, the means for realizing the various functions may even be regarded both as software modules implementing the method and as structures within the hardware components.
From the description of the above embodiments, those skilled in the art can clearly understand that the present application can be implemented by software plus a necessary general-purpose hardware platform. Based on this understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, can be embodied in the form of a software product. Such a computer software product may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, or an optical disc, and includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the various embodiments, or in certain parts of the embodiments, of the present application.
The embodiments in this specification are described in a progressive manner; for identical or similar parts between the embodiments, reference may be made between them, and each embodiment focuses on its differences from the others. In particular, the embodiments concerning the client and the server can be interpreted with reference to the introduction of the foregoing method embodiments.
The present application may be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types. The present application may also be practiced in distributed computing environments, in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including storage devices.
Although the present application has been depicted through embodiments, those of ordinary skill in the art know that the present application admits many variations and modifications without departing from its spirit, and it is intended that the appended claims cover such variations and modifications without departing from the spirit of the present application.

Claims (15)

  1. A neural network model splitting method, characterized in that the method comprises:
    determining, according to the operators of a target layer in the neural network model, the split state sets of the tensor data associated with the operators of the target layer, wherein the target layer is at least one layer of the neural network model;
    traversing the split state sets according to the directed acyclic graph of the neural network model, and determining the state paths between adjacent split state sets and the weights of the state paths, wherein a state path represents a splitting manner of an operator, each state in a split state set represents one set of sub-tensor data, and the union of all the sub-tensor data of a state is the tensor data;
    determining the target split path of the target layer according to the weights of the state paths; and
    splitting the operators of the target layer of the neural network model according to the target split path.
  2. The method according to claim 1, characterized in that the step of determining the target split path of the target layer comprises:
    traversing all the split state sets of the target layer and, for the current split state set, traversing each state to obtain all the state paths pointing to the current state, together with the split paths from the starting states of those state paths to the starting state of the input tensor data of the target layer;
    determining, according to the weights of the state paths and the weights of the split paths, the split path from the current state to the starting state of the input tensor data of the target layer, wherein the weight of a split path is determined according to the weights of all the state paths corresponding to that split path; and
    obtaining, after all the split state sets of the target layer have been traversed, the target split path between the split state set of the input tensor data of the target layer and the split state set of the output tensor data of the target layer.
  3. The method according to claim 1, characterized in that the step of determining the target split path of the target layer comprises:
    traversing all the split state sets of the target layer and, for the current split state set, traversing each state to obtain all the state paths starting from the current state, together with the split paths from the end states of those state paths to the terminal state of the output tensor data of the target layer;
    determining, according to the weights of the state paths and the weights of the split paths, the split path from the current state to the terminal state of the output tensor data of the target layer, wherein the weight of a split path is determined according to the weights of all the state paths corresponding to that split path; and
    obtaining, after all the split state sets of the target layer have been traversed, the target split path between the split state set of the input tensor data of the target layer and the split state set of the output tensor data of the target layer.
  4. The method according to claim 1, characterized in that the number of sub-operators obtained by splitting an operator of the target layer of the neural network model is an integer power of 2.
  5. The method according to claim 1, characterized in that the states in the split state set of the input tensor data of an operator of the target layer of the neural network model are determined according to the computational logic of the operator and the states in the split state set of the corresponding output tensor data.
  6. The method according to claim 1, characterized in that the states in the split state set of the output tensor data of an operator of the target layer of the neural network model are determined according to the computational logic of the operator and the states in the split state set of the corresponding input tensor data.
  7. The method according to claim 1, characterized by further comprising:
    in the forward traversal phase, when the output tensor data of an operator is used as input tensor data by at least two operators, or when the operator has at least two pieces of output tensor data, retaining one split state in the split state set of the output tensor data of the operator, the retained split state being determined via a same state path of the operator.
  8. The method according to claim 1, characterized by further comprising:
    in the backward traversal phase, when an operator has at least two pieces of input tensor data, retaining one split state in the split state set of the input tensor data of the operator, the retained split state being determined via a same state path of the operator.
  9. The method according to claim 1, characterized in that the weight of a state path is determined according to the type and scale of the operator and the hardware parameters of the multi-core processor.
  10. A neural network model splitting apparatus, characterized by comprising:
    a split state set module configured to determine, according to the operators of a target layer in the neural network model, the split state sets of the tensor data associated with the operators of the target layer, wherein the target layer is at least one layer of the neural network model;
    a state path module configured to traverse the split state sets according to the directed acyclic graph of the neural network model and to determine the state paths between adjacent split state sets and the weights of the state paths, wherein a state path represents a splitting manner of an operator, each state in a split state set represents one set of sub-tensor data, and the union of all the sub-tensor data of a state is the tensor data;
    a target split path module configured to determine the target split path of the target layer according to the weights of the state paths; and
    a splitting module configured to split the operators of the target layer of the neural network model according to the target split path.
  11. The apparatus according to claim 10, characterized in that the target split path module comprises:
    a first traversal unit configured to traverse all the split state sets of the target layer and, for the current split state set, to traverse each state to obtain all the state paths pointing to the current state, together with the split paths from the starting states of those state paths to the starting state of the input tensor data of the target layer;
    a first split path determination unit configured to determine, according to the weights of the state paths and the weights of the split paths, the split path from the current state to the starting state of the input tensor data of the target layer, wherein the weight of a split path is determined according to the weights of all the state paths corresponding to that split path; and
    a first target split path selection unit configured to obtain, after all the split state sets of the target layer have been traversed, the target split path between the split state set of the input tensor data of the target layer and the split state set of the output tensor data of the target layer.
  12. The apparatus according to claim 10, characterized in that the target split path module comprises:
    a second traversal unit configured to traverse all the split state sets of the target layer and, for the current split state set, to traverse each state to obtain all the state paths starting from the current state, together with the split paths from the end states of those state paths to the terminal state of the output tensor data of the target layer;
    a second split path determination unit configured to determine, according to the weights of the state paths and the weights of the split paths, the split path from the current state to the terminal state of the output tensor data of the target layer, wherein the weight of a split path is determined according to the weights of all the state paths corresponding to that split path; and
    a second target split path selection unit configured to obtain, after all the split state sets of the target layer have been traversed, the target split path between the split state set of the input tensor data of the target layer and the split state set of the output tensor data of the target layer.
  13. The apparatus according to claim 10, characterized by further comprising:
    a first split state set optimization module configured to, in the forward traversal phase, when the output tensor data of an operator is used as input tensor data by at least two operators, or when the operator has at least two pieces of output tensor data, retain one split state in the split state set of the output tensor data of the operator, the retained split state being determined via a same state path of the operator.
  14. A neural network model splitting hardware device, comprising a memory and a processor, the memory storing a computer program runnable on the processor, characterized in that the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 9.
  15. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 9.
PCT/CN2020/084416 2019-02-14 2020-04-13 Neural network model splitting method, apparatus, computer device and storage medium WO2020164644A2 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP20756078.0A EP3926546A4 (en) 2019-02-14 2020-04-13 METHOD FOR DIVISION OF NEURONAL NETWORK MODEL, APPARATUS, COMPUTER DEVICE AND INFORMATION HOLDER
US17/419,290 US20220092386A1 (en) 2019-02-14 2020-04-13 Neural network model splitting method, apparatus, computer device and storage medium

Applications Claiming Priority (8)

Application Number Priority Date Filing Date Title
CN201910115162.2A CN111563587B (zh) Neural network model splitting method and related product
CN201910115130.2 2019-02-14
CN201910114927.0 2019-02-14
CN201910115130.2A CN111563586B (zh) Neural network model splitting method and related product
CN201910114927.0A CN111563584B (zh) Neural network model splitting method and related product
CN201910115162.2 2019-02-14
CN201910114967.5A CN111563585B (zh) Neural network model splitting method and related product
CN201910114967.5 2019-02-14

Publications (2)

Publication Number Publication Date
WO2020164644A2 true WO2020164644A2 (zh) 2020-08-20
WO2020164644A3 WO2020164644A3 (zh) 2020-10-01

Family

ID=72045467

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/084416 WO2020164644A2 (zh) Neural network model splitting method, apparatus, computer device and storage medium

Country Status (3)

Country Link
US (1) US20220092386A1 (zh)
EP (1) EP3926546A4 (zh)
WO (1) WO2020164644A2 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022128406A1 (en) * 2020-12-16 2022-06-23 Xmos Inc. Artificial neural network implementation

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11748599B2 (en) * 2019-02-21 2023-09-05 Texas Instruments Incorporated Super-tiling in neural network processing to enable analytics at lower memory speed
CN116991560B (zh) * 2023-09-25 2024-04-16 粤港澳大湾区数字经济研究院(福田) Parallel scheduling method, apparatus, device and storage medium for language models

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH05216857A (ja) * 1992-02-07 1993-08-27 Hitachi Ltd Neural network learning state display method
CN107766894B (zh) * 2017-11-03 2021-01-22 吉林大学 Natural language generation method for remote sensing images based on attention mechanism and deep learning
CN108288091B (zh) * 2018-01-19 2020-09-11 上海兆芯集成电路有限公司 Microprocessor employing Booth multiplication
CN108965585B (zh) * 2018-06-22 2021-01-26 成都博宇科技有限公司 User identity recognition method based on smartphone sensors

Also Published As

Publication number Publication date
EP3926546A2 (en) 2021-12-22
WO2020164644A3 (zh) 2020-10-01
US20220092386A1 (en) 2022-03-24
EP3926546A4 (en) 2022-04-27

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 20756078; Country of ref document: EP; Kind code of ref document: A2)
NENP Non-entry into the national phase (Ref country code: DE)
ENP Entry into the national phase (Ref document number: 2020756078; Country of ref document: EP; Effective date: 20210914)