CN111562977B - Neural network model splitting method, device, storage medium and computer system - Google Patents

Neural network model splitting method, device, storage medium and computer system

Info

Publication number
CN111562977B
Authority
CN
China
Prior art keywords
splitting
neural network
network model
tensor data
operator
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910114831.4A
Other languages
Chinese (zh)
Other versions
CN111562977A (en)
Inventor
Inventor not announced
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Cambricon Information Technology Co Ltd
Original Assignee
Shanghai Cambricon Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Cambricon Information Technology Co Ltd filed Critical Shanghai Cambricon Information Technology Co Ltd
Priority to CN201910114831.4A
Publication of CN111562977A
Application granted
Publication of CN111562977B
Legal status: Active (current)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F 9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request, the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/044 Recurrent networks, e.g. Hopfield networks
    • G06N 3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N 3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Neurology (AREA)
  • Image Analysis (AREA)

Abstract

The application relates to a neural network model splitting method, apparatus, computer device, storage medium, and computer system. The method can reasonably distribute the computation load across multiple accelerator cores when the neural network processes input tensor data of any batch size, effectively reduces the latency of the neural network in processing data, and requires only a high-performance neural network computing library on a single core rather than a more complex one executed across multiple cores.

Description

Neural network model splitting method, device, storage medium and computer system
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to a neural network model splitting method, apparatus, computer device, storage medium, and computer system.
Background
In recent years, deep learning accelerators have been proposed and, like general-purpose processors, are expanding from single-core to multi-core architectures. The expanded multi-core structure can support a data-parallel mode in the training stage to improve data throughput and accelerate training. In the inference stage, however, deep neural networks have a stricter requirement on end-to-end latency than on throughput, and this latency often determines whether an accelerator is usable in a given scenario. Traditional data-parallel schemes cannot meet the small-batch, low-latency requirements placed on accelerators in inference scenarios.
Disclosure of Invention
In view of the foregoing, it is necessary to provide a neural network model splitting method, apparatus, computer device, storage medium, and computer system that can effectively address the above technical problem.
A neural network model splitting method, the method comprising:
inputting tensor data in a neural network model and associated information of the tensor data into a recurrent neural network model;
iteratively executing the steps of obtaining a current splitting strategy by using the recurrent neural network model and updating parameters of the recurrent neural network model according to the current splitting strategy, until the recurrent neural network model converges;
then, the recurrent neural network model outputs a target splitting strategy;
and obtaining a target splitting scheme according to the target splitting strategy, and splitting the neural network model according to the target splitting scheme.
As an optional implementation, updating the parameters of the recurrent neural network model according to the current splitting strategy includes: sampling, n times, the probability distributions of the different splitting states of the tensor data in the neural network model according to the probability distribution of the different splitting states of each tensor data in the current splitting strategy, to obtain n splitting schemes, wherein n is a positive integer; calculating an execution time of each of the n splitting schemes; and updating the parameters of the recurrent neural network model according to the execution time of each splitting scheme.
As an optional implementation, calculating the execution time of a splitting scheme according to the splitting states of all tensor data in the splitting scheme includes: determining the computation load and the memory access data volume of the operator corresponding to each tensor data according to the types and scales of the operators associated with all tensor data in the splitting scheme; and calculating the execution time of the splitting scheme according to the computation load, the memory access data volume, the memory access bandwidth of the operator's execution processor, and the computation throughput of the operator's execution processor.
As an optional embodiment, splitting the neural network model according to the target splitting scheme includes: segmenting each tensor data according to the splitting dimension and splitting number of each tensor data in the target splitting scheme to obtain sub-tensor data.
As an optional implementation, the segmenting each tensor data according to its splitting dimension and splitting number includes: determining a splitting starting point and a splitting end point on a corresponding splitting dimension according to the number of splitting segments on each dimension of each tensor data; and segmenting each tensor data according to the splitting starting point and the splitting end point.
As an optional implementation manner, the splitting dimension includes one or more of a batch size, a number of feature images, a height of a feature image, and a width of a feature image.
As an optional implementation, before inputting the tensor data in the neural network model and the associated information of the tensor data into the recurrent neural network model, the method further includes: inserting glue operators between each operator in a neural network model to be processed and input tensor data of the operators to obtain the neural network model; wherein the glue operator is used for adjusting the splitting state of the input tensor data of the operator.
As an optional implementation, before the inputting tensor data in the neural network model and associated information of the tensor data into the recurrent neural network model, the method further includes: inserting a compensation operator between an ith operator of a neural network model to be processed and input tensor data of the ith operator to obtain the neural network model; wherein i is a positive integer, and the compensation operator is used for compensating the corresponding input tensor data.
As an optional implementation manner, the inserting a compensation operator before the input tensor data of the ith operator in the neural network model to obtain the neural network model includes: if a plurality of compensation operators are inserted between the ith operator and the (i + k) th operator, determining compensation parameters of the combined compensation operator according to the preset size of each sub tensor data in the input tensor data of the (i + k) th operator; and inserting a combination compensation operator in front of the input tensor data of the ith operator, and configuring the combination compensation operator by using the compensation parameters of the combination compensation operator to obtain a neural network model.
As an optional implementation, before inputting the tensor data in the neural network model and the associated information of the tensor data into the recurrent neural network model, the method further includes: according to the type of an operator in the neural network model to be processed, fusing a plurality of operators in the neural network model to be processed to obtain a fusion operator, and replacing the plurality of operators with the fusion operator to obtain the neural network model.
A neural network model splitting apparatus, comprising:
the splitting decision module is used for inputting tensor data in a neural network model and associated information of the tensor data into a recurrent neural network model; iteratively executing the steps of obtaining a current splitting strategy by using the recurrent neural network model and updating the parameters of the recurrent neural network model according to the current splitting strategy until the recurrent neural network model converges; whereupon the recurrent neural network model outputs a target splitting strategy;
and the splitting execution module is used for obtaining a target splitting scheme according to the target splitting strategy and splitting the neural network model according to the target splitting scheme.
A computer device comprising a memory and a processor, the memory having stored thereon a computer program operable on the processor, wherein the processor, when executing the computer program, implements the steps of the above method.
A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the above-mentioned method.
A computer system comprising a memory, a first processor, and a second processor, the memory having stored thereon a computer program operable on the processors. The first processor, when executing the computer program, is configured to: input tensor data in a neural network model and associated information of the tensor data into a recurrent neural network model; iteratively execute the steps of obtaining a current splitting strategy by using the recurrent neural network model and updating the parameters of the recurrent neural network model according to the current splitting strategy, until the recurrent neural network model converges; then obtain a target splitting strategy output by the recurrent neural network model, and obtain a target splitting scheme according to the target splitting strategy; and split the neural network model according to the target splitting scheme. The second processor, when executing the computer program, is configured to: process the split neural network model in parallel.
In an optional implementation, the second processor comprises a multi-core accelerator; the first processor divides each tensor data according to its splitting dimension and splitting number to obtain sub-tensor data; and the sub-tensor data is distributed to a plurality of accelerator cores in the multi-core accelerator, where the accelerator cores process the sub-tensor data in parallel according to the corresponding operators.
According to the above neural network model splitting method and apparatus, computer device, storage medium, and computer system, the target splitting strategy is obtained through an attention-based recurrent neural network, and a concrete target splitting scheme is obtained by sampling the probability distributions over different splitting schemes given by the target splitting strategy. The target splitting scheme is a splitting scheme for the whole neural network model obtained by comprehensively considering the information of all tensor data in the neural network model to be processed and the relationships between the tensor data; the neural network model is then split according to this scheme so that it can be processed in parallel. The method can reasonably distribute the computation load across multiple accelerator cores when the neural network processes input tensor data of any batch size, effectively reduces the latency of the neural network in processing data, and requires only a high-performance neural network computing library on a single core rather than a more complex one executed across multiple cores.
Drawings
FIG. 1 is a block diagram of a computer system that is presented in one embodiment;
FIG. 2 is a schematic flow chart diagram illustrating a neural network model splitting method in one embodiment;
FIG. 3 is a schematic flow diagram illustrating the processing of a particular neural network by the recurrent neural network model in one embodiment;
FIG. 4 is a schematic flow chart of a step of refining step S202 in one embodiment;
FIG. 5 is a flow chart illustrating a step of refining step S203 in one embodiment;
FIG. 6 is a diagram illustrating the effect of a compensation operator on tensor data in one embodiment;
FIG. 7 is a block diagram of a parallel processing apparatus for a neural network in one embodiment;
FIG. 8 is a block diagram showing the structure of a parallel processing apparatus of a neural network according to another embodiment;
FIG. 9 is a diagram illustrating an internal structure of a computer device according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more clearly understood, the present application is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
As shown in fig. 1, a computer system 100 proposed according to an embodiment of the present application may include a first processor 110 and a second processor 120. Optionally, the second processor may be a multi-core accelerator, and each accelerator core of the multi-core accelerator may be connected to a first memory. The computer system 100 may be a multi-processor computing system including a plurality of processors, such as a multi-core processor computing system or a heterogeneous computing system.
As shown in fig. 2, a neural network model splitting method is proposed in an embodiment of the present application. The method may be executed by the computer system 100 shown in fig. 1, and includes:
step S201, inputting tensor data in the neural network model and associated information of the tensor data into a recurrent neural network model.
The neural network model can generally be regarded as a directed acyclic graph formed by operators and tensor data, where operators and data are connected by directed edges whose direction indicates whether the data is an input or an output of an operator. The association information of the tensor data may include information about the operators associated with the tensor data, such as operator type and operator number.
The first processor abstracts each tensor data in the directed acyclic graph into a corresponding feature vector, which includes the shape and size of the data, the types of the operators before and after the tensor data, and other related information; these feature vectors serve as the input tensor data of the recurrent neural network. Specifically, the first processor 110 inputs the tensor data in the neural network and the associated information of the tensor data into the recurrent neural network model.
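To make the representation concrete, the following is a minimal sketch of how such a per-tensor feature vector might be assembled; the field layout, the operator-type vocabulary, and all names here are illustrative assumptions rather than the patent's specification:

```python
from dataclasses import dataclass
from typing import List

# hypothetical operator-type vocabulary used for a simple numeric encoding
OP_TYPES = {"input": 0, "conv": 1, "pool": 2, "relu": 3, "lrn": 4, "output": 5}

@dataclass
class TensorNode:
    shape: List[int]        # e.g. [N, C, H, W]
    producer: str           # type of the operator that outputs this tensor
    consumers: List[str]    # types of the operators that read this tensor

def feature_vector(t: TensorNode, max_consumers: int = 2) -> List[float]:
    """Encode the tensor's shape plus the types of its surrounding operators
    into a flat vector that can be fed to the recurrent neural network."""
    vec = [float(d) for d in t.shape]
    vec.append(float(OP_TYPES[t.producer]))
    padded = (t.consumers + ["output"] * max_consumers)[:max_consumers]
    vec.extend(float(OP_TYPES[c]) for c in padded)
    return vec

# example: an activation tensor produced by a convolution and read by a pooling operator
print(feature_vector(TensorNode([8, 64, 56, 56], "conv", ["pool"])))
```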
Step S202, iteratively executing the step of obtaining a current splitting strategy by using the recurrent neural network model and updating the parameters of the recurrent neural network model according to the current splitting strategy; if the recurrent neural network model has not converged, the method returns to step S201, and if it has converged, the method continues with step S203. Step S203, the recurrent neural network model outputs a target splitting strategy.
In this step, the current splitting strategy output by the recurrent neural network model is, for each tensor data in the neural network, a probability distribution over the numbers of segments into which that tensor data may be split in each dimension. In other words, the current splitting strategy gives, for each tensor data, the probability of selecting each possible number of split segments in each dimension. A splitting scheme consists of one splitting state for each tensor data in the neural network model. For example, suppose the neural network model processed by the recurrent neural network model in fig. 3 contains n operators forming a serial structure, which can be described as an operator sequence (op_1, op_2, ..., op_n); all tensor data, including the input and output tensor data of the entire neural network model and the intermediate result tensor data between the operators, constitute a set {tensor_0, tensor_1, ..., tensor_n}, where the input of op_i is tensor_(i-1) and its output is tensor_i. Each tensor data tensor_i has a corresponding state set S_i, and the objective task of the recurrent neural network is to find, for each tensor data, a mapping tensor_i → s_i from the tensor data itself to one state in its state set. By determining a specific state for each tensor, the splitting mode of all tensor data is determined; therefore, a mapping from all tensor data in a neural network model to splitting states is called a splitting scheme P of the neural network model.
Specifically, after the tensor data in the neural network model and the associated information of the tensor data are input into the recurrent neural network model, the first processor runs the algorithms of the recurrent neural network model to process the input tensor data and associated information, outputs a current splitting strategy, and updates the parameters of the recurrent neural network model according to the current splitting strategy.
Specifically, the first processor 110 samples the probability distributions of the different splitting states of the tensor data in the current splitting strategy to obtain a splitting scheme, and then updates the parameters of the recurrent neural network model according to the splitting scheme. Further, for each splitting scheme obtained by sampling the current splitting strategy, the network is split according to that scheme and the split network is executed in parallel on the multi-core accelerator to obtain the execution time of the scheme; the parameters in the recurrent neural network are then updated using the execution times of all splitting schemes obtained from the current splitting strategy.
Further, the input of the recurrent neural network model is the set of feature vectors containing all tensor data and their related information, and the output is a splitting strategy for each tensor data. The current splitting strategy output by the recurrent neural network model for the first time is randomly generated; thereafter, in each round the model is updated according to the execution times, measured on the second processor, of the splitting schemes generated in the previous round, and the updated model enters the next round. The whole process consists of rounds of this "update-test" procedure until a good splitting strategy for the current model is obtained. After the current splitting strategy is obtained, it is sampled n times (n is a positive integer, for example 100); each sampling yields a complete splitting scheme for the whole network containing the splitting modes of all tensor data in the neural network, and the parameters in the recurrent neural network are updated according to the execution times of these splitting schemes. Optionally, when the change in parameters between two adjacent updates is smaller than a preset threshold, the recurrent neural network model is determined to have converged. A target splitting scheme is then obtained from the splitting strategy output by the recurrent neural network model.
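The "update-test" loop above can be illustrated with a small self-contained sketch. The patent does not disclose the controller's update rule, so the REINFORCE-style gradient, the toy cost function, the learning rate, and all names below are assumptions for illustration only:

```python
import numpy as np

rng = np.random.default_rng(0)

# toy setup: 3 tensors, each choosing among 4 possible segment counts
SEG_CHOICES = np.array([1, 2, 4, 8])
logits = np.zeros((3, len(SEG_CHOICES)))   # stands in for the controller's parameters

def strategy(logits):
    """Per-tensor probability distributions over segment counts (the 'splitting strategy')."""
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def measure_time(scheme):
    """Stand-in for splitting the network and timing it on the multi-core
    accelerator; here, 4-way splits are pretended to be fastest."""
    return float(np.abs(SEG_CHOICES[scheme] - 4).sum()) + 1.0

lr, n = 0.5, 100
for step in range(200):
    probs = strategy(logits)                            # current splitting strategy
    old = logits.copy()
    for _ in range(n):                                  # sample n splitting schemes
        scheme = [rng.choice(len(SEG_CHOICES), p=p) for p in probs]
        reward = -measure_time(scheme)                  # shorter execution time = higher reward
        for t, choice in enumerate(scheme):             # REINFORCE-style parameter update
            grad = -probs[t]
            grad[choice] += 1.0
            logits[t] += lr * reward * grad / n
    if np.abs(logits - old).max() < 1e-3:               # converged: parameters barely changed
        break

print("target splitting strategy (per-tensor probabilities):\n", strategy(logits))
```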
Specifically, after a splitting scheme is obtained by sampling the probability distributions of the different splitting states of each tensor data, the number of split segments of each tensor data in each dimension of the neural network is determined by the scheme, and the positions of the start point and end point of each split segment of each tensor data in each dimension are deduced one by one from front to back in combination with the structural information of the neural network model, yielding a concrete splitting scheme of the neural network model.
Step S204, obtaining a target splitting scheme according to the target splitting strategy, and splitting the neural network model according to the target splitting scheme.
Specifically, the first processor 110 obtains a target splitting scheme according to the target splitting strategy, and splits the neural network model according to the target splitting scheme. More specifically, the target splitting strategy is sampled to obtain the target splitting scheme, and the neural network model is split accordingly: based on the number of split segments of each tensor data in each dimension determined by the target splitting scheme, the positions of the start point and end point of each split segment of each tensor data in each dimension are deduced one by one from front to back in combination with the structural information of the neural network model, yielding the concrete target splitting scheme of the neural network model.
In the neural network model splitting method of this embodiment, a target splitting strategy is obtained through an attention-based recurrent neural network, and a concrete target splitting scheme is obtained by sampling the probability distributions over different splitting schemes given by the target splitting strategy. The target splitting scheme is a splitting scheme for the whole neural network model obtained by comprehensively considering the information of all tensor data in the neural network model to be processed and the relationships among the tensor data; the neural network model is then split according to this scheme so that it can be processed in parallel. The method can reasonably distribute the computation load across multiple accelerator cores when the neural network processes input tensor data of any batch size, effectively reduces the latency of the neural network in processing data, and requires only a high-performance neural network computing library on a single core rather than a more complex one executed across multiple cores.
In another embodiment, as shown in fig. 4, step S202 includes:
s2021, according to probability distributions of different splitting states of each tensor data in the current splitting policy, sampling the different splitting states of the tensor data in the neural network model n times to obtain n splitting schemes; wherein n is a positive integer.
Specifically, the first processor 110 samples the probability distribution of the different splitting states of the tensor data in the neural network model for n times according to the probability distribution of the different splitting states of each tensor data in the current splitting strategy, so as to obtain n splitting schemes.
S2022, calculating an execution time of each of the n splitting schemes.
Specifically, the first processor 110 calculates the execution time of each of the n splitting schemes.
S2023, updating the parameters of the recurrent neural network model according to the execution time of each splitting scheme.
Specifically, the first processor 110 updates the parameters of the recurrent neural network model according to the execution time of each splitting scheme.
In one embodiment, S2022 comprises: determining the computation load and the memory access data volume of the operator corresponding to each tensor data according to the types and scales of the operators associated with all tensor data in the splitting scheme; and calculating the execution time of the splitting scheme according to the computation load, the memory access data volume, the memory access bandwidth of the operator's execution processor, and the computation throughput of the operator's execution processor.
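One plausible reading of this cost model is a roofline-style bound per operator; the max-of-compute-and-memory form below is an assumption, since the text only names the four quantities involved:

```python
def operator_time(flops, bytes_accessed, peak_flops, peak_bandwidth):
    """Execution time of one (split) operator, bounded either by the execution
    processor's computation throughput or by its memory access bandwidth."""
    return max(flops / peak_flops, bytes_accessed / peak_bandwidth)

def scheme_time(operators, peak_flops, peak_bandwidth):
    """Execution time of a whole splitting scheme: sum the per-operator
    estimates, assuming the split layers execute one after another."""
    return sum(operator_time(f, b, peak_flops, peak_bandwidth) for f, b in operators)

# e.g. two layers on a core with 1 TFLOPS compute and 100 GB/s memory bandwidth
print(scheme_time([(2e9, 5e7), (8e8, 2e8)], peak_flops=1e12, peak_bandwidth=1e11))
```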
Specifically, step S203 may include: the first processor divides each tensor data according to the splitting dimension and splitting number of each tensor data in the target splitting scheme to obtain sub-tensor data.
In one embodiment, as shown in fig. 5, step S203 includes:
s2031, determining a splitting start point and a splitting end point on a corresponding splitting dimension according to the number of splitting segments on each dimension of each tensor data.
Specifically, after the number of split segments in each dimension of each tensor data in the neural network is obtained by sampling the current splitting strategy, the start point and end point of each segment of the neural network's input tensor data in each dimension are determined according to the principle of dividing as equally as possible, thereby obtaining the splitting mode of the input tensor data.
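A minimal sketch of "dividing as equally as possible" along one dimension (the helper name and the rule of giving the first segments the extra elements are illustrative assumptions):

```python
def split_segments(dim_size: int, num_segments: int):
    """Start and end points of each segment along one dimension, with segment
    sizes differing by at most one so the division is as equal as possible."""
    base, rem = divmod(dim_size, num_segments)
    segments, start = [], 0
    for i in range(num_segments):
        end = start + base + (1 if i < rem else 0)  # first `rem` segments get one extra element
        segments.append((start, end))
        start = end
    return segments

# e.g. splitting a height of 10 into 3 segments
print(split_segments(10, 3))  # [(0, 4), (4, 7), (7, 10)]
```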
S2032, segmenting each tensor data according to the splitting start point and the splitting end point.
Further, all operators are traversed according to the structure of the neural network model. Once the splitting mode of an operator's input tensor data has been determined, the start point and end point of each segment of its output tensor data in each dimension are determined using the operator itself, the splitting mode of its input tensor data, and the number of split segments in each dimension of its output tensor data, thereby obtaining the splitting mode of the output data. After the traversal is completed, all tensor data in the neural network have a determined splitting mode.
The splitting method in this embodiment can change tensor data from a current splitting state to a target splitting state.
It should be noted that, if the sub-tensor data obtained after splitting is to be processed by a neural network accelerator, the accelerator usually needs data of a certain scale to achieve high computation efficiency. To increase computation throughput, the accelerator adopts a large number of long-bit-width vector computation units; if a dimension is split too small, the vector width cannot be filled, leaving a large number of computation units idle and greatly reducing the computation efficiency on an accelerator core. In this case it must be ensured that the tensor data is not over-split in a given dimension.
When tensor data in the neural network is split and the neural network is processed in parallel, the split tensor data may be too small, or some operators (for example, the Local Response Normalization operator, LRN for short) may need data adjacent to a sub-tensor data during operation. In such cases, a compensation operator can be used to compensate the input tensor data of the corresponding operator, so that the operator can obtain its input tensor data normally and perform the corresponding operation. The compensation operator performs a compensation operation on data; specifically, the input tensor data can be compensated by inserting a compensation operator between an operator and its input tensor data. Alternatively, if the input tensor data of an operator has been split, the compensation operator can compensate the corresponding sub-input tensor data (obtained by splitting the input tensor data).
For example, convolution operators in neural network models sometimes require additional auxiliary operators to complete the splitting task. When the computation is divided along the H/W dimensions of the input tensor data, if the convolution kernel window is larger than the stride of each window movement, i.e., kernel > stride, then during the computation of a split convolution operator the window will move to the boundary of a sub-data block and extend beyond it, with the missing data located on an adjacent sub-data block. As shown in fig. 6, the compensation operator can read the adjacent data from the storage locations of the other sub-tensor data and combine it with the original data into a larger data block, such that in the computation stage the moving range of the window does not exceed the boundary of the compensated data block.
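A minimal sketch of this boundary compensation for sub-data blocks split along the H dimension follows; the NumPy layout, the function names, and the choice of halo width (kernel - stride rows from each neighbor) are illustrative assumptions:

```python
import numpy as np

def compensate_blocks(blocks, halo):
    """Extend each sub-data block (split along axis 0, e.g. H) with `halo`
    rows copied from its neighbors, so that a convolution window with
    kernel > stride never moves past the block boundary."""
    padded = []
    for i, blk in enumerate(blocks):
        parts = []
        if i > 0:
            parts.append(blocks[i - 1][-halo:])   # rows borrowed from the block above
        parts.append(blk)
        if i < len(blocks) - 1:
            parts.append(blocks[i + 1][:halo])    # rows borrowed from the block below
        padded.append(np.concatenate(parts, axis=0))
    return padded

# a 3x3 kernel with stride 1 would need halo = kernel - stride = 2 rows
rows = np.arange(12).reshape(12, 1)
blocks = np.split(rows, 3)                        # three sub-blocks of 4 rows each
print([b.ravel().tolist() for b in compensate_blocks(blocks, halo=2)])
```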
Besides the convolution operator, the pooling operator and the (now less common) Local Response Normalization operator (LRN) also have the problem that the split subtasks depend on data in adjacent data blocks. The pooling operator is similar to the convolution operator: the dependency mainly arises when the pooling window is larger than the stride by which the window moves. The Local Response Normalization operator is different; by its computation logic, calculating one point of the output data along the C dimension requires the value of the corresponding point of the input tensor data in the C dimension together with the values of the k/2 adjacent points on each side. Thus, if the computation of the Local Response Normalization operator is split into multiple LRN operators along the C dimension, each new operator also requires data from adjacent data blocks to compute the values located on its C-dimension boundaries.
Based on the characteristic that the compensation operator can compensate the input tensor data or the sub-input tensor data obtained by splitting the input tensor data, in one embodiment of the application, a scheme for optimizing the neural network model splitting method by using the compensation operator is provided. The scheme is as follows:
before step S201, a compensation operator is inserted between an ith operator of a neural network model to be processed and input tensor data of the ith operator to obtain the neural network model; wherein i is a positive integer, and the compensation operator is used for compensating the corresponding input tensor data. Optionally, the type of the ith operator includes: one of a convolution operator, a local response normalization operator, and a pooling operator.
In one embodiment, compensating the input tensor data of the corresponding operator with the compensation operator includes: compensating each sub-input tensor data in the original input tensor data of the ith operator according to its adjacent sub-input tensor data. Further, the compensation operator determines a compensation parameter according to the type and scale of the ith operator, the compensation parameter being used to determine the position of the compensation data on the tensor data adjacent to the sub-tensor data; the required compensation data of each sub-tensor data is read from the adjacent sub-input tensor data in the original input tensor data of the ith operator according to the compensation operator and the compensation parameter; and each sub-tensor data is combined with its compensation data.
Further, if a plurality of compensation operators are inserted between the ith operator and the (i + k) th operator, determining compensation parameters of the combined compensation operator according to the preset size of each sub tensor data in the input tensor data of the (i + k) th operator; and inserting a combination compensation operator between the input tensor data of the ith operator and the ith operator, and configuring the combination compensation operator by using the compensation parameters of the combination compensation operator to obtain a neural network model.
Operator fusion (kernel fusion) is a commonly used technique in current neural network model optimization. Traditional neural network model optimization on a GPU or CPU usually targets the internal computation logic of each operator, improving the execution performance of a single operator through greater vectorization and better utilization of on-chip storage. Such optimization is limited by the optimization space a single operator provides. Operator fusion, by contrast, fuses multiple operators in the neural network into a new operator, breaking through the gaps between operators: on the one hand, it reduces the cost of launching each operator (kernel launch); on the other hand, it saves the memory access cost of unnecessary intermediate results between operators. The most common operator fusion combines a convolution, pooling, or other common operator with the several consecutive elementwise operators that follow it into one new operator. An elementwise operator is a common kind of operator in neural networks whose input tensor data and output data have the same shape, and where each value of the output data is computed only from the value at the corresponding position of the input tensor data; such operators include ReLU and other activation operators, Scale, BatchNorm, and the like. Fusing an elementwise operator with any operator in front of it avoids a large amount of memory access overhead for intermediate results: during the computation of the leading non-elementwise operator, as soon as the value of one point is computed, the subsequent elementwise computations can be applied to it immediately to obtain the final result of the whole operator sequence, instead of storing the current output to memory and having the following elementwise operators read it back for computation.
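A minimal sketch of this fusion pattern follows; the function names are illustrative, and a real fused kernel would apply the elementwise chain per output value inside the producing operator rather than over a materialized tensor, as the comment notes:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def scale(x, s=0.5):
    return x * s

def fuse(base_op, elementwise_ops):
    """Combine an operator with the consecutive elementwise operators behind it
    into a single new operator, so their intermediate results never make a
    round trip through memory."""
    def fused(*args):
        y = base_op(*args)
        for op in elementwise_ops:   # in a fused kernel this runs per output value
            y = op(y)
        return y
    return fused

# conv -> relu -> scale collapses into one operator; a matmul stands in for the conv
conv_like = lambda x, w: x @ w
fused_op = fuse(conv_like, [relu, scale])
print(fused_op(np.array([[1.0, -2.0]]), np.array([[1.0], [1.0]])))
```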
In order to reduce the number of tensor data in the neural network model and thereby speed up the training and decision process for each network, the neural network model to be processed is preprocessed before its tensor data and the associated information of the tensor data are input into the recurrent neural network model. During preprocessing, elementwise operators are merged into the operators before them, and the corresponding intermediate results between these operators are removed from the computation graph, since that data becomes temporary data inside the fused operators; this effectively reduces the number of data blocks in the whole network and shrinks the space over which decisions must be made. On the other hand, it ensures that in the final splitting scheme the splitting mode of the elementwise operators always stays consistent with their input tensor data, which means that in the split computation graph the split elementwise sub-operators can also be optimized by operator fusion.
The specific implementation of the fusion-operator scheme is as follows: before step S201, according to the types of the operators in the neural network model to be processed, multiple operators in the model are fused to obtain a fusion operator, and the multiple operators are replaced with the fusion operator to obtain the neural network model.
The computation scale of each layer of the neural network model changes continuously as the network extends. For a typical classification network, the convolutions in the first few layers usually have a large feature-image size (the H/W dimensions) and a small number of feature images (the C dimension); considering the computation scale needed to keep the accelerator efficient, the splitting strategies of the first few layers should lean toward splitting within the feature image, i.e. along the H/W dimensions. As computation proceeds layer by layer, the feature-image size gradually decreases while the number of feature images keeps increasing, and the splitting of convolution operators should lean toward the C dimension. As the preferred splitting of the neural network model changes in this way, the splitting mode of the operators needs to be adjusted accordingly, i.e. the state of intermediate results must be adjusted; how to properly insert glue operators into the network to improve splitting performance is described next.
The specific implementation in this scheme may be: before step S201, a glue operator is inserted between each operator in the neural network model to be processed and the input tensor data of the operator, so as to obtain the neural network model; wherein the glue operator is used for adjusting the splitting state of the input tensor data of the operator.
It should be noted that the neural network model is obtained by adjusting the neural network model to be processed through the compensation operators, operator fusion, and glue operators; the fusion operators and compensation operators in the neural network model are processed in the same way as ordinary neural network model operators.
It should be understood that although the steps in the flowcharts of fig. 2 and figs. 4-5 are displayed in the order indicated by the arrows, these steps are not necessarily executed in that order. Unless explicitly stated otherwise herein, there is no strict order restriction on the execution of these steps, and they may be executed in other orders. Moreover, at least some of the steps in fig. 2 and figs. 4-5 may include multiple sub-steps or stages, which are not necessarily completed at the same moment but may be executed at different moments, and whose execution order is not necessarily sequential; they may be executed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
In one embodiment, as shown in fig. 7, there is provided a neural network model splitting apparatus, including:
the splitting decision module 210 is used for inputting tensor data in the neural network model and associated information of the tensor data into a recurrent neural network model; iteratively executing the steps of obtaining a current splitting strategy by using the recurrent neural network model and updating the parameters of the recurrent neural network model according to the current splitting strategy until the recurrent neural network model converges; whereupon the recurrent neural network model outputs a target splitting strategy;
and the splitting execution module 220 is used for obtaining a target splitting scheme according to the target splitting strategy, and splitting the neural network model according to the target splitting scheme.
In one embodiment, the splitting decision module 210 is configured to sample, for n times, probability distributions of different splitting states of tensor data in the neural network model according to probability distributions of different splitting states of each tensor data in the current splitting policy, so as to obtain n splitting schemes; wherein n is a positive integer; calculating an execution time of each of the n splitting schemes; and updating the parameters of the recurrent neural network model according to the execution time of each splitting scheme.
In one embodiment, the splitting decision module 210 is configured to determine a calculation load and an access data amount of an operator corresponding to each tensor data according to types and scales of operators associated with all tensor data in the splitting scheme; and calculating the execution time of the splitting scheme according to the calculation load, the access data volume, the access bandwidth of the execution processor of the operator and the calculation throughput rate of the execution processor of the operator.
In one embodiment, the splitting execution module 220 is configured to segment each tensor data according to the splitting dimension and splitting number of each tensor data in the target splitting scheme, so as to obtain sub-tensor data.
In one embodiment, the splitting execution module 220 is configured to determine a splitting start point and a splitting end point on the corresponding splitting dimension according to the number of split segments in each dimension of each tensor data, and to segment each tensor data according to the splitting start point and the splitting end point.
Specifically, when a tensor data is split, its current state, its associated operator, and its target splitting state are determined, and from these the splitting start point and end point of the tensor data in each splitting dimension are determined; the current state comprises the splitting number of the tensor data in each splitting dimension, and the target splitting state likewise comprises the splitting number of the tensor data in each splitting dimension.
In one embodiment, as shown in fig. 8, the neural network model splitting apparatus further includes: the neural network optimization module 230 is configured to insert a glue operator between each operator in the to-be-processed neural network model and the input tensor data of the operator to obtain a neural network model; wherein the glue operator is used for adjusting the splitting state of the input tensor data of the operator.
In one embodiment, the neural network optimization module 230 is further configured to insert a compensation operator between an ith operator of the neural network model to be processed and input tensor data of the ith operator to obtain the neural network model; wherein i is a positive integer, and the compensation operator is used for compensating the corresponding input tensor data.
In one embodiment, the neural network optimization module 230 is specifically configured to, if a plurality of compensation operators are inserted between the ith operator and the (i + k) th operator, determine a compensation parameter of a combined compensation operator according to a preset size of each sub tensor data in the input tensor data of the (i + k) th operator; and inserting a combination compensation operator between the input tensor data of the ith operator and the ith operator, and configuring the combination compensation operator by using the compensation parameters of the combination compensation operator to obtain a neural network model.
In one embodiment, the neural network optimization module 230 is further configured to fuse multiple operators in the neural network model according to the types of the operators in the neural network model to be processed to obtain a fusion operator, and replace the multiple operators with the fusion operator to obtain the neural network model.
For specific definition of the neural network model splitting device, reference may be made to the definition of the neural network model splitting method above, and details are not described here. All or part of each module in the neural network model splitting device can be realized by software, hardware and a combination thereof. The modules can be embedded in a hardware form or independent from a processor in the computer device, and can also be stored in a memory in the computer device in a software form, so that the processor can call and execute operations corresponding to the modules.
In one embodiment, a computer device is provided, which may be a server, and its internal structure diagram may be as shown in fig. 9. The computer device comprises a processor, a memory and a network interface which are connected through a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operating system and the computer program to run on the non-volatile storage medium. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a neural network model splitting method.
Those skilled in the art will appreciate that the architecture shown in fig. 9 is merely a block diagram of part of the structure related to the solution of the present application and does not limit the computer devices to which the solution applies; a particular computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided, comprising a memory and a processor, the memory having stored thereon a computer program operable on the processor, the processor implementing the following steps when executing the computer program: inputting tensor data in a neural network model and associated information of the tensor data into a recurrent neural network model; iteratively executing the steps of obtaining a current splitting strategy by using the recurrent neural network model and updating the parameters of the recurrent neural network model according to the current splitting strategy, until the recurrent neural network model converges; then obtaining a target splitting strategy output by the recurrent neural network model; and obtaining a target splitting scheme according to the target splitting strategy and splitting the neural network model according to the target splitting scheme.
In one embodiment, the processor, when executing the computer program, implements the steps of: sampling, n times, the probability distributions of the different splitting states of the tensor data in the neural network model according to the probability distribution of the different splitting states of each tensor data in the current splitting strategy, to obtain n splitting schemes, wherein n is a positive integer; calculating an execution time of each of the n splitting schemes; and updating the parameters of the recurrent neural network model according to the execution time of each splitting scheme.
In one embodiment, the processor, when executing the computer program, implements the steps of: determining the computation load and the memory access data volume of the operator corresponding to each tensor data according to the types and scales of the operators associated with all tensor data in the splitting scheme; and calculating the execution time of the splitting scheme according to the computation load, the memory access data volume, the memory access bandwidth of the operator's execution processor, and the computation throughput of the operator's execution processor.
In one embodiment, the processor, when executing the computer program, performs the steps of: segmenting each tensor data according to the splitting dimension and splitting number of each tensor data in the target splitting scheme to obtain sub-tensor data.
In one embodiment, the processor, when executing the computer program, further performs the steps of: determining a splitting starting point and a splitting end point on a corresponding splitting dimension according to the number of splitting segments on each dimension of each tensor data; and segmenting each tensor data according to the splitting starting point and the splitting end point.
In one embodiment, the processor when executing the computer program further performs the steps of: before the inputting tensor data in the neural network model and associated information of the tensor data into the recurrent neural network model, the method further includes: inserting glue operators between each operator in a neural network model to be processed and input tensor data of the operators to obtain the neural network model; wherein the glue operator is used for adjusting the splitting state of the input tensor data of the operator.
In one embodiment, the processor when executing the computer program further performs the steps of: inserting a compensation operator between an ith operator of a neural network model to be processed and input tensor data of the ith operator to obtain the neural network model; wherein i is a positive integer, and the compensation operator is used for compensating the corresponding input tensor data.
In one embodiment, the processor, when executing the computer program, further performs the steps of: if a plurality of compensation operators are inserted between the ith operator and the (i + k) th operator, determining compensation parameters of the combined compensation operator according to the preset size of each sub tensor data in the input tensor data of the (i + k) th operator; and inserting a combination compensation operator between the input tensor data of the ith operator and the ith operator, and configuring the combination compensation operator by using the compensation parameters of the combination compensation operator to obtain a neural network model.
In one embodiment, the processor when executing the computer program implements the steps of: fusing the operators in the neural network model according to the type of the operator in the neural network model to be processed to obtain a fusion operator, and replacing the operators with the fusion operator to obtain the neural network model.
In one embodiment, a computer-readable storage medium is provided, having a computer program stored thereon, which when executed by a processor performs the steps of: inputting tensor data in a neural network model and associated information of the tensor data into a recurrent neural network model; iteratively executing the steps of obtaining a current splitting strategy by using the recurrent neural network model and updating the parameters of the recurrent neural network model according to the current splitting strategy, until the recurrent neural network model converges; then obtaining a target splitting strategy output by the recurrent neural network model; and obtaining a target splitting scheme according to the target splitting strategy and splitting the neural network model according to the target splitting scheme.
In one embodiment, the computer program when executed by the processor further performs the steps of: sampling, n times, the probability distributions of the different splitting states of the tensor data in the neural network model according to the probability distribution of the different splitting states of each tensor data in the current splitting strategy, to obtain n splitting schemes, wherein n is a positive integer; calculating an execution time of each of the n splitting schemes; and updating the parameters of the recurrent neural network model according to the execution time of each splitting scheme.
In one embodiment, the computer program when executed by the processor further performs the steps of: determining the computation load and the memory access data volume of the operator corresponding to each tensor data according to the types and scales of the operators associated with all tensor data in the splitting scheme; and calculating the execution time of the splitting scheme according to the computation load, the memory access data volume, the memory access bandwidth of the operator's execution processor, and the computation throughput of the operator's execution processor.
In one embodiment, the computer program when executed by the processor further performs the steps of: segmenting each tensor data according to the splitting dimension and splitting number of each tensor data in the target splitting scheme to obtain sub-tensor data.
In one embodiment, the computer program when executed by the processor further performs the steps of: determining a splitting starting point and a splitting end point on a corresponding splitting dimension according to the number of splitting segments on each dimension of each tensor data; and segmenting each tensor data according to the splitting starting point and the splitting end point.
In one embodiment, the computer program when executed by the processor further performs the steps of: before inputting tensor data in the neural network and associated information of the tensor data into a recurrent neural network model, the method further comprises: inserting glue operators between each operator in a neural network model to be processed and input tensor data of the operators to obtain the neural network model; wherein the glue operator is used for adjusting the splitting state of the input tensor data of the operator.
In one embodiment, the computer program when executed by the processor further performs the steps of: inserting a compensation operator between an ith operator of a neural network model to be processed and input tensor data of the ith operator to obtain the neural network model; wherein i is a positive integer, and the compensation operator is used for compensating the corresponding input tensor data.
In one embodiment, the computer program when executed by the processor further performs the steps of: if a plurality of compensation operators are inserted between the ith operator and the (i + k) th operator, determining compensation parameters of the combined compensation operator according to the preset size of each sub tensor data in the input tensor data of the (i + k) th operator; and inserting a combination compensation operator between the input tensor data of the ith operator and the ith operator, and configuring the combination compensation operator by using the compensation parameters of the combination compensation operator to obtain a neural network model.
In one embodiment, the computer program, when executed by the processor, further performs the step of: according to the types of the operators in a neural network model to be processed, fusing operators in the neural network model to be processed to obtain a fusion operator, and replacing those operators with the fusion operator to obtain the neural network model.
In one embodiment, there is also provided a computer system comprising a memory, a first processor, and a second processor, the memory having stored thereon a computer program operable on the processors. The first processor, when executing the computer program, is configured to: input tensor data in a neural network model and associated information of the tensor data into a recurrent neural network model, and obtain a current splitting strategy with the recurrent neural network model; iteratively update the parameters of the recurrent neural network model according to the current splitting strategy until the recurrent neural network model converges; obtain a target splitting strategy output by the recurrent neural network model; obtain a target splitting scheme according to the target splitting strategy; and split the neural network model according to the target splitting scheme. The second processor, when executing the computer program, is configured to: process the split neural network model in parallel.
In an optional implementation, the second processor comprises a multi-core accelerator. The first processor segments each tensor data according to the splitting dimension and the splitting number of that tensor data to obtain sub-tensor data, and the sub-tensor data are distributed to a plurality of accelerator cores in the multi-core accelerator, which process the sub-tensor data in parallel with the corresponding operators.
Further, the first processor splits all tensor data in the network according to the splitting dimension and the splitting number of each tensor data, taking the network structure into account, to obtain a sub-tensor data set for each tensor data. Splitting the tensor data simultaneously splits the operators associated with it, yielding a sub-operator set for each operator. The sub-operators in each sub-operator set are then distributed to the acceleration cores of the multi-core accelerator; each accelerator core reads the sub-tensor data corresponding to its sub-operators from storage as input and executes them.
It will be understood by those skilled in the art that all or part of the processes of the method embodiments described above can be implemented by a computer program instructing related hardware; the program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the above method embodiments. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of these technical features have been described; nevertheless, as long as a combination contains no contradiction, it should be considered within the scope of this specification.
The above embodiments express only several implementations of the present application; their description is specific and detailed, but should not therefore be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and improvements without departing from the concept of the present application, and these all fall within the protection scope of the present application. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (13)

1. A neural network model splitting method is characterized by comprising the following steps:
tensor data in a neural network model and associated information of the tensor data are input into a recurrent neural network model;
iteratively executing the recurrent neural network model to obtain a current splitting strategy, and sampling n times from the probability distributions over the different splitting states of each tensor data in the current splitting strategy to obtain n splitting schemes; wherein n is a positive integer;
calculating an execution time of each of the n splitting schemes, wherein the execution time of a splitting scheme is calculated according to the splitting states of all tensor data in the splitting scheme, comprising: determining the computation load and the memory access data amount of the operator corresponding to each tensor data according to the types and scales of the operators associated with all tensor data in the splitting scheme; and calculating the execution time of the splitting scheme according to the computation load and the memory access data amount;
updating parameters of the recurrent neural network model according to the execution time of each splitting scheme until the recurrent neural network model converges;
outputting, by the recurrent neural network model, a target splitting strategy;
and obtaining a target splitting scheme according to the target splitting strategy, and splitting the neural network model according to the target splitting scheme.
2. The method according to claim 1, wherein splitting the neural network model according to the target splitting scheme comprises:
and segmenting each tensor data according to the splitting dimension and the splitting number of each tensor data in the target splitting scheme to obtain sub-tensor data.
3. The method of claim 2, wherein the segmenting each tensor data according to the splitting dimension and the splitting number of each tensor data in the target splitting scheme comprises:
determining a splitting start point and a splitting end point in the corresponding splitting dimension according to the number of split segments in each dimension of each tensor data;
and segmenting each tensor data according to the splitting start point and the splitting end point.
4. The method of claim 3, wherein the split dimension comprises one or more of a batch size, a number of feature images, a height of a feature image, a width of a feature image.
5. The method of claim 1, wherein, before the tensor data in the neural network model and the associated information of the tensor data are input into the recurrent neural network model, the method further comprises:
inserting a glue operator between each operator in a neural network model to be processed and the input tensor data of that operator to obtain the neural network model; wherein the glue operator is used for adjusting the splitting state of the input tensor data of the operator.
6. The method of any one of claims 1-5, wherein before inputting the tensor data in the neural network model and the associated information of the tensor data into the recurrent neural network model, the method further comprises:
inserting a compensation operator between an i-th operator of a neural network model to be processed and the input tensor data of the i-th operator to obtain the neural network model; wherein i is a positive integer, and the compensation operator is used for compensating the corresponding input tensor data.
7. The method of claim 6, wherein inserting a compensation operator between an i-th operator of the neural network model to be processed and the input tensor data of the i-th operator to obtain the neural network model comprises:
if a plurality of compensation operators are inserted between the i-th operator and the (i+k)-th operator, determining compensation parameters of a combined compensation operator according to the preset size of each sub-tensor data in the input tensor data of the (i+k)-th operator;
and inserting the combined compensation operator between the input tensor data of the i-th operator and the i-th operator, and configuring the combined compensation operator with its compensation parameters to obtain the neural network model.
8. The method according to any one of claims 1-5, wherein, before the tensor data in the neural network model and the associated information of the tensor data are input into the recurrent neural network model, the method further comprises:
according to the types of the operators in the neural network model to be processed, fusing operators in the neural network model to be processed to obtain a fusion operator, and replacing the fused operators with the fusion operator to obtain the neural network model.
9. A neural network model splitting device, comprising:
the splitting decision module is used for inputting tensor data in a neural network model and associated information of the tensor data into a recurrent neural network model; iteratively executing the recurrent neural network model to obtain a current splitting strategy, and sampling n times from the probability distributions over the different splitting states of each tensor data in the current splitting strategy to obtain n splitting schemes; wherein n is a positive integer;
determining the computation load and the memory access data amount of the operator corresponding to each tensor data according to the types and scales of the operators associated with all tensor data in the splitting scheme; and calculating the execution time of the splitting scheme according to the computation load and the memory access data amount;
updating parameters of the recurrent neural network model according to the execution time of each splitting scheme until the recurrent neural network model converges; the recurrent neural network model then outputting a target splitting strategy;
and the splitting execution module is used for obtaining a target splitting scheme according to the target splitting strategy and splitting the neural network model according to the target splitting scheme.
10. A computer device comprising a memory and a processor, the memory having stored thereon a computer program operable on the processor, wherein the processor, when executing the computer program, performs the steps of the method of any one of claims 1 to 8.
11. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 8.
12. A computer system comprising a memory, a first processor and a second processor, the memory having stored thereon a computer program operable on the processors, wherein the first processor, when executing the computer program, is configured to: input tensor data in a neural network model and associated information of the tensor data into a recurrent neural network model; iteratively execute the recurrent neural network model to obtain a current splitting strategy, and sample n times from the probability distributions over the different splitting states of each tensor data in the current splitting strategy to obtain n splitting schemes; wherein n is a positive integer;
calculate an execution time of each of the n splitting schemes, wherein the execution time of a splitting scheme is calculated according to the splitting states of all tensor data in the splitting scheme, comprising: determining the computation load and the memory access data amount of the operator corresponding to each tensor data according to the types and scales of the operators associated with all tensor data in the splitting scheme; and calculating the execution time of the splitting scheme according to the computation load and the memory access data amount;
update parameters of the recurrent neural network model according to the execution time of each splitting scheme until the recurrent neural network model converges; obtain a target splitting strategy output by the recurrent neural network model; obtain a target splitting scheme according to the target splitting strategy; and split the neural network model according to the target splitting scheme; and wherein the second processor, when executing the computer program, is configured to: process the split neural network model in parallel.
13. The computer system of claim 12, wherein the second processor comprises a multi-core accelerator;
wherein the first processor segments each tensor data according to the splitting dimension and the splitting number of each tensor data to obtain sub-tensor data;
and the sub-tensor data are distributed to a plurality of accelerator cores in the multi-core accelerator, the plurality of accelerator cores processing the sub-tensor data in parallel according to the corresponding operators.
CN201910114831.4A 2019-02-14 2019-02-14 Neural network model splitting method, device, storage medium and computer system Active CN111562977B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910114831.4A CN111562977B (en) 2019-02-14 2019-02-14 Neural network model splitting method, device, storage medium and computer system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910114831.4A CN111562977B (en) 2019-02-14 2019-02-14 Neural network model splitting method, device, storage medium and computer system

Publications (2)

Publication Number Publication Date
CN111562977A CN111562977A (en) 2020-08-21
CN111562977B (en) 2022-12-09

Family

ID=72071332

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910114831.4A Active CN111562977B (en) 2019-02-14 2019-02-14 Neural network model splitting method, device, storage medium and computer system

Country Status (1)

Country Link
CN (1) CN111562977B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112052758B (en) * 2020-08-25 2023-05-23 西安电子科技大学 Hyperspectral image classification method based on attention mechanism and cyclic neural network
TWI777481B (en) * 2021-04-08 2022-09-11 鴻海精密工業股份有限公司 Data control method, data processing method, electronic equipment and storage medium
CN113342345A (en) * 2021-05-17 2021-09-03 北京百度网讯科技有限公司 Operator fusion method and device of deep learning framework
CN114237918B (en) * 2022-02-28 2022-05-27 之江实验室 Graph execution method and device for neural network model calculation
CN114707643A (en) * 2022-04-11 2022-07-05 华为技术有限公司 Model segmentation method and related equipment thereof
WO2023222047A1 (en) * 2022-05-17 2023-11-23 北京灵汐科技有限公司 Processing method and processing unit for neural network computing graph, and device and medium
CN115879504B (en) * 2022-12-30 2023-08-29 珠海市欧冶半导体有限公司 Device and method for splitting and quantizing layerrnorm operator
CN117707791B (en) * 2024-02-02 2024-05-14 北京壁仞科技开发有限公司 Method, apparatus and storage medium for performing attention calculations

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180060724A1 (en) * 2016-08-25 2018-03-01 Microsoft Technology Licensing, Llc Network Morphism

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106651877A (en) * 2016-12-20 2017-05-10 北京旷视科技有限公司 Example segmenting method and device
CN107145939A (en) * 2017-06-21 2017-09-08 北京图森未来科技有限公司 A kind of Neural network optimization and device
CN108710941A (en) * 2018-04-11 2018-10-26 杭州菲数科技有限公司 The hard acceleration method and device of neural network model for electronic equipment
CN109272108A (en) * 2018-08-22 2019-01-25 深圳市亚博智能科技有限公司 Control method for movement, system and computer equipment based on neural network algorithm

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
FPGA automated design method for throughput optimization of convolutional neural network accelerators; Lu Weina et al.; Journal of Computer-Aided Design & Computer Graphics; 2018-11-15 (No. 11); full text *

Also Published As

Publication number Publication date
CN111562977A (en) 2020-08-21

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant