CN111562977B - Neural network model splitting method, device, storage medium and computer system - Google Patents

Neural network model splitting method, device, storage medium and computer system

Info

Publication number
CN111562977B
Authority
CN
China
Prior art keywords
splitting
neural network
network model
tensor data
operator
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910114831.4A
Other languages
Chinese (zh)
Other versions
CN111562977A (en)
Inventor
Inventor not announced
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Cambricon Information Technology Co Ltd
Original Assignee
Shanghai Cambricon Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Cambricon Information Technology Co Ltd filed Critical Shanghai Cambricon Information Technology Co Ltd
Priority to CN201910114831.4A
Publication of CN111562977A
Application granted
Publication of CN111562977B
Legal status: Active (current)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F 9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request, the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/044 Recurrent networks, e.g. Hopfield networks
    • G06N 3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N 3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Neurology (AREA)
  • Image Analysis (AREA)

Abstract

The application relates to a neural network model splitting method, apparatus, computer device, storage medium, and computer system. The method can reasonably distribute the computation load across multiple accelerator cores when the neural network processes input tensor data of any batch size, effectively reduces the latency of the neural network in processing data, and requires only a high-performance neural network computing library on a single core rather than a more complex one executed across multiple cores.

Description

Neural network model splitting method, device, storage medium and computer system
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to a neural network model splitting method, apparatus, computer device, storage medium, and computer system.
Background
In recent years, deep learning accelerators have been proposed and, like general-purpose processors, are expanding from single-core to multi-core architectures. The expanded multi-core structure can support a data-parallel mode in the training stage to improve data throughput and accelerate training. In the inference stage, however, deep neural networks have a stricter requirement on end-to-end latency than on throughput, and this latency often determines whether an accelerator is usable in a given scenario. Traditional data-parallel schemes cannot meet the small-batch, low-latency requirements placed on accelerators in inference scenarios.
Disclosure of Invention
In view of the foregoing, it is necessary to provide a neural network model splitting method, apparatus, computer device, storage medium, and computer system that can effectively address the above technical problem.
A neural network model splitting method, the method comprising:
inputting tensor data in a neural network model and associated information of the tensor data into a recurrent neural network model;
iteratively executing the steps of obtaining a current splitting strategy by using the recurrent neural network model and updating parameters of the recurrent neural network model according to the current splitting strategy, until the recurrent neural network model converges;
then, the recurrent neural network model outputs a target splitting strategy;
and obtaining a target splitting scheme according to the target splitting strategy, and splitting the neural network model according to the target splitting scheme.
As an optional implementation, updating the parameters of the recurrent neural network model according to the current splitting strategy includes: sampling, n times, the probability distributions of the different splitting states of the tensor data in the neural network model according to the probability distribution of the different splitting states of each tensor data in the current splitting strategy, to obtain n splitting schemes, wherein n is a positive integer; calculating an execution time of each of the n splitting schemes; and updating the parameters of the recurrent neural network model according to the execution time of each splitting scheme.
As an optional implementation, calculating the execution time of a splitting scheme according to the splitting states of all tensor data in the splitting scheme includes: determining the computation load and the memory access data volume of the operator corresponding to each tensor data according to the types and scales of the operators associated with all tensor data in the splitting scheme; and calculating the execution time of the splitting scheme according to the computation load, the memory access data volume, the memory access bandwidth of the operator's execution processor, and the computation throughput of the operator's execution processor.
As an optional embodiment, splitting the neural network model according to the target splitting scheme includes: segmenting each tensor data according to the splitting dimension and splitting number of each tensor data in the target splitting scheme to obtain sub-tensor data.
As an optional implementation, the segmenting each tensor data according to its splitting dimension and splitting number includes: determining a splitting starting point and a splitting end point on a corresponding splitting dimension according to the number of splitting segments on each dimension of each tensor data; and segmenting each tensor data according to the splitting starting point and the splitting end point.
As an optional implementation manner, the splitting dimension includes one or more of a batch size, a number of feature images, a height of a feature image, and a width of a feature image.
As an optional implementation, before inputting the tensor data in the neural network model and the associated information of the tensor data into the recurrent neural network model, the method further includes: inserting glue operators between each operator in a neural network model to be processed and input tensor data of the operators to obtain the neural network model; wherein the glue operator is used for adjusting the splitting state of the input tensor data of the operator.
As an optional implementation, before the inputting tensor data in the neural network model and associated information of the tensor data into the recurrent neural network model, the method further includes: inserting a compensation operator between an ith operator of a neural network model to be processed and input tensor data of the ith operator to obtain the neural network model; wherein i is a positive integer, and the compensation operator is used for compensating the corresponding input tensor data.
As an optional implementation manner, the inserting a compensation operator before the input tensor data of the ith operator in the neural network model to obtain the neural network model includes: if a plurality of compensation operators are inserted between the ith operator and the (i + k) th operator, determining compensation parameters of the combined compensation operator according to the preset size of each sub tensor data in the input tensor data of the (i + k) th operator; and inserting a combination compensation operator in front of the input tensor data of the ith operator, and configuring the combination compensation operator by using the compensation parameters of the combination compensation operator to obtain a neural network model.
As an optional implementation, before inputting the tensor data in the neural network model and the associated information of the tensor data into the recurrent neural network model, the method further includes: according to the type of an operator in the neural network model to be processed, fusing a plurality of operators in the neural network model to be processed to obtain a fusion operator, and replacing the plurality of operators with the fusion operator to obtain the neural network model.
A neural network model splitting apparatus, comprising:
the splitting decision module is used for inputting tensor data in a neural network model and associated information of the tensor data into a recurrent neural network model; iteratively executing the steps of obtaining a current splitting strategy by using the recurrent neural network model and updating the parameters of the recurrent neural network model according to the current splitting strategy until the recurrent neural network model converges; whereupon the recurrent neural network model outputs a target splitting strategy;
and the splitting execution module is used for obtaining a target splitting scheme according to the target splitting strategy and splitting the neural network model according to the target splitting scheme.
A computer device comprising a memory and a processor, the memory having stored thereon a computer program operable on the processor, wherein the processor, when executing the computer program, implements the steps of the above method.
A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the above-mentioned method.
A computer system comprising a memory, a first processor, and a second processor, the memory having stored thereon a computer program operable on the processors. The first processor, when executing the computer program, is configured to: input tensor data in a neural network model and associated information of the tensor data into a recurrent neural network model; iteratively execute the steps of obtaining a current splitting strategy by using the recurrent neural network model and updating the parameters of the recurrent neural network model according to the current splitting strategy, until the recurrent neural network model converges; then obtain a target splitting strategy output by the recurrent neural network model, and obtain a target splitting scheme according to the target splitting strategy; and split the neural network model according to the target splitting scheme. The second processor, when executing the computer program, is configured to: process the split neural network model in parallel.
In an optional implementation, the second processor comprises a multi-core accelerator; the first processor divides each tensor data according to its splitting dimension and splitting number to obtain sub-tensor data; and the sub-tensor data is distributed to a plurality of accelerator cores in the multi-core accelerator, where the accelerator cores process the sub-tensor data in parallel according to the corresponding operators.
According to the above neural network model splitting method and apparatus, computer device, storage medium, and computer system, the target splitting strategy is obtained through an attention-based recurrent neural network, and a concrete target splitting scheme is obtained by sampling the probability distributions over different splitting schemes given by the target splitting strategy. The target splitting scheme is a splitting scheme for the whole neural network model obtained by comprehensively considering the information of all tensor data in the neural network model to be processed and the relationships between the tensor data; the neural network model is then split according to this scheme so that it can be processed in parallel. The method can reasonably distribute the computation load across multiple accelerator cores when the neural network processes input tensor data of any batch size, effectively reduces the latency of the neural network in processing data, and requires only a high-performance neural network computing library on a single core rather than a more complex one executed across multiple cores.
Drawings
FIG. 1 is a block diagram of a computer system that is presented in one embodiment;
FIG. 2 is a schematic flow chart diagram illustrating a neural network model splitting method in one embodiment;
FIG. 3 is a schematic flow diagram illustrating the processing of a particular neural network by the recurrent neural network model in one embodiment;
FIG. 4 is a schematic flow chart of a step of refining step S202 in one embodiment;
FIG. 5 is a flow chart illustrating a step of refining step S203 in one embodiment;
FIG. 6 is a diagram illustrating the effect of a compensation operator on tensor data in one embodiment;
FIG. 7 is a block diagram of a parallel processing apparatus for a neural network in one embodiment;
FIG. 8 is a block diagram showing the structure of a parallel processing apparatus of a neural network according to another embodiment;
FIG. 9 is a diagram illustrating an internal structure of a computer device according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more clearly understood, the present application is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
As shown in fig. 1, a computer system 100 proposed according to an embodiment of the present application may include a first processor 110 and a second processor 120. Optionally, the second processor may be a multi-core accelerator, and each accelerator core of the multi-core accelerator may be connected to a first memory. The computer system 100 may be a multi-processor computing system including a plurality of processors, such as a multi-core processor computing system or a heterogeneous computing system.
As shown in fig. 2, a neural network model splitting method is proposed in an embodiment of the present application. The method may be executed by the computer system 100 shown in fig. 1, and includes:
step S201, inputting tensor data in the neural network model and associated information of the tensor data into a recurrent neural network model.
The neural network model can generally be regarded as a directed acyclic graph formed by operators and tensor data, where operators and data are connected by directed edges whose direction indicates whether the data is an input or an output of an operator. The association information of the tensor data may include information about the operators associated with the tensor data, such as operator type and operator number.
The first processor abstracts each tensor data in the directed acyclic graph into a corresponding feature vector, which includes the shape and size of the data, the types of the operators before and after the tensor data, and other related information; these feature vectors serve as the input tensor data of the recurrent neural network. Specifically, the first processor 110 inputs the tensor data in the neural network and the associated information of the tensor data into the recurrent neural network model.
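To make the representation concrete, the following is a minimal sketch of how such a per-tensor feature vector might be assembled; the field layout, the operator-type vocabulary, and all names here are illustrative assumptions rather than the patent's specification:

```python
from dataclasses import dataclass
from typing import List

# hypothetical operator-type vocabulary used for a simple numeric encoding
OP_TYPES = {"input": 0, "conv": 1, "pool": 2, "relu": 3, "lrn": 4, "output": 5}

@dataclass
class TensorNode:
    shape: List[int]        # e.g. [N, C, H, W]
    producer: str           # type of the operator that outputs this tensor
    consumers: List[str]    # types of the operators that read this tensor

def feature_vector(t: TensorNode, max_consumers: int = 2) -> List[float]:
    """Encode the tensor's shape plus the types of its surrounding operators
    into a flat vector that can be fed to the recurrent neural network."""
    vec = [float(d) for d in t.shape]
    vec.append(float(OP_TYPES[t.producer]))
    padded = (t.consumers + ["output"] * max_consumers)[:max_consumers]
    vec.extend(float(OP_TYPES[c]) for c in padded)
    return vec

# example: an activation tensor produced by a convolution and read by a pooling operator
print(feature_vector(TensorNode([8, 64, 56, 56], "conv", ["pool"])))
```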
Step S202, iteratively executing the step of obtaining a current splitting strategy by using the recurrent neural network model and updating the parameters of the recurrent neural network model according to the current splitting strategy; if the recurrent neural network model has not converged, the method returns to step S201, and if it has converged, the method continues with step S203. Step S203, the recurrent neural network model outputs a target splitting strategy.
In this step, the current splitting strategy output by the recurrent neural network model is, for each tensor data in the neural network, a probability distribution over the numbers of segments into which that tensor data may be split in each dimension. In other words, the current splitting strategy gives, for each tensor data, the probability of selecting each possible number of split segments in each dimension. A splitting scheme consists of one splitting state for each tensor data in the neural network model. For example, suppose the neural network model processed by the recurrent neural network model in fig. 3 contains n operators forming a serial structure, which can be described as an operator sequence (op_1, op_2, ..., op_n); all tensor data, including the input and output tensor data of the entire neural network model and the intermediate result tensor data between the operators, constitute a set {tensor_0, tensor_1, ..., tensor_n}, where the input of op_i is tensor_(i-1) and its output is tensor_i. Each tensor data tensor_i has a corresponding state set S_i, and the objective task of the recurrent neural network is to find, for each tensor data, a mapping tensor_i → s_i from the tensor data itself to one state in its state set. By determining a specific state for each tensor, the splitting mode of all tensor data is determined; therefore, a mapping from all tensor data in a neural network model to splitting states is called a splitting scheme P of the neural network model.
Specifically, after the tensor data in the neural network model and the associated information of the tensor data are input into the recurrent neural network model, the first processor runs the algorithms of the recurrent neural network model to process the input tensor data and associated information, outputs a current splitting strategy, and updates the parameters of the recurrent neural network model according to the current splitting strategy.
Specifically, the first processor 110 samples the probability distributions of the different splitting states of the tensor data in the current splitting strategy to obtain a splitting scheme, and then updates the parameters of the recurrent neural network model according to the splitting scheme. Further, for each splitting scheme obtained by sampling the current splitting strategy, the network is split according to that scheme and the split network is executed in parallel on the multi-core accelerator to obtain the execution time of the scheme; the parameters in the recurrent neural network are then updated using the execution times of all splitting schemes obtained from the current splitting strategy.
Further, the input of the recurrent neural network model is the set of feature vectors containing all tensor data and their related information, and the output is a splitting strategy for each tensor data. The current splitting strategy output by the recurrent neural network model for the first time is randomly generated; thereafter, in each round the model is updated according to the execution times, measured on the second processor, of the splitting schemes generated in the previous round, and the updated model enters the next round. The whole process consists of rounds of this "update-test" procedure until a good splitting strategy for the current model is obtained. After the current splitting strategy is obtained, it is sampled n times (n is a positive integer, for example 100); each sampling yields a complete splitting scheme for the whole network containing the splitting modes of all tensor data in the neural network, and the parameters in the recurrent neural network are updated according to the execution times of these splitting schemes. Optionally, when the change in parameters between two adjacent updates is smaller than a preset threshold, the recurrent neural network model is determined to have converged. A target splitting scheme is then obtained from the splitting strategy output by the recurrent neural network model.
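The "update-test" loop above can be illustrated with a small self-contained sketch. The patent does not disclose the controller's update rule, so the REINFORCE-style gradient, the toy cost function, the learning rate, and all names below are assumptions for illustration only:

```python
import numpy as np

rng = np.random.default_rng(0)

# toy setup: 3 tensors, each choosing among 4 possible segment counts
SEG_CHOICES = np.array([1, 2, 4, 8])
logits = np.zeros((3, len(SEG_CHOICES)))   # stands in for the controller's parameters

def strategy(logits):
    """Per-tensor probability distributions over segment counts (the 'splitting strategy')."""
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def measure_time(scheme):
    """Stand-in for splitting the network and timing it on the multi-core
    accelerator; here, 4-way splits are pretended to be fastest."""
    return float(np.abs(SEG_CHOICES[scheme] - 4).sum()) + 1.0

lr, n = 0.5, 100
for step in range(200):
    probs = strategy(logits)                            # current splitting strategy
    old = logits.copy()
    for _ in range(n):                                  # sample n splitting schemes
        scheme = [rng.choice(len(SEG_CHOICES), p=p) for p in probs]
        reward = -measure_time(scheme)                  # shorter execution time = higher reward
        for t, choice in enumerate(scheme):             # REINFORCE-style parameter update
            grad = -probs[t]
            grad[choice] += 1.0
            logits[t] += lr * reward * grad / n
    if np.abs(logits - old).max() < 1e-3:               # converged: parameters barely changed
        break

print("target splitting strategy (per-tensor probabilities):\n", strategy(logits))
```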
Specifically, after a splitting scheme is obtained by sampling the probability distributions of the different splitting states of each tensor data, the number of split segments of each tensor data in each dimension of the neural network is determined by the scheme, and the positions of the start point and end point of each split segment of each tensor data in each dimension are deduced one by one from front to back in combination with the structural information of the neural network model, yielding a concrete splitting scheme of the neural network model.
Step S204, obtaining a target splitting scheme according to the target splitting strategy, and splitting the neural network model according to the target splitting scheme.
Specifically, the first processor 110 obtains a target splitting scheme according to the target splitting strategy, and splits the neural network model according to the target splitting scheme. More specifically, the target splitting strategy is sampled to obtain the target splitting scheme, and the neural network model is split accordingly: based on the number of split segments of each tensor data in each dimension determined by the target splitting scheme, the positions of the start point and end point of each split segment of each tensor data in each dimension are deduced one by one from front to back in combination with the structural information of the neural network model, yielding the concrete target splitting scheme of the neural network model.
In the neural network model splitting method of this embodiment, a target splitting strategy is obtained through an attention-based recurrent neural network, and a concrete target splitting scheme is obtained by sampling the probability distributions over different splitting schemes given by the target splitting strategy. The target splitting scheme is a splitting scheme for the whole neural network model obtained by comprehensively considering the information of all tensor data in the neural network model to be processed and the relationships among the tensor data; the neural network model is then split according to this scheme so that it can be processed in parallel. The method can reasonably distribute the computation load across multiple accelerator cores when the neural network processes input tensor data of any batch size, effectively reduces the latency of the neural network in processing data, and requires only a high-performance neural network computing library on a single core rather than a more complex one executed across multiple cores.
In another embodiment, as shown in fig. 4, step S202 includes:
s2021, according to probability distributions of different splitting states of each tensor data in the current splitting policy, sampling the different splitting states of the tensor data in the neural network model n times to obtain n splitting schemes; wherein n is a positive integer.
Specifically, the first processor 110 samples the probability distribution of the different splitting states of the tensor data in the neural network model for n times according to the probability distribution of the different splitting states of each tensor data in the current splitting strategy, so as to obtain n splitting schemes.
S2022, calculating an execution time of each of the n splitting schemes.
Specifically, the first processor 110 calculates the execution time of each of the n splitting schemes.
S2023, updating the parameters of the recurrent neural network model according to the execution time of each splitting scheme.
Specifically, the first processor 110 updates the parameters of the recurrent neural network model according to the execution time of each splitting scheme.
In one embodiment, S2022 comprises: determining the computation load and the memory access data volume of the operator corresponding to each tensor data according to the types and scales of the operators associated with all tensor data in the splitting scheme; and calculating the execution time of the splitting scheme according to the computation load, the memory access data volume, the memory access bandwidth of the operator's execution processor, and the computation throughput of the operator's execution processor.
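One plausible reading of this cost model is a roofline-style bound per operator; the max-of-compute-and-memory form below is an assumption, since the text only names the four quantities involved:

```python
def operator_time(flops, bytes_accessed, peak_flops, peak_bandwidth):
    """Execution time of one (split) operator, bounded either by the execution
    processor's computation throughput or by its memory access bandwidth."""
    return max(flops / peak_flops, bytes_accessed / peak_bandwidth)

def scheme_time(operators, peak_flops, peak_bandwidth):
    """Execution time of a whole splitting scheme: sum the per-operator
    estimates, assuming the split layers execute one after another."""
    return sum(operator_time(f, b, peak_flops, peak_bandwidth) for f, b in operators)

# e.g. two layers on a core with 1 TFLOPS compute and 100 GB/s memory bandwidth
print(scheme_time([(2e9, 5e7), (8e8, 2e8)], peak_flops=1e12, peak_bandwidth=1e11))
```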
Specifically, step S203 may include: the first processor divides each tensor data according to the splitting dimension and splitting number of each tensor data in the target splitting scheme to obtain sub-tensor data.
In one embodiment, as shown in fig. 5, step S203 includes:
s2031, determining a splitting start point and a splitting end point on a corresponding splitting dimension according to the number of splitting segments on each dimension of each tensor data.
Specifically, after the number of split segments in each dimension of each tensor data in the neural network is obtained by sampling the current splitting strategy, the start point and end point of each segment of the neural network's input tensor data in each dimension are determined according to the principle of dividing as equally as possible, thereby obtaining the splitting mode of the input tensor data.
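A minimal sketch of "dividing as equally as possible" along one dimension (the helper name and the rule of giving the first segments the extra elements are illustrative assumptions):

```python
def split_segments(dim_size: int, num_segments: int):
    """Start and end points of each segment along one dimension, with segment
    sizes differing by at most one so the division is as equal as possible."""
    base, rem = divmod(dim_size, num_segments)
    segments, start = [], 0
    for i in range(num_segments):
        end = start + base + (1 if i < rem else 0)  # first `rem` segments get one extra element
        segments.append((start, end))
        start = end
    return segments

# e.g. splitting a height of 10 into 3 segments
print(split_segments(10, 3))  # [(0, 4), (4, 7), (7, 10)]
```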
S2032, segmenting each tensor data according to the splitting start point and the splitting end point.
Further, all operators are traversed according to the structure of the neural network model. Once the splitting mode of an operator's input tensor data has been determined, the start point and end point of each segment of its output tensor data in each dimension are determined using the operator itself, the splitting mode of its input tensor data, and the number of split segments in each dimension of its output tensor data, thereby obtaining the splitting mode of the output data. After the traversal is completed, all tensor data in the neural network have a determined splitting mode.
The splitting method in this embodiment can change tensor data from a current splitting state to a target splitting state.
It should be noted that, if the sub-tensor data obtained after splitting is to be processed by a neural network accelerator, the accelerator usually needs data of a certain scale to achieve high computation efficiency. To increase computation throughput, the accelerator adopts a large number of long-bit-width vector computation units; if a dimension is split too small, the vector width cannot be filled, leaving a large number of computation units idle and greatly reducing the computation efficiency on an accelerator core. In this case it must be ensured that the tensor data is not over-split in a given dimension.
When tensor data in the neural network is split and the neural network is processed in parallel, the split tensor data may be too small, or some operators (for example, the Local Response Normalization operator, LRN for short) may need data adjacent to a sub-tensor data during operation. In such cases, a compensation operator can be used to compensate the input tensor data of the corresponding operator, so that the operator can obtain its input tensor data normally and perform the corresponding operation. The compensation operator performs a compensation operation on data; specifically, the input tensor data can be compensated by inserting a compensation operator between an operator and its input tensor data. Alternatively, if the input tensor data of an operator has been split, the compensation operator can compensate the corresponding sub-input tensor data (obtained by splitting the input tensor data).
For example, convolution operators in neural network models sometimes require additional auxiliary operators to complete the splitting task. When the computation is divided along the H/W dimensions of the input tensor data, if the convolution kernel window is larger than the stride of each window movement, i.e., kernel > stride, then during the computation of a split convolution operator the window will move to the boundary of a sub-data block and extend beyond it, with the missing data located on an adjacent sub-data block. As shown in fig. 6, the compensation operator can read the adjacent data from the storage locations of the other sub-tensor data and combine it with the original data into a larger data block, such that in the computation stage the moving range of the window does not exceed the boundary of the compensated data block.
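A minimal sketch of this boundary compensation for sub-data blocks split along the H dimension follows; the NumPy layout, the function names, and the choice of halo width (kernel - stride rows from each neighbor) are illustrative assumptions:

```python
import numpy as np

def compensate_blocks(blocks, halo):
    """Extend each sub-data block (split along axis 0, e.g. H) with `halo`
    rows copied from its neighbors, so that a convolution window with
    kernel > stride never moves past the block boundary."""
    padded = []
    for i, blk in enumerate(blocks):
        parts = []
        if i > 0:
            parts.append(blocks[i - 1][-halo:])   # rows borrowed from the block above
        parts.append(blk)
        if i < len(blocks) - 1:
            parts.append(blocks[i + 1][:halo])    # rows borrowed from the block below
        padded.append(np.concatenate(parts, axis=0))
    return padded

# a 3x3 kernel with stride 1 would need halo = kernel - stride = 2 rows
rows = np.arange(12).reshape(12, 1)
blocks = np.split(rows, 3)                        # three sub-blocks of 4 rows each
print([b.ravel().tolist() for b in compensate_blocks(blocks, halo=2)])
```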
Besides the convolution operator, the pooling operator and the (now less common) Local Response Normalization operator (LRN) also have the problem that the split subtasks depend on data in adjacent data blocks. The pooling operator is similar to the convolution operator: the dependency mainly arises when the pooling window is larger than the stride by which the window moves. The Local Response Normalization operator is different; by its computation logic, calculating one point of the output data along the C dimension requires the value of the corresponding point of the input tensor data in the C dimension together with the values of the k/2 adjacent points on each side. Thus, if the computation of the Local Response Normalization operator is split into multiple LRN operators along the C dimension, each new operator also requires data from adjacent data blocks to compute the values located on its C-dimension boundaries.
Based on the characteristic that the compensation operator can compensate the input tensor data or the sub-input tensor data obtained by splitting the input tensor data, in one embodiment of the application, a scheme for optimizing the neural network model splitting method by using the compensation operator is provided. The scheme is as follows:
before step S201, a compensation operator is inserted between an ith operator of a neural network model to be processed and input tensor data of the ith operator to obtain the neural network model; wherein i is a positive integer, and the compensation operator is used for compensating the corresponding input tensor data. Optionally, the type of the ith operator includes: one of a convolution operator, a local response normalization operator, and a pooling operator.
In one embodiment, compensating the input tensor data of the corresponding operator with the compensation operator includes: compensating each sub-input tensor data in the original input tensor data of the ith operator according to its adjacent sub-input tensor data. Further, the compensation operator determines a compensation parameter according to the type and scale of the ith operator, the compensation parameter being used to determine the position of the compensation data on the tensor data adjacent to the sub-tensor data; the required compensation data of each sub-tensor data is read from the adjacent sub-input tensor data in the original input tensor data of the ith operator according to the compensation operator and the compensation parameter; and each sub-tensor data is combined with its compensation data.
Further, if a plurality of compensation operators are inserted between the ith operator and the (i + k) th operator, determining compensation parameters of the combined compensation operator according to the preset size of each sub tensor data in the input tensor data of the (i + k) th operator; and inserting a combination compensation operator between the input tensor data of the ith operator and the ith operator, and configuring the combination compensation operator by using the compensation parameters of the combination compensation operator to obtain a neural network model.
Operator fusion (kernel fusion) is a commonly used technique in current neural network model optimization. Traditional neural network model optimization on a GPU or CPU usually targets the internal computation logic of each operator, improving the execution performance of a single operator through greater vectorization and better utilization of on-chip storage. Such optimization is limited by the optimization space a single operator provides. Operator fusion, by contrast, fuses multiple operators in the neural network into a new operator, breaking through the gaps between operators: on the one hand, it reduces the cost of launching each operator (kernel launch); on the other hand, it saves the memory access cost of unnecessary intermediate results between operators. The most common operator fusion combines a convolution, pooling, or other common operator with the several consecutive elementwise operators that follow it into one new operator. An elementwise operator is a common kind of operator in neural networks whose input tensor data and output data have the same shape, and where each value of the output data is computed only from the value at the corresponding position of the input tensor data; such operators include ReLU and other activation operators, Scale, BatchNorm, and the like. Fusing an elementwise operator with any operator in front of it avoids a large amount of memory access overhead for intermediate results: during the computation of the leading non-elementwise operator, as soon as the value of one point is computed, the subsequent elementwise computations can be applied to it immediately to obtain the final result of the whole operator sequence, instead of storing the current output to memory and having the following elementwise operators read it back for computation.
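A minimal sketch of this fusion pattern follows; the function names are illustrative, and a real fused kernel would apply the elementwise chain per output value inside the producing operator rather than over a materialized tensor, as the comment notes:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def scale(x, s=0.5):
    return x * s

def fuse(base_op, elementwise_ops):
    """Combine an operator with the consecutive elementwise operators behind it
    into a single new operator, so their intermediate results never make a
    round trip through memory."""
    def fused(*args):
        y = base_op(*args)
        for op in elementwise_ops:   # in a fused kernel this runs per output value
            y = op(y)
        return y
    return fused

# conv -> relu -> scale collapses into one operator; a matmul stands in for the conv
conv_like = lambda x, w: x @ w
fused_op = fuse(conv_like, [relu, scale])
print(fused_op(np.array([[1.0, -2.0]]), np.array([[1.0], [1.0]])))
```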
In order to reduce the number of tensor data in the neural network model and thereby speed up the training and decision process for each network, the neural network model to be processed is preprocessed before its tensor data and the associated information of the tensor data are input into the recurrent neural network model. During preprocessing, elementwise operators are merged into the operators before them, and the corresponding intermediate results between these operators are removed from the computation graph, since that data becomes temporary data inside the fused operators; this effectively reduces the number of data blocks in the whole network and shrinks the space over which decisions must be made. On the other hand, it ensures that in the final splitting scheme the splitting mode of the elementwise operators always stays consistent with their input tensor data, which means that in the split computation graph the split elementwise sub-operators can also be optimized by operator fusion.
The specific implementation of the fusion-operator scheme is as follows: before step S201, according to the types of the operators in the neural network model to be processed, multiple operators in the model are fused to obtain a fusion operator, and the multiple operators are replaced with the fusion operator to obtain the neural network model.
The computation scale of each layer of the neural network model changes continuously as the network extends. For a typical classification network, the convolutions in the first few layers usually have a large feature-image size (the H/W dimensions) and a small number of feature images (the C dimension); considering the computation scale needed to keep the accelerator efficient, the splitting strategies of the first few layers should lean toward splitting within the feature image, i.e. along the H/W dimensions. As computation proceeds layer by layer, the feature-image size gradually decreases while the number of feature images keeps increasing, and the splitting of convolution operators should lean toward the C dimension. As the preferred splitting of the neural network model changes in this way, the splitting mode of the operators needs to be adjusted accordingly, i.e. the state of intermediate results must be adjusted; how to properly insert glue operators into the network to improve splitting performance is described next.
The specific implementation in this scheme may be: before step S201, a glue operator is inserted between each operator in the neural network model to be processed and the input tensor data of the operator, so as to obtain the neural network model; wherein the glue operator is used for adjusting the splitting state of the input tensor data of the operator.
It should be noted that the neural network model is obtained by adjusting the neural network model to be processed through the compensation operators, operator fusion, and glue operators; the fusion operators and compensation operators in the neural network model are processed in the same way as ordinary neural network model operators.
It should be understood that although the steps in the flowcharts of fig. 2 and figs. 4-5 are displayed in the order indicated by the arrows, these steps are not necessarily executed in that order. Unless explicitly stated otherwise herein, there is no strict order restriction on the execution of these steps, and they may be executed in other orders. Moreover, at least some of the steps in fig. 2 and figs. 4-5 may include multiple sub-steps or stages, which are not necessarily completed at the same moment but may be executed at different moments, and whose execution order is not necessarily sequential; they may be executed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
In one embodiment, as shown in fig. 7, there is provided a neural network model splitting apparatus, including:
the splitting decision module 210 is used for inputting tensor data in the neural network model and associated information of the tensor data into a recurrent neural network model; iteratively executing the steps of obtaining a current splitting strategy by using the recurrent neural network model and updating the parameters of the recurrent neural network model according to the current splitting strategy until the recurrent neural network model converges; whereupon the recurrent neural network model outputs a target splitting strategy;
and the splitting execution module 220 is used for obtaining a target splitting scheme according to the target splitting strategy, and splitting the neural network model according to the target splitting scheme.
In one embodiment, the splitting decision module 210 is configured to sample, for n times, probability distributions of different splitting states of tensor data in the neural network model according to probability distributions of different splitting states of each tensor data in the current splitting policy, so as to obtain n splitting schemes; wherein n is a positive integer; calculating an execution time of each of the n splitting schemes; and updating the parameters of the recurrent neural network model according to the execution time of each splitting scheme.
In one embodiment, the splitting decision module 210 is configured to determine a calculation load and an access data amount of an operator corresponding to each tensor data according to types and scales of operators associated with all tensor data in the splitting scheme; and calculating the execution time of the splitting scheme according to the calculation load, the access data volume, the access bandwidth of the execution processor of the operator and the calculation throughput rate of the execution processor of the operator.
In one embodiment, the splitting execution module 220 is configured to segment each tensor data according to the splitting dimension and splitting number of each tensor data in the target splitting scheme, so as to obtain sub-tensor data.
In one embodiment, the splitting execution module 220 is configured to determine a splitting start point and a splitting end point on the corresponding splitting dimension according to the number of split segments in each dimension of each tensor data, and to segment each tensor data according to the splitting start point and the splitting end point.
Specifically, when a tensor data is split, its current state, its associated operator, and its target splitting state are determined, and from these the splitting start point and end point of the tensor data in each splitting dimension are determined; the current state comprises the splitting number of the tensor data in each splitting dimension, and the target splitting state likewise comprises the splitting number of the tensor data in each splitting dimension.
In one embodiment, as shown in fig. 8, the neural network model splitting apparatus further includes: the neural network optimization module 230 is configured to insert a glue operator between each operator in the to-be-processed neural network model and the input tensor data of the operator to obtain a neural network model; wherein the glue operator is used for adjusting the splitting state of the input tensor data of the operator.
In one embodiment, the neural network optimization module 230 is further configured to insert a compensation operator between an ith operator of the neural network model to be processed and input tensor data of the ith operator to obtain the neural network model; wherein i is a positive integer, and the compensation operator is used for compensating the corresponding input tensor data.
In one embodiment, the neural network optimization module 230 is specifically configured to, if a plurality of compensation operators are inserted between the ith operator and the (i + k) th operator, determine a compensation parameter of a combined compensation operator according to a preset size of each sub tensor data in the input tensor data of the (i + k) th operator; and inserting a combination compensation operator between the input tensor data of the ith operator and the ith operator, and configuring the combination compensation operator by using the compensation parameters of the combination compensation operator to obtain a neural network model.
In one embodiment, the neural network optimization module 230 is further configured to fuse multiple operators in the neural network model according to the types of the operators in the neural network model to be processed to obtain a fusion operator, and replace the multiple operators with the fusion operator to obtain the neural network model.
For specific definition of the neural network model splitting device, reference may be made to the definition of the neural network model splitting method above, and details are not described here. All or part of each module in the neural network model splitting device can be realized by software, hardware and a combination thereof. The modules can be embedded in a hardware form or independent from a processor in the computer device, and can also be stored in a memory in the computer device in a software form, so that the processor can call and execute operations corresponding to the modules.
In one embodiment, a computer device is provided, which may be a server, and its internal structure diagram may be as shown in fig. 9. The computer device comprises a processor, a memory and a network interface which are connected through a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operating system and the computer program to run on the non-volatile storage medium. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a neural network model splitting method.
Those skilled in the art will appreciate that the architecture shown in fig. 9 is merely a block diagram of part of the structure related to the solution of the present application and does not limit the computer devices to which the solution applies; a particular computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided, comprising a memory and a processor, the memory having stored thereon a computer program operable on the processor, the processor implementing the following steps when executing the computer program: inputting tensor data in a neural network model and associated information of the tensor data into a recurrent neural network model; iteratively executing the steps of obtaining a current splitting strategy by using the recurrent neural network model and updating the parameters of the recurrent neural network model according to the current splitting strategy, until the recurrent neural network model converges; then obtaining a target splitting strategy output by the recurrent neural network model; and obtaining a target splitting scheme according to the target splitting strategy and splitting the neural network model according to the target splitting scheme.
In one embodiment, the processor, when executing the computer program, implements the steps of: sampling, n times, the probability distributions of the different splitting states of the tensor data in the neural network model according to the probability distribution of the different splitting states of each tensor data in the current splitting strategy, to obtain n splitting schemes, wherein n is a positive integer; calculating an execution time of each of the n splitting schemes; and updating the parameters of the recurrent neural network model according to the execution time of each splitting scheme.
In one embodiment, the processor, when executing the computer program, implements the steps of: determining the computation load and the memory access data volume of the operator corresponding to each tensor data according to the types and scales of the operators associated with all tensor data in the splitting scheme; and calculating the execution time of the splitting scheme according to the computation load, the memory access data volume, the memory access bandwidth of the operator's execution processor, and the computation throughput of the operator's execution processor.
In one embodiment, the processor, when executing the computer program, performs the steps of: segmenting each tensor data according to the splitting dimension and splitting number of each tensor data in the target splitting scheme to obtain sub-tensor data.
In one embodiment, the processor, when executing the computer program, further performs the steps of: determining a splitting starting point and a splitting end point on a corresponding splitting dimension according to the number of splitting segments on each dimension of each tensor data; and segmenting each tensor data according to the splitting starting point and the splitting end point.
In one embodiment, the processor when executing the computer program further performs the steps of: before the inputting tensor data in the neural network model and associated information of the tensor data into the recurrent neural network model, the method further includes: inserting glue operators between each operator in a neural network model to be processed and input tensor data of the operators to obtain the neural network model; wherein the glue operator is used for adjusting the splitting state of the input tensor data of the operator.
In one embodiment, the processor when executing the computer program further performs the steps of: inserting a compensation operator between an ith operator of a neural network model to be processed and input tensor data of the ith operator to obtain the neural network model; wherein i is a positive integer, and the compensation operator is used for compensating the corresponding input tensor data.
In one embodiment, the processor, when executing the computer program, further performs the steps of: if a plurality of compensation operators are inserted between the ith operator and the (i + k) th operator, determining compensation parameters of the combined compensation operator according to the preset size of each sub tensor data in the input tensor data of the (i + k) th operator; and inserting a combination compensation operator between the input tensor data of the ith operator and the ith operator, and configuring the combination compensation operator by using the compensation parameters of the combination compensation operator to obtain a neural network model.
In one embodiment, the processor when executing the computer program implements the steps of: fusing the operators in the neural network model according to the type of the operator in the neural network model to be processed to obtain a fusion operator, and replacing the operators with the fusion operator to obtain the neural network model.
In one embodiment, a computer-readable storage medium is provided, having a computer program stored thereon, which when executed by a processor performs the steps of: inputting tensor data in a neural network model and associated information of the tensor data into a recurrent neural network model; iteratively executing the steps of obtaining a current splitting strategy by using the recurrent neural network model and updating the parameters of the recurrent neural network model according to the current splitting strategy, until the recurrent neural network model converges; then obtaining a target splitting strategy output by the recurrent neural network model; and obtaining a target splitting scheme according to the target splitting strategy and splitting the neural network model according to the target splitting scheme.
In one embodiment, the computer program when executed by the processor further performs the steps of: sampling, n times, the probability distributions of the different splitting states of the tensor data in the neural network model according to the probability distribution of the different splitting states of each tensor data in the current splitting strategy, to obtain n splitting schemes, wherein n is a positive integer; calculating an execution time of each of the n splitting schemes; and updating the parameters of the recurrent neural network model according to the execution time of each splitting scheme.
In one embodiment, the computer program when executed by the processor further performs the steps of: determining the computation load and the memory access data volume of the operator corresponding to each tensor data according to the types and scales of the operators associated with all tensor data in the splitting scheme; and calculating the execution time of the splitting scheme according to the computation load, the memory access data volume, the memory access bandwidth of the operator's execution processor, and the computation throughput of the operator's execution processor.
In one embodiment, the computer program when executed by the processor further performs the steps of: segmenting each tensor data according to the splitting dimension and splitting number of each tensor data in the target splitting scheme to obtain sub-tensor data.
In one embodiment, the computer program when executed by the processor further performs the steps of: determining a splitting starting point and a splitting end point on a corresponding splitting dimension according to the number of splitting segments on each dimension of each tensor data; and segmenting each tensor data according to the splitting starting point and the splitting end point.
In one embodiment, the computer program when executed by the processor further performs the steps of: before inputting tensor data in the neural network and associated information of the tensor data into a recurrent neural network model, the method further comprises: inserting glue operators between each operator in a neural network model to be processed and input tensor data of the operators to obtain the neural network model; wherein the glue operator is used for adjusting the splitting state of the input tensor data of the operator.
In one embodiment, the computer program when executed by the processor further performs the steps of: inserting a compensation operator between an ith operator of a neural network model to be processed and input tensor data of the ith operator to obtain the neural network model; wherein i is a positive integer, and the compensation operator is used for compensating the corresponding input tensor data.
In one embodiment, the computer program when executed by the processor further performs the steps of: if a plurality of compensation operators are inserted between the ith operator and the (i + k) th operator, determining compensation parameters of the combined compensation operator according to the preset size of each sub tensor data in the input tensor data of the (i + k) th operator; and inserting a combination compensation operator between the input tensor data of the ith operator and the ith operator, and configuring the combination compensation operator by using the compensation parameters of the combination compensation operator to obtain a neural network model.
In one embodiment, the computer program, when executed by the processor, further performs the step of: according to the types of the operators in a neural network model to be processed, fusing operators in the neural network model to be processed to obtain a fusion operator, and replacing those operators with the fusion operator to obtain the neural network model.
In one embodiment, there is also provided a computer system comprising a memory, a first processor, and a second processor, the memory having stored thereon a computer program operable on the processors. The first processor, when executing the computer program, is configured to: input tensor data in a neural network model and associated information of the tensor data into a recurrent neural network model, and obtain a current splitting strategy with the recurrent neural network model; iteratively update the parameters of the recurrent neural network model according to the current splitting strategy until the recurrent neural network model converges; obtain a target splitting strategy output by the recurrent neural network model; obtain a target splitting scheme according to the target splitting strategy; and split the neural network model according to the target splitting scheme. The second processor, when executing the computer program, is configured to: process the split neural network model in parallel.
In an optional implementation, the second processor comprises a multi-core accelerator. The first processor segments each tensor data according to the splitting dimension and the splitting number of that tensor data to obtain sub-tensor data, and the sub-tensor data are distributed to a plurality of accelerator cores in the multi-core accelerator, which process the sub-tensor data in parallel with the corresponding operators.
Further, the first processor splits all tensor data in the network according to the splitting dimension and the splitting number of each tensor data, taking the network structure into account, to obtain a sub-tensor data set for each tensor data. Splitting the tensor data simultaneously splits the operators associated with it, yielding a sub-operator set for each operator. The sub-operators in each sub-operator set are then distributed to the acceleration cores of the multi-core accelerator; each accelerator core reads the sub-tensor data corresponding to its sub-operators from storage as input and executes them.
It will be understood by those skilled in the art that all or part of the processes of the method embodiments described above can be implemented by a computer program instructing related hardware; the program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the above method embodiments. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of these technical features have been described; nevertheless, as long as a combination contains no contradiction, it should be considered within the scope of this specification.
The above embodiments express only several implementations of the present application; their description is specific and detailed, but should not therefore be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and improvements without departing from the concept of the present application, and these all fall within the protection scope of the present application. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (13)

1. A neural network model splitting method is characterized by comprising the following steps:
tensor data in a neural network model and associated information of the tensor data are input into a recurrent neural network model;
iteratively executing the recurrent neural network model to obtain a current splitting strategy, and sampling n times from the probability distributions over the different splitting states of each tensor data in the current splitting strategy to obtain n splitting schemes; wherein n is a positive integer;
calculating an execution time of each of the n splitting schemes, wherein the execution time of a splitting scheme is calculated according to the splitting states of all tensor data in the splitting scheme, comprising: determining the computation load and the memory access data amount of the operator corresponding to each tensor data according to the types and scales of the operators associated with all tensor data in the splitting scheme; and calculating the execution time of the splitting scheme according to the computation load and the memory access data amount;
updating parameters of the recurrent neural network model according to the execution time of each splitting scheme until the recurrent neural network model converges;
outputting, by the recurrent neural network model, a target splitting strategy;
and obtaining a target splitting scheme according to the target splitting strategy, and splitting the neural network model according to the target splitting scheme.
2. The method according to claim 1, wherein splitting the neural network model according to the target splitting scheme comprises:
and segmenting each tensor data according to the splitting dimension and the splitting number of each tensor data in the target splitting scheme to obtain sub-tensor data.
3. The method of claim 2, wherein the segmenting each tensor data according to the splitting dimension and the splitting number of each tensor data in the target splitting scheme comprises:
determining a splitting start point and a splitting end point in the corresponding splitting dimension according to the number of split segments in each dimension of each tensor data;
and segmenting each tensor data according to the splitting start point and the splitting end point.
4. The method of claim 3, wherein the split dimension comprises one or more of a batch size, a number of feature images, a height of a feature image, a width of a feature image.
5. The method of claim 1, wherein, before the tensor data in the neural network model and the associated information of the tensor data are input into the recurrent neural network model, the method further comprises:
inserting a glue operator between each operator in a neural network model to be processed and the input tensor data of that operator to obtain the neural network model; wherein the glue operator is used for adjusting the splitting state of the input tensor data of the operator.
6. The method of any one of claims 1-5, wherein before inputting the tensor data in the neural network model and the associated information of the tensor data into the recurrent neural network model, the method further comprises:
inserting a compensation operator between an i-th operator of a neural network model to be processed and the input tensor data of the i-th operator to obtain the neural network model; wherein i is a positive integer, and the compensation operator is used for compensating the corresponding input tensor data.
7. The method of claim 6, wherein inserting a compensation operator between an i-th operator of the neural network model to be processed and the input tensor data of the i-th operator to obtain the neural network model comprises:
if a plurality of compensation operators are inserted between the i-th operator and the (i+k)-th operator, determining compensation parameters of a combined compensation operator according to the preset size of each sub-tensor data in the input tensor data of the (i+k)-th operator;
and inserting the combined compensation operator between the input tensor data of the i-th operator and the i-th operator, and configuring the combined compensation operator with its compensation parameters to obtain the neural network model.
8. The method according to any one of claims 1-5, wherein, before the tensor data in the neural network model and the associated information of the tensor data are input into the recurrent neural network model, the method further comprises:
according to the types of the operators in the neural network model to be processed, fusing operators in the neural network model to be processed to obtain a fusion operator, and replacing the fused operators with the fusion operator to obtain the neural network model.
9. A neural network model splitting device, comprising:
the splitting decision module is used for inputting tensor data in a neural network model and associated information of the tensor data into a recurrent neural network model; iteratively executing the recurrent neural network model to obtain a current splitting strategy, and sampling n times from the probability distributions over the different splitting states of each tensor data in the current splitting strategy to obtain n splitting schemes; wherein n is a positive integer;
determining the computation load and the memory access data amount of the operator corresponding to each tensor data according to the types and scales of the operators associated with all tensor data in the splitting scheme; and calculating the execution time of the splitting scheme according to the computation load and the memory access data amount;
updating parameters of the recurrent neural network model according to the execution time of each splitting scheme until the recurrent neural network model converges; the recurrent neural network model then outputting a target splitting strategy;
and the splitting execution module is used for obtaining a target splitting scheme according to the target splitting strategy and splitting the neural network model according to the target splitting scheme.
10. A computer device comprising a memory and a processor, the memory having stored thereon a computer program operable on the processor, wherein the processor, when executing the computer program, performs the steps of the method of any one of claims 1 to 8.
11. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 8.
12. A computer system comprising a memory, a first processor and a second processor, the memory having stored thereon a computer program operable on the processors, wherein the first processor, when executing the computer program, is configured to: input tensor data in a neural network model and associated information of the tensor data into a recurrent neural network model; iteratively execute the recurrent neural network model to obtain a current splitting strategy, and sample n times from the probability distributions over the different splitting states of each tensor data in the current splitting strategy to obtain n splitting schemes; wherein n is a positive integer;
calculate an execution time of each of the n splitting schemes, wherein the execution time of a splitting scheme is calculated according to the splitting states of all tensor data in the splitting scheme, comprising: determining the computation load and the memory access data amount of the operator corresponding to each tensor data according to the types and scales of the operators associated with all tensor data in the splitting scheme; and calculating the execution time of the splitting scheme according to the computation load and the memory access data amount;
update parameters of the recurrent neural network model according to the execution time of each splitting scheme until the recurrent neural network model converges; obtain a target splitting strategy output by the recurrent neural network model; obtain a target splitting scheme according to the target splitting strategy; and split the neural network model according to the target splitting scheme; and wherein the second processor, when executing the computer program, is configured to: process the split neural network model in parallel.
13. The computer system of claim 12, wherein the second processor comprises a multi-core accelerator;
wherein the first processor segments each tensor data according to the splitting dimension and the splitting number of each tensor data to obtain sub-tensor data;
and the sub-tensor data are distributed to a plurality of accelerator cores in the multi-core accelerator, the plurality of accelerator cores processing the sub-tensor data in parallel according to the corresponding operators.
CN201910114831.4A 2019-02-14 2019-02-14 Neural network model splitting method, device, storage medium and computer system Active CN111562977B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910114831.4A CN111562977B (en) 2019-02-14 2019-02-14 Neural network model splitting method, device, storage medium and computer system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910114831.4A CN111562977B (en) 2019-02-14 2019-02-14 Neural network model splitting method, device, storage medium and computer system

Publications (2)

Publication Number Publication Date
CN111562977A CN111562977A (en) 2020-08-21
CN111562977B (en) 2022-12-09

Family

ID=72071332

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910114831.4A Active CN111562977B (en) 2019-02-14 2019-02-14 Neural network model splitting method, device, storage medium and computer system

Country Status (1)

Country Link
CN (1) CN111562977B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112052758B (en) * 2020-08-25 2023-05-23 西安电子科技大学 Hyperspectral image classification method based on attention mechanism and cyclic neural network
TWI777481B (en) * 2021-04-08 2022-09-11 鴻海精密工業股份有限公司 Data control method, data processing method, electronic equipment and storage medium
CN113342345A (en) * 2021-05-17 2021-09-03 北京百度网讯科技有限公司 Operator fusion method and device of deep learning framework
CN114237918B (en) * 2022-02-28 2022-05-27 之江实验室 Graph execution method and device for neural network model calculation
CN114707643A (en) * 2022-04-11 2022-07-05 华为技术有限公司 Model segmentation method and related equipment thereof
WO2023222047A1 (en) * 2022-05-17 2023-11-23 北京灵汐科技有限公司 Processing method and processing unit for neural network computing graph, and device and medium
CN115879504B (en) * 2022-12-30 2023-08-29 珠海市欧冶半导体有限公司 Device and method for splitting and quantizing layerrnorm operator
CN117707791B (en) * 2024-02-02 2024-05-14 北京壁仞科技开发有限公司 Method, apparatus and storage medium for performing attention calculations

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180060724A1 (en) * 2016-08-25 2018-03-01 Microsoft Technology Licensing, Llc Network Morphism

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106651877A (en) * 2016-12-20 2017-05-10 北京旷视科技有限公司 Example segmenting method and device
CN107145939A (en) * 2017-06-21 2017-09-08 北京图森未来科技有限公司 A kind of Neural network optimization and device
CN108710941A (en) * 2018-04-11 2018-10-26 杭州菲数科技有限公司 The hard acceleration method and device of neural network model for electronic equipment
CN109272108A (en) * 2018-08-22 2019-01-25 深圳市亚博智能科技有限公司 Control method for movement, system and computer equipment based on neural network algorithm

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
FPGA automated design method for throughput optimization of convolutional neural network accelerators; Lu Weina et al.; Journal of Computer-Aided Design & Computer Graphics; 2018-11-15 (No. 11); full text *

Also Published As

Publication number Publication date
CN111562977A (en) 2020-08-21

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant