CN114492772A - Neural network tensor shape tracking method and computing platform - Google Patents

Neural network tensor shape tracking method and computing platform

Info

Publication number
CN114492772A
Authority
CN
China
Prior art keywords
shape
tensor
neural network
graph
error
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111353668.0A
Other languages
Chinese (zh)
Inventor
刘晨
陆杰
谭锋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba China Co Ltd
Alibaba Cloud Computing Ltd
Original Assignee
Alibaba China Co Ltd
Alibaba Cloud Computing Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba China Co Ltd, Alibaba Cloud Computing Ltd filed Critical Alibaba China Co Ltd
Priority to CN202111353668.0A priority Critical patent/CN114492772A/en
Publication of CN114492772A publication Critical patent/CN114492772A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0766 Error or fault reporting or storing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/079 Root cause analysis, i.e. error or fault diagnosis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Quality & Reliability (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Neurology (AREA)
  • Image Analysis (AREA)

Abstract

Provided are a neural network tensor shape tracking method and a computing platform. The method comprises: constructing a shape flow graph based on a computational graph intermediate representation (IR) of a neural network, the shape flow graph comprising nodes that perform neural network computation operations and edges that represent dependencies between the nodes, the edges being labeled with shape information of the tensors that flow along them; extracting constraints on the tensor shape information from the node operations and newly introduced operands of the shape flow graph; determining, based on the constraints, whether the shape information of a tensor contains an error or a limitation; and reporting when it is determined that the shape information contains an error or a limitation. By constructing a shape flow graph on top of the existing computation graph, obtaining shape constraints, and discovering errors or limitations from those constraints, shape errors that would otherwise terminate the actual computation can be discovered by pre-inspection, avoiding unnecessary transmission, loading, and computation resource consumption on the computing platform.

Description

Neural network tensor shape tracking method and computing platform
Technical Field
The disclosure relates to the field of deep learning, and in particular relates to a neural network tensor shape tracking method and a computing platform.
Background
Artificial intelligence has developed rapidly in recent years; it has been applied successfully in fields such as image classification, detection, and video and speech processing, and still has great development prospects. Neural networks are the core of artificial intelligence applications, and deep learning neural network algorithms are among the most common neural network models. The workload of a neural network is both computation- and data-intensive. The multiply-accumulate operations required by a neural network computation are usually on the order of G; for example, the target detection neural network SSD requires about 120 G operations. The parameters required for training a neural network, and for its subsequent inference, are typically on the order of megabytes to hundreds of megabytes; for example, the parameters of the classification neural network VGG amount to 480 megabytes. Since ordinary computers usually cannot provide such high computing power or sufficient memory, more and more neural network training and inference is performed on specialized neural network platforms.
To perform computations on a neural network platform, a user needs to transmit to the platform the data that generates the computation graph, the training data, and so on. In an actual neural network computation, feature values are passed layer by layer between the layers of the neural network. If the shape of a feature value being passed does not match the shape expected by the next layer, the computation is terminated. In the prior art, such a shape mismatch cannot be detected by a pre-check operation, which causes significant waste of transmission, storage, and computation resources.
To this end, an improved neural network computation scheme is needed.
Disclosure of Invention
One technical problem to be solved by the present disclosure is to provide an improved neural network tensor shape tracking scheme, which obtains constraints on the flow of feature-value shapes by constructing a shape flow graph based on the existing computation graph and discovers shape errors or limitations from those constraints, so that shape errors that previously could only be discovered when the actual computation terminated can be discovered by pre-inspection, avoiding unnecessary transmission, storage, and computation resource consumption.
According to a first aspect of the present disclosure, there is provided a neural network tensor shape tracking method, comprising: constructing a shape flow graph based on a computational graph intermediate representation (IR) of a neural network, the shape flow graph including nodes that perform neural network computation operations and edges that represent dependencies between the nodes, the edges being labeled with shape information of the tensors that flow along them; extracting constraints on the tensor shape information from the node operations and newly introduced operands of the shape flow graph; determining, based on the constraints, whether the shape information of a tensor contains an error or a limitation; and reporting when it is determined that the shape information of the tensor contains an error or a limitation.
Optionally, unknown information exists in the shape information added for the corresponding tensor, and extracting constraints on the tensor shape information from the node operations and newly introduced operands of the shape flow graph includes: listing a plurality of constraints on the unknown information based on the relationship of each node operation and operand of the shape flow graph to the unknown information; and solving for the unknown information based on the plurality of constraints: if the unknown information has no solution, it is determined that a shape error exists; if only a particular solution exists for the unknown information, it is determined that a shape limitation exists.
Optionally, constructing the shape flow graph comprises: slicing backward from the input tensors of the computational graph intermediate representation to build the shape flow graph, and collecting, during the backward slicing, the operands required by node operations, the operands comprising constants and scalar variables.
Optionally, constructing the shape flow graph comprises: adding, during the backward slicing, the return value of a called node operation to the shape flow graph as a new node, and continuing the backward slicing from the newly added node.
Optionally, constructing the shape flow graph comprises: copying all the preceding slices of a junction node having n incoming values into n copies; and selecting a different one of the n incoming values for each copied slice to continue slicing, thereby constructing a shape flow graph for each path in the computation graph.
Optionally, the reporting when it is determined that the shape information of the tensor contains an error or limitation includes: deleting node transformation operations and the constraints introduced by them one by one; and locating the node operation that caused the error or limitation when all remaining constraints are satisfied.
The tensor shape tracking method makes this determination without any actual flow of feature values.
According to a second aspect of the present disclosure, there is provided a neural network computing platform comprising a tensor shape tracking module and a neural network training module. The tensor shape tracking module comprises: a building sub-module for constructing a shape flow graph based on a computational graph intermediate representation (IR) of a neural network, the shape flow graph including nodes that perform neural network computation operations and edges that represent dependencies between the nodes, the edges being labeled with shape information of the tensors that flow along them; a constraint judging sub-module for extracting constraints on the tensor shape information from the node operations and newly introduced operands of the shape flow graph and determining, based on the constraints, whether the shape information of a tensor contains an error or a limitation; and a reporting sub-module for reporting when it is determined that the shape information of the tensor contains an error or limitation. The neural network training module performs neural network training computation based on the computational graph intermediate representation from which errors or limitations have been eliminated according to the operation of the tensor shape tracking module.
Optionally, the computing platform further comprises a training data acquisition module for acquiring training data and training labels, the training data serving as the initial tensor flowing through the computation graph, and the training labels being used for backward parameter adjustment of the classification result output by the computation graph, wherein the training data and training labels are loaded by the neural network training module to perform the neural network training computation in response to the tensor shape tracking module determining that no error or limitation exists.
According to a third aspect of the present disclosure, there is provided a non-transitory machine-readable storage medium having stored thereon executable code, which when executed by a processor of an electronic device, causes the processor to perform the method of the first aspect.
The invention provides a shape tracking method particularly suited to unknown tensor shapes in a computation graph. The method symbolizes the tensor shape by introducing specific symbols for a tensor shape of unknown rank or unknown dimension size. Constraints can be introduced from tensor operators, scalar variables, and conditional branches. Finally, the satisfiability of these constraints is checked and reported. Shape problems can thus be discovered by pre-inspection before the neural network computation is actually performed, avoiding the waste of transmission, memory access, and computation resources caused by unnecessary computation.
Drawings
The above and other objects, features and advantages of the present disclosure will become more apparent by describing in greater detail exemplary embodiments thereof with reference to the attached drawings, in which like reference numerals generally represent like parts throughout.
Fig. 1 shows an example of the composition of a typical CNN.
Fig. 2 shows an example of a computation graph of CNN network training constructed based on a depth computation framework.
Fig. 3 shows the input feature map as an example of a tensor.
FIG. 4 shows a schematic diagram of a neural network computing system to which the present invention is applied.
Figure 5 shows a schematic flow diagram of a tensor shape tracking method for a neural network according to one embodiment of the invention.
Figure 6 illustrates an implementation of a neural network tensor shape tracker in accordance with one embodiment of the present invention.
Fig. 7 illustrates one example of a shape flow diagram constructed in accordance with the present invention.
Fig. 8 illustrates another example of a shape flow diagram constructed in accordance with the present invention.
FIG. 9 illustrates a schematic diagram of the components of a neural network computing platform, in accordance with one embodiment of the present invention.
FIG. 10 is a schematic diagram of a computing device that can be used to implement the shape tracking method described above according to an embodiment of the invention.
Detailed Description
Preferred embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While the preferred embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
Artificial intelligence has developed rapidly in recent years; it has been applied successfully in fields such as image classification, detection, and video and speech processing, and still has great development prospects. Neural networks are the core of artificial intelligence applications, and deep learning neural network algorithms are among the most common neural network models. The workload of a neural network is both computation- and data-intensive. The multiply-accumulate operations required by a neural network computation are usually on the order of G; for example, the target detection neural network SSD requires about 120 G operations. The parameters required for training a neural network, and for its subsequent inference, are typically on the order of megabytes to hundreds of megabytes; for example, the parameters of the classification neural network VGG amount to 480 megabytes.
Common artificial neural networks (ANNs) include deep neural networks (DNNs), recurrent neural networks (RNNs), and convolutional neural networks (CNNs). The CNN is a kind of artificial neural network that has become a research hotspot in the fields of speech analysis and image recognition. Its weight-sharing network structure is closer to a biological neural network, which reduces the complexity of the network model and the number of weights. This advantage is more obvious when the network input is a multi-dimensional image, since the image can be used directly as the network input, avoiding the complex feature extraction and data reconstruction of traditional recognition algorithms. A convolutional network is a multi-layer perceptron specifically designed to recognize two-dimensional shapes; its structure is highly invariant to translation, scaling, tilting, and other forms of deformation. Neural network computations and related concepts are described below with reference to the accompanying figures.
CNN basic concept
Fig. 1 shows an example of the composition of a typical CNN. As shown in fig. 1, a typical CNN consists of a series of layers that run in order.
The parameters of a CNN model are called "weights". The first layer of a CNN reads the input map and outputs a series of feature maps. A lower layer reads the feature maps generated by the layer above it and outputs new feature maps. A final classifier outputs the probability that the input map belongs to a certain class. The CONV layer (convolutional layer) and the FC layer (fully connected layer) are the two basic layer types in a CNN. A CONV layer is usually followed by a pooling layer.
In the present application, for one CNN layer, f_j^in denotes the jth input feature map, f_i^out denotes the ith output feature map, and b_i denotes the bias term of the ith output feature map.
For the CONV layer, n_in and n_out denote the numbers of input and output feature maps, respectively.
For the FC layer, n_in and n_out denote the lengths of the input and output feature vectors, respectively.
Definition of the CONV layer (convolutional layer): the CONV layer takes a series of feature maps as input and obtains output feature maps by convolution with convolution kernels.
A non-linear layer, i.e. a non-linear excitation function, usually connected to the CONV layer, is applied to each element of the output feature map. The excitation function used is typically the ReLU function, so this layer is also commonly referred to as the ReLU layer.
The CONV layer may be represented by expression 1:

(1) f_i^out = Σ_{j=1..n_in} g_{i,j} ⊗ f_j^in + b_i

where g_{i,j} is the convolution kernel applied to the jth input feature map and the ith output feature map.
Definition of the FC layer (fully-connected layer): the FC layer applies a linear transformation to its input feature vector:

(2) f_out = W f_in + b

where W is an n_out × n_in transform matrix and b is the bias term. It is worth noting that for the FC layer the input is not a combination of several two-dimensional feature maps but a single feature vector. Therefore, in expression 2 the parameters n_in and n_out in effect correspond to the lengths of the input and output feature vectors.
Pooling layer: usually connected to the CONV layer, it outputs the maximum or average value of each sub-area in each feature map. Maximum pooling can be represented by expression 3:

(3) f_{i,j}^out = max{ f_{i·p+m, j·p+n}^in : 0 ≤ m < p, 0 ≤ n < p }

where p is the size of the pooling kernel. This non-linear "down-sampling" not only reduces the size of the feature map and the computation for the next layer, but also provides a form of translation invariance. In the forward inference process, a CNN can be used for image classification.
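As an illustration of expression 3, the following is a minimal NumPy sketch of maximum pooling over non-overlapping p × p windows (an illustrative simplification, not the code of any particular framework):

import numpy as np

def max_pool(feature_map, p):
    # maximum pooling of a 2-D feature map with a p x p pooling kernel (expression 3)
    h, w = feature_map.shape
    out = np.empty((h // p, w // p), dtype=feature_map.dtype)
    for i in range(h // p):
        for j in range(w // p):
            # maximum over the p x p sub-area starting at (i*p, j*p)
            out[i, j] = feature_map[i * p:(i + 1) * p, j * p:(j + 1) * p].max()
    return out

fmap = np.arange(16, dtype=np.float32).reshape(4, 4)
print(max_pool(fmap, 2))  # each output entry is the maximum of one 2 x 2 window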
Before deploying a CNN for inference (e.g., image classification), the CNN must first be trained. The parameters of the various layers of the neural network model, such as the weights and biases, are determined by importing large amounts of training data.
Training of CNN
Training a model means learning (determining) the ideal values of all weights and biases from labeled samples. The determined weights and biases enable the network, in its deployment phase, to make high-accuracy inferences about input feature values, e.g., to classify input pictures correctly.
In supervised learning, a machine learning algorithm learns the parameters by examining many samples and attempting to find a model that minimizes the loss; this process is called empirical risk minimization.
The loss is the penalty for a poor prediction. That is, the loss is a number indicating how inaccurate the model's prediction is for a single sample. If the model's prediction is perfectly accurate, the loss is zero; otherwise the loss is larger. The goal of training a model is to find a set of weights and biases whose loss, averaged over all samples, is small.
In the training process of a neural network, in order to quantify whether the current weights and biases fit all the network inputs, a loss function (such as cross_entropy, used in fig. 2 below) needs to be defined. The goal of training the network can thus be translated into minimizing the loss function over the weights and biases. Typically, a gradient descent algorithm is used to achieve this minimization (in multi-layer neural network training, the back-propagation algorithm is used).
The back-propagation algorithm involves a repeated iteration of forward propagation and backward propagation. In forward propagation, neurons in adjacent layers are connected through a weight matrix, so that the stimuli (feature values) are passed continuously from each layer to the next through the excitation function of each layer. In backward propagation, the error of the current layer is derived in reverse from the error of the next layer. The weights and biases are thus continuously adjusted through the iterated forward and backward propagation, so that the loss function gradually approaches its minimum value, completing the training of the neural network.
Deep learning framework and computation graph
The deep learning framework provides a building block for the design, training and verification of the neural network through a high-level programming interface. In other words, the deep learning framework provides a way for the implementation of neural network specific algorithms (e.g., the neural network construct shown in fig. 1).
With the development of deep learning and neural network algorithms, many top-level deep learning frameworks such as TensorFlow and PyTorch appear for researchers and developers. Developers can use the DSL and API of these frameworks to design different computational graph models to perform specific tasks, such as face recognition, image detection, voice recognition, etc.
These computing frameworks differ greatly from ordinary programming. In both compiled and scripting languages, variables are computed step by step to obtain results. TensorFlow and PyTorch are different: a computation graph is first constructed by programming, data are then fed in as input and computed through the computation operations specified by the graph, and the computation result is finally obtained. This frees the computation from the limitations of the programming language, helps decouple the front end from the back end, and presents a more intuitive visualization.
A computational graph model is composed of nodes and edges, where the nodes represent operators, the edges represent the dependencies between computations, and solid edges represent data-transfer dependencies, the transferred data being tensors.
Fig. 2 shows an example of a computation graph for CNN network training constructed on a deep learning framework. The computation graph shown in fig. 2 may be obtained from CNN training code written with TensorFlow. The illustrated computation graph fuses the Pad or BiasAdd adjacent to Conv2d (two-dimensional convolution) into the edge represented by Conv2d, and fuses all constant nodes into the attributes of the edges corresponding to the respective operators, thereby constructing a directed acyclic graph.
When training a neural network, a batch of training samples must be provided at each iteration. If the data selected in each iteration were represented by constants, the TensorFlow computation graph would become extremely large, because TensorFlow adds a node to the graph each time a constant is added. A placeholder solves this: it occupies only a single node, and the value used for each placeholder can be supplied through a feed_dict (dictionary).
For this purpose, in the data input stage at the far left of the computation graph, the dictionary feeds the training images (train_img) into a placeholder to provide the initial training input. In the prediction (predict) block, a two-dimensional convolution (conv2d) operation convolves the input data with a convolution kernel obtained by get_variable, and a reshape operation makes the convolved data conform to the shape requirements of the various API calls inside the deep learning framework. After several convolutional layers (the convolution operations in the prediction block may be repeated, i.e., there may be several convolution + pooling stages as shown in fig. 1), the feature values are fed into the first fully connected layer, where a fully connected computation is performed with a vector obtained by get_variable; the resulting feature values are then fed into the second fully connected layer, where another fully connected computation is performed with a vector obtained by get_variable. The computation result can then undergo softmax-classification-based cross-entropy back-propagation (softmax_cross_entropy_with_logits), with the training labels (train_lab) fed into a placeholder through the dictionary. After a batch of input training images has been trained multiple times, each parameter (e.g., the values of the convolution kernels) converges to a relatively fixed distribution, and these parameters can then be used as the trained neural network parameters to predict new images.
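As a brief illustration of the placeholder and feed_dict mechanism described above (a minimal sketch using the TensorFlow 1.x-style API; the names and data are illustrative and do not correspond to the code of fig. 2):

import numpy as np
import tensorflow as tf

# a placeholder occupies a single graph node; its value is supplied at run time
in_x = tf.placeholder(tf.float32, shape=[None, 28, 28, 1], name="in_x")
doubled = in_x * 2.0

some_batch = np.zeros((2, 28, 28, 1), dtype=np.float32)
with tf.Session() as sess:
    # the dictionary maps each placeholder to the data used for this run
    result = sess.run(doubled, feed_dict={in_x: some_batch})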
FIG. 2 illustrates a dynamic computation graph, i.e., each time an operator is used, the operator is dynamically added to an implicit default computation graph and the resulting computation graph is immediately executed, thereby facilitating debugging and usage.
Tensor and shape
In deep-framework-based CNN computation (including training and prediction), the feature values flow along the node operations in fig. 2, i.e., along the illustrated path conv2d → reshape → matmul → matmul → softmax_cross_entropy_with_logits. Here the feature values are generally multi-dimensional matrices, referred to as tensors. This is also the origin of the name of the deep learning framework TensorFlow.
A tensor has a shape. The shape refers to the length (number of elements) of each axis of the tensor. The rank refers to the number of axes of the tensor: a scalar has rank 0, a vector has rank 1, and a matrix has rank 2. An axis or dimension refers to a particular dimension of the tensor. The size refers to the total number of items in the tensor, i.e., the product of the entries of the shape vector.
Axes are often referred to by index and are typically ordered from global to local: first the batch axis, then the spatial dimensions, and finally the features of each location. This allows the feature vectors to be located in contiguous regions of memory.
Fig. 3 shows an input feature map as an example of a tensor. The illustrated tensor is a rank-4 tensor with shape [2,4,5,3] and size 120, i.e., it contains 120 elements. Specifically, the tensor has four dimensions: batch, width, height, and feature. For example, when RGB images of 4 × 5 pixels are used as training images and two images are trained at a time as a batch, the input tensor shown in fig. 3 is obtained. Each small cube in the figure then represents the R, G, or B value of one pixel of one training image.
It should be understood that fig. 3 is an example given for ease of illustration, with a small amount of data. In a real training scenario each training image may have a higher resolution, e.g., 40 × 60 pixels, more images may be trained per batch, e.g., 512, and the images used for training need not be RGB images. In addition, although the figure draws the four-dimensional tensor in a three-dimensional space for ease of understanding, such a representation is not generally used to describe space.
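The shape, rank, and size concepts above can be illustrated with a short Python sketch (the shape matches the example of fig. 3; the snippet is illustrative only):

import numpy as np

# a batch of 2 images, each 4 x 5 pixels with 3 channels (R, G, B)
t = np.zeros((2, 4, 5, 3))
print(t.shape)  # (2, 4, 5, 3) -- the shape
print(t.ndim)   # 4            -- the rank (number of axes)
print(t.size)   # 120          -- the size (total number of elements)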
Tensors flow one way through a computation graph implemented as a directed acyclic graph, e.g., from left to right in fig. 2, and their shapes change under the node operators: for example, they are convolved with different convolution kernels, padded with different padding strategies, reshaped to meet API call requirements, and so on.
A shape error (ShapeError) may occur when a TensorFlow operator is called with shape-incompatible parameters (incompatible ranks or dimension sizes). In the computation graph, a shape error can be represented as a tensor flowing into a node whose shape does not match the shape the node operation expects. Such errors occur frequently in practice because developers have difficulty grasping the intricate semantics of the thousands of APIs in a deep learning framework.
For example, many TensorFlow operators (e.g., softmax_cross_entropy_with_logits in fig. 2) support the NumPy "broadcast" semantics, which broadcast a smaller array across a comparatively larger array by replicating the leading dimensions of the higher-rank argument and expanding any dimension of size 1 in the other argument to the size of the dimension to be matched; misunderstanding these semantics often leads to erroneous results.
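A brief NumPy illustration of this broadcast rule (illustrative only):

import numpy as np

a = np.ones((4, 3))   # shape [4, 3]
b = np.ones((1, 3))   # shape [1, 3]: the dimension of size 1 is expanded to 4
print((a + b).shape)  # (4, 3)

c = np.ones((5, 3))   # shape [5, 3]: 5 != 4 and neither is 1
try:
    a + c
except ValueError as err:
    print("broadcast failed:", err)  # incompatible shapes raise an error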
In the prior art, shape errors, especially for parameters whose shapes are partially or even completely unknown, cannot be discovered by pre-inspection (e.g., debugging) before the neural network training is actually performed.
As mentioned above, since ordinary computers generally cannot provide the high computing power required for CNN computation or sufficient memory, more and more neural network training and inference is performed on specialized neural network platforms. To perform computations on a neural network platform, a user needs to transmit to the platform the data that generates the computation graph, the training data, and so on. In the actual neural network computation, the feature values (in the form of tensors) are passed layer by layer between the layers of the neural network. If the shape of a transferred tensor is wrong, the computation is terminated. In the prior art, the shape mismatch cannot be detected by a pre-check operation and only terminates the computation during the actual calculation, causing a great waste of transmission, storage, and computation resources.
Therefore, the invention provides an improved neural network computing scheme, which obtains constraints on the flow of feature-value shapes by constructing a shape flow graph based on the existing computation graph and discovers shape errors from those constraints, so that shape errors that previously could only be discovered when the actual computation terminated can be discovered by pre-inspection, avoiding unnecessary transmission, storage, and computation resource consumption.
FIG. 4 shows a schematic diagram of a neural network computing system to which the present invention is applied. As shown in fig. 4, the neural network computing system 400 includes a neural network platform 410 and a client 420. The client 420 may operate on the neural network platform to produce computational graph construction information, or may transmit completed computational graph construction information to the platform 410 together with training data (e.g., in an image classification task, the training data includes training images and training labels). The platform 410 may be a large neural network computing service platform (i.e., a server).
The server 410 may perform a pre-check (e.g., debugging) on the computation graph construction information uploaded by the user to find implied errors, and perform the neural network computation (e.g., neural network training, or an image classification task based on the trained network) when the pre-check finds no error.
In the prior art, the pre-check module on the server 410 cannot detect shape errors. Computation graph construction information containing a shape error is therefore loaded into the memory of the server 410 together with the training data for computation, and the computation is terminated by the error. In other words, because the shape error cannot be found in advance, the invalid data transmission, memory loading, and operations described above occur, resulting in a huge waste of resources.
Therefore, the neural network computing platform of the invention is provided with an additional tensor shape tracking module. This module can operate on the dynamic computation graph and can therefore discover shape errors by pre-inspection before the neural network computation actually runs, avoiding the resource waste caused by loading and running computation graph construction information that contains shape errors.
First, the present application may be implemented as a neural network tensor shape tracking method. Fig. 5 shows a schematic flow diagram of a neural network tensor shape tracking method according to one embodiment of the invention.
At step S510, a shape flow graph is constructed based on a computational graph intermediate representation (IR) of a neural network, the shape flow graph including nodes that perform neural network computation operations and edges that represent dependencies between the nodes, the edges being labeled with shape information of the tensors that flow along them. Here, the shape flow graph may be the computation graph with tensor shape labels added; for example, as shown in fig. 7 below, [batch_size, 28, 28, 1] may indicate that what is initially fed into the convolutional layer is a rank-4 tensor, in which each input picture is, for example, a 28 × 28 pixel grayscale map and batch_size pictures are fed at a time.
In step S520, constraints on the tensor shape information are extracted from the node operations and newly introduced operands of the shape flow graph. As described in the CNN basics above, input data (e.g., an input map) propagate layer by layer along the computation graph in the form of feature maps. These feature maps take the form of tensors of rank 2 or higher before being fed into the classifier. The feature map tensors flow into nodes along the shape flow graph and either perform computation operations with the operands introduced by the node operations (e.g., a convolution with the newly introduced convolution kernel operand in a convolution node) or are transformed by the node operation itself (e.g., a reshape that keeps the element count unchanged). These node operations and operands can all give rise to constraints. For example, the convolution with a convolution kernel requires the input feature map tensor to have the same rank as the convolution kernel tensor, and the reshape operation requires the number of tensor elements to remain unchanged. These conditions can be used as the constraints for the determination below.
In step S530, it is determined, based on the constraints, whether the shape information of a tensor contains an error or a limitation. Here, a shape error may be a contradiction between constraints; for example, the rank of the input tensor behavior_input in example 2 below would need to equal 4 and 3 at the same time, and since this cannot be satisfied an error is determined. A shape limitation may refer to a strict restriction imposed on the shape parameters the user inputs; for example, the constraint extracted in example 1 below, batch_size × 32 == batch_size or batch_size == 1, is satisfied only when batch_size equals 1, which restricts the tensor shape size input by the user to be exactly 1.
In step S540, when it is determined that there is an error or limitation in the shape information of the tensor, a report is made.
The tensor shape tracking method of the invention makes this determination without any actual flow of feature values; in other words, the method is performed in a pre-check step prior to actually performing the neural network computation (e.g., training), and may, for example, be incorporated into the compilation step that generates the IR. It is particularly applicable when the shape information of a tensor contains unknown information. In that case, extracting constraints on the tensor shape information from the node operations and newly introduced operands of the shape flow graph may include: listing a plurality of constraints on the unknown information based on the relationship of each node operation and operand of the shape flow graph to the unknown information; and solving for the unknown information based on the plurality of constraints: if the unknown information has no solution, a shape error is determined to exist; if only a particular solution exists for the unknown information, a shape limitation is determined to exist.
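As an illustration of this constraint-solving step, the following is a minimal sketch using the Z3 solver (the choice of solver and the variable names are assumptions made for illustration; the method itself does not prescribe a particular solver):

from z3 import Int, Solver, Or

# Example 1: batch_size is symbolized as X; broadcasting yields X*32 == X or X == 1
x = Int("batch_size")
s = Solver()
s.add(x > 0)
s.add(Or(x * 32 == x, x == 1))
print(s.check())     # sat
print(s.model()[x])  # 1 -> only a particular solution exists: a shape limitation

# Example 2: the rank of behavior_input would have to be 4 and 3 at the same time
r = Int("rank_behavior_input")
s2 = Solver()
s2.add(r == 4, r == 3)
print(s2.check())    # unsat -> no solution: a shape error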
In one embodiment, the shape flow graph is constructed by slicing backward starting from the input tensors of the computational graph intermediate representation, and the operands required by node operations, including constants and scalar variables, are collected during the backward slicing. When a call is encountered, the return value of the called node operation may be added to the shape flow graph as a new node in the backward slice, and the backward slicing continues from the newly added node. When an execution branch is encountered, all the preceding slices of a junction node with n incoming values may be copied n times, and each copy selects a different one of the n incoming values to continue slicing, so that a shape flow graph is constructed for each path in the computation graph.
In one embodiment, reporting when it is determined that the shape information of a tensor contains an error or limitation may include reporting the node at which the error occurs. To this end, the node transformation operations and the constraints they introduce may be deleted one by one, and the node operation that caused the error or limitation is located when all remaining constraints are satisfied.
The above neural network tensor shape tracking method of the invention can be implemented as a neural network tensor shape tracking module, which can be incorporated into a neural network compiler and placed after the intermediate representation generation module. To this end, fig. 6 illustrates an implementation of a neural network tensor shape tracker according to one embodiment of the invention. In the prior art, an intermediate representation generation module processes the computation-graph-based construction information from an application and generates an IR representation of that information. For example, the invention may use Ariadne as its front end for parsing the Python program into WALA IR.
In order to be decoupled from the deep learning computation framework, a computation graph structure corresponding to the neural network processor needs to be constructed. For this purpose, the compiler abstracts a framework-independent intermediate representation (IR) that can represent all the information of the algorithm and facilitates subsequent optimization by the compiler for the hardware platform. The shape tracker of the invention can be located after the intermediate representation generation module and comprises the building module described above for converting the IR representation into a shape flow graph, a constraint judging module for generating constraints and determining shape errors from them, and a reporting module for reporting when a shape error is determined.
For a deeper understanding of the principles of the invention, a description is given below with reference to three code examples and the shape flow graph examples shown in fig. 7 and 8. Fig. 7 illustrates one example of a shape flow graph constructed according to the invention, and fig. 8 illustrates another example. The shape flow graph shown in fig. 7 can be obtained from the computation graph shown in fig. 2. A computation graph may be described by various expressions, and such a description constitutes the computation graph construction information of the invention. In one embodiment, the computational graph may be described in the Python language under the deep learning framework TensorFlow.
Code example 1
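The original listing is reproduced in the published document only as an image. The following is a reconstruction assembled from the line-by-line description in the paragraphs below; the exact wording of lines not explicitly described (the filter shape, the optimizer, the data-loading helper load_batch, and the variable names passed to get_variable) is assumed for illustration, and the line numbers are kept because the surrounding text refers to them:

1.  def full_connect(in_tensor, out_dim, name):
2.      fc_w = tf.get_variable(name, [int(in_tensor.shape[1]), out_dim])
3.      return tf.matmul(in_tensor, fc_w)
4.  def predict(x):
5.      mp = tf.nn.conv2d(x, tf.get_variable("conv_w", [5, 5, 1, 32]), strides=[1, 1, 1, 1], padding="SAME")
6.      reshaped = tf.reshape(mp, [-1, 28 * 28])
7.      fc = full_connect(reshaped, 128, "fc_w1")
8.      logit = full_connect(fc, 10, "fc_w2")
9.      return logit
10. in_x = tf.placeholder(tf.float32, [None, 28, 28, 1])
11. in_y = tf.placeholder(tf.float32, [None, 10])
12. y = predict(in_x)
13. cross_entropy = tf.nn.softmax_cross_entropy_with_logits(labels=in_y, logits=y)
14. train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)
15. train_image, train_lab = load_batch(batch_size)  # hypothetical data loader
16. with tf.Session() as sess:
17.     sess.run(tf.global_variables_initializer())
18.     sess.run(train_step, feed_dict={in_x: train_image, in_y: train_lab})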
The above is an example of a computation graph described in the Python language under the TensorFlow framework. A TensorFlow program, usually written in Python, consists of two phases: construction and execution. In example 1, during the construction phase (lines 1-14), the computation graph is configured: each operator (e.g., tf.matmul on line 3) generates a node and the edges connecting it to the nodes it depends on. In the execution phase (lines 15-18), a session object is created to instantiate the graph, the graph is executed (sess.run on line 18), possibly multiple times, and data are fed into the placeholders (e.g., in_x and in_y on lines 10 and 11, respectively).
Both fig. 2 and fig. 7 can be regarded as descriptions based on code example 1 above. Unlike fig. 2, in the graph depicted in fig. 7 each edge is annotated with the shape information of the tensor it propagates. The shape of a tensor can depend on the input: a placeholder tensor can leave some dimensions (or the entire shape) as None (empty), and its shape is instantiated by the data provided to the placeholder (via the feed_dict argument, e.g., on line 18) when the graph is executed.
The computation graph may be executed by calling session.run(), tensor.eval(), or operation.run(). On line 18 above, the graph is executed to obtain the result of the operator train_step (line 14). The input data train_image and train_lab (line 15) are fed into the placeholders in_x (line 10) and in_y (line 11), respectively. The first dimension of the input data is configured by the input parameter batch_size on line 15. Therefore, the tensors in_x and in_y have shapes [batch_size, 28, 28, 1] and [batch_size, 10], respectively.
The tensor in_x is passed as the actual argument to the function predict() on line 12 and is processed by the convolution operator conv2d (line 5). The conv2d operator is often used to extract intermediate features in complex neural networks. It requires as input a 4-dimensional input tensor, a 4-dimensional filter tensor, and a stride vector with 4 elements. Using the "same" padding strategy, conv2d(x, f, s, "same") produces a tensor of shape [x[0], x[1]/s_1, x[2]/s_2, f[3]]. Hereafter, the notation x[i] denotes the ith dimension of the shape of tensor x, and s_i denotes the ith element of the vector s. In the example of fig. 7, the conv2d operator generates a tensor mp of shape [batch_size, 28, 28, 32].
The reshape operator on line 6 changes the shape of the incoming tensor mp to the specified shape [-1, 28 × 28], i.e., a two-dimensional array. Here the special dimension size -1 indicates that the size of the corresponding dimension is to be computed dynamically. The tensor can be reshaped correctly if its size (the total number of items in the tensor) equals the size of the specified shape. On line 6, after reshaping, we obtain a new tensor with shape [batch_size × 32, 28 × 28].
On line 7, the full_connect function is called with the reshaped tensor as its actual parameter. Thus, the operators get_variable (line 2) and matmul (line 3) are included in the computation graph. The matmul operator multiplies the reshaped tensor ([batch_size × 32, 28 × 28]) by fc_w ([28 × 28, 128]) to obtain a new tensor fc with shape [batch_size × 32, 128]. Next, the tensor fc is processed again by the same function on line 8. Finally, logit ([batch_size × 32, 10]) is generated and returned as y.
The operator softmax_cross_entropy_with_logits (line 13) generates normalized probabilities from the input tensors in_y ([batch_size, 10]) and y ([batch_size × 32, 10]). It supports the "broadcast" rule: matching dimensions must either have the same size, or one of them must be 1 (in which case the resulting tensor takes the other size in the corresponding shape dimension). Thus, although batch_size is unknown, the constraint on batch_size in this example can be solved: the operator can only succeed if the first shape dimension of the two tensors is the same or one of them is 1, i.e., batch_size × 32 == batch_size or batch_size == 1. In a subsequent actual training run, the application fails with a runtime exception because the user, unaware of this error (or tight constraint) in the tensor shapes, configures the input batch_size to 200 (a typical value).
Code example 2
1.user_profile_cnn=tf.reshape(tmp_user_profile_cnn,shape=[-1,num_behavior_max[behavior_cnt],n_output_behavior,1])
2.attention_layer_input=tf.matmul(behavior_input,user_profile_cnn)
……
3.tmp_attention_weights=tf.reshape(attention_weights,shape=[-1,num_behavior_max[behavior_cnt],1])
4.behavior_output=tf.matmul(tmp_attention_weights,behavior_input)
In this example, the tensor behavior_input comes from user input, and its shape (i.e., its rank and dimension sizes) is completely unknown. On line 1, the tensor tmp_user_profile_cnn is reshaped into the 4-dimensional tensor user_profile_cnn, which is then multiplied with behavior_input by the operator matmul, implying that the rank of behavior_input must also be 4; otherwise the operator fails. On line 3, the tensor attention_weights is reshaped into the 3-dimensional tensor tmp_attention_weights, which is also multiplied with behavior_input. Since the input tensor behavior_input cannot satisfy both constraints, the application will always fail on one of the matmul operators (on line 2 or line 4) in a subsequent run.
Code example 3
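The original listing is again only an image in the published document. A minimal sketch consistent with the description below is given here; the branch condition, the argument order of the operator calls, and the surrounding code are assumptions:

1. if loss_type == "absolute":
2.     loss = tf.losses.absolute_difference(labels, pred)  # requires identical shapes
3. elif loss_type == "logloss":
4.     loss = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=pred, logits=labels)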
In this example, the tensor labels is a two-dimensional array and the tensor pred is a one-dimensional array; all of their dimension sizes are unknown. When the condition on loss_type is satisfied, the bug is triggered on line 2, because the operator absolute_difference expects its input tensors to have the same shape (i.e., the same rank and dimension sizes). However, if the other branch is taken (when the input parameter loss_type is "logloss"), the error is not triggered: the operator sparse_softmax_cross_entropy_with_logits allows the rank of the input parameter pred to be 1 less than the rank of labels, so it does not trigger an error.
In code examples 1-3 above, there are tensors with completely unknown shapes (example 2) or partially unknown shapes (examples 1 and 3). Tensors with unknown or partially unknown shapes are common in real-world applications: they may come from command-line inputs, files, or unsupported library functions. It is difficult to write rules that infer these unknown shapes as a limited set of concrete shapes. The invention therefore proposes a shape error tracking method particularly suited to unknown tensor shapes in a computation graph. The method symbolizes the tensor shape by introducing symbolic values (i.e., shape value symbols) for a tensor shape of unknown rank or unknown dimension size. In other words, the unknown shape information of a tensor can be set to an unknown represented by a symbol. Constraints can be introduced from tensor operators, scalar variables, and conditional branches. Finally, a constraint solver is applied to check the satisfiability of these constraints.
For example, for example 1 (fig. 7), the value of the input variable batch_size is symbolized, e.g., as X. Here, batch_size can be regarded as unknown information (an unknown quantity) in the shape flow graph of fig. 7 and represented by the symbol X. The computation graph generates the constraint X × 32 == X or X == 1, among other constraints. The solution of this constraint is X = 1, which can be provided to the user as a warning. In example 2, the rank of the tensor behavior_input is symbolized, e.g., as rank(behavior_input). The two matmul operators on lines 2 and 4 introduce the respective constraints rank(behavior_input) == 4 and rank(behavior_input) == 3. Since both cannot be satisfied at the same time, an error is found.
As shown in fig. 6, the shape tracker of the invention may include a construction module, a constraint judging module, and a reporting module. In actual operation, the construction module first traverses the paths given by the IR and builds a shape flow graph (an abstract computation graph) for each path. The constraint judging module then formulates each shape flow graph as a list of constraints, which is solved with a constraint solver. Finally, if the constraints cannot be satisfied (or if the user input is constrained), an error (or warning) is issued. To report accurately the line number at which the error or warning occurred, the reporting module may search for the first operator that introduced an unsatisfiable constraint and report it to the user.
Specifically, the construction module (Builder) builds one shape flow graph for each program path. Since the control flow structure of a TensorFlow program is generally simple, the number of program paths is mostly 2 and rarely reaches 8.
Specifically, the shape flow graph of the invention may be an abstract computation graph labeled with shape information. To build the shape flow graph, backward slicing can start from the call to session.run() (i.e., from the output tensor). In other words, as shown in fig. 6, the computation graph construction information generated by the application may be program code as in example 1, from which the compiler obtains the IR. The IR (or the use-def chains derived from it) is used by the construction module to build the shape flow graph of the invention. Since a TensorFlow program typically propagates values directly through assignments or parameter passing, slicing can be performed along the use-def chains of WALA's SSA (static single assignment) representation. During the backward slicing, function calls are inlined: when a function call is encountered, the return value of the called function is added to the graph (as a new node), and slicing continues backward from the newly added return value. Finally, every operator on which the output tensor depends (i.e., every TensorFlow API call) and the operands introduced by those operators (tensors and scalars, e.g., the actual parameters of the operators) are contained in the graph.
The construction of the shape flow graph by the construction module is described below with reference to example 1 and fig. 7. Slicing starts from sess.run() on line 18, i.e., from the output tensor train_step. Since train_step is returned by the operator minimize, that operator and its operand (i.e., cross_entropy) are added to the graph. Similarly, cross_entropy is generated by the operator softmax_cross_entropy_with_logits (line 13), so that operator and its operands (in_y and y) are included. Starting from y, the function call to predict (line 12) is inlined, and slicing continues from its return value logit (line 9). Next, the function full_connect is inlined twice, in this order on line 8 and then on line 7. The resulting shape flow graph is shown in fig. 7.
When the computation graph contains a phi node (a control-flow junction in SSA), the shape flow graph needs to be replicated. When a phi node with n incoming values is encountered, the graph is replicated n times, and each copy selects a different incoming value to continue slicing. Fig. 8 shows the shape flow graphs for example 2. In SSA, there is a phi node (lines 1-4) at the junction of the different branches of the if statement. Thus, two shape flow graphs are obtained, one for each branch, namely branch 0 and branch 1 in the figure.
Loops, although rare in the graph construction stage, are handled by unrolling each loop twice.
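The backward-slicing construction can be sketched as follows (a simplified illustration over a generic use-def representation; the data structures and the use_def helper are assumptions and do not correspond to the actual WALA/Ariadne API):

import copy

def build_shape_flow_graphs(output_value, use_def):
    # Backward slice from the output tensor, inlining calls and
    # splitting the graph at phi nodes (one graph per program path).
    pending = [({"nodes": set(), "edges": set()}, [output_value])]
    finished = []
    while pending:
        graph, worklist = pending.pop()
        while worklist:
            value = worklist.pop()
            defn = use_def.definition_of(value)  # assumed helper: the defining instruction
            if defn is None:
                continue  # constants and inputs have no defining instruction
            if defn.kind == "call":
                # inline the call: continue slicing from its return value
                worklist.append(defn.return_value)
            elif defn.kind == "phi":
                # copy the slice built so far once per extra incoming value
                for incoming in defn.incoming_values[1:]:
                    pending.append((copy.deepcopy(graph), worklist + [incoming]))
                worklist.append(defn.incoming_values[0])
            else:
                # operator node: record it, its operands, and the dependency edges
                graph["nodes"].add(defn)
                for operand in defn.operands:
                    graph["edges"].add((operand, defn))
                    worklist.append(operand)
        finished.append(graph)
    return finished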
When collecting shape information, the constant and scalar variables that propagate along the use-def chain can be directly recorded. It may be attempted to infer as much specific information as possible by applying constant propagation and computing specific shape information from the document semantics of the tensrflowapi. In some embodiments, the following two special cases may also be considered. First, the shape of the tensor can be set using the tf.sethash () function, so for each tensor its use of the tf.sethash () call to the object is checked and the shape of the tensor is updated accordingly. Second, in most cases, the values are directly propagated. However, when the tensor is initialized with a given shape, the values are passed as parameters into the constructor of the shape and stored in its corresponding field. Typically, pointer analysis is required to compute the field dependent dependencies. However, such fields of the shape object are stored only once in its constructor (during initialization). Thus, when a field load is encountered, only the unique store for the corresponding field needs to be searched.
The constraint determining module translates the shape flow graph into a constraint list and then determines shape errors based on those constraints. Constraints can be collected from the tensor operators and scalar instructions in a shape flow graph. Since shape-related values rarely depend on branch conditions in real-world TensorFlow applications, branch conditions may be ignored.
For ease of understanding, some symbolic notation is defined below to represent the shape of a tensor T, the values of a vector V, and the value of a scalar X. Specifically, T[0], T[1], ..., T[-1] denote the dimension sizes of T; |T| denotes the total size of T (i.e., its number of elements); rank(T) denotes the rank (number of dimensions) of T; V_0, V_1, ..., V_{|V|-1} denote the element values of V; and X denotes the value of the scalar X.
In particular, T[-1] denotes the size of the last dimension of T. This variable is especially useful when the rank of T is unknown, i.e., when rank(T) is symbolic. By default, all variables are assumed to be symbolic unless otherwise stated. If rank(T) is a constant value C, a variable is introduced for each dimension size of T, which is concretized by applying the following function:
Concretize(T, C) := rank(T) == C ∧ T[-1] == T[C-1] ∧ |T| == T[0] × T[1] × ... × T[C-1]
This function sets the rank of T to C, sets T[-1] to the size of the last dimension of T (T[-1] == T[C-1]), and concretizes the total size of T as the product of all of its dimension sizes.
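Expressed as solver constraints, Concretize(T, C) could be emitted roughly as follows. This is a sketch using the z3 Python bindings; the variable-naming scheme is illustrative and not prescribed by the invention:

```python
from z3 import Int, And

def concretize(name, rank_c):
    """Introduce per-dimension variables for tensor `name` with constant rank `rank_c` (>= 1)."""
    dims = [Int(f"{name}[{i}]") for i in range(rank_c)]
    last = Int(f"{name}[-1]")
    rank = Int(f"rank({name})")
    total = Int(f"|{name}|")
    size_product = dims[0]
    for d in dims[1:]:
        size_product = size_product * d
    constraints = And(
        rank == rank_c,               # rank(T) == C
        last == dims[rank_c - 1],     # T[-1] == T[C-1]
        total == size_product,        # |T| == product of all dimension sizes
    )
    return dims, constraints
```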
Constraints are introduced for each operator according to its documented semantics. For example, the operator C = reshape(A, B) reshapes tensor A into a tensor C of the same total size, with the shape specified by the vector B. Thus, the following constraint list is obtained:
|C| == |A| ∧ rank(C) == |B| ∧ (∧_{0 ≤ i < |B|} C[i] == B_i)
Here, the constraint |C| == |A| indicates that tensors C and A have the same total size (as required by reshape), while the remaining constraints specify the shape of C from the vector B: the rank of C is defined by the size of B (rank(C) == |B|), and the dimensions of C are defined by the elements of B (∧_{0 ≤ i < |B|} C[i] == B_i). Note that the size of B, i.e., |B|, is a constant. Thus, tensor C is concretized (Concretize(C, |B|)). Except for at most one element value (which may be -1), all other element values are constants. The same constraint list applies when the rank of tensor A is constant, i.e., when A has been concretized.
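As a sketch, the constraint list for C = reshape(A, B) with a constant-size B could be generated as follows (again z3-based; the naming and the handling of a -1 element are illustrative assumptions):

```python
from z3 import Int, And

def reshape_constraints(b_values):
    """Constraints for C = reshape(A, B) where |B| and most elements of B are constants."""
    size_a, size_c = Int("|A|"), Int("|C|")
    rank_c = Int("rank(C)")
    c_dims = [Int(f"C[{i}]") for i in range(len(b_values))]
    cs = [size_c == size_a,                # reshape preserves the total number of elements
          rank_c == len(b_values)]         # rank(C) == |B|
    for i, b in enumerate(b_values):
        if b != -1:                        # a -1 element stays symbolic (inferred dimension)
            cs.append(c_dims[i] == b)      # C[i] == B_i for constant elements
    return And(cs)
```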
Next, consider the operator softmax_cross_entropy_with_logits, written here as C = softmax_cross_entropy_with_logits(A, B), which supports NumPy-style "broadcasting". Before delving into the details of the broadcast semantics, an auxiliary function Broadcast(A, B, C, i, j) is first introduced. It expresses the constraints between the i-th dimension of the higher-rank input tensor A, the i-th dimension of the output tensor C, and the matching j-th dimension of the other input tensor B, where j ≤ i:
Broadcast(A, B, C, i, j) := (A[i] == B[j] ∧ C[i] == A[i]) ∨ (A[i] == 1 ∧ C[i] == B[j]) ∨ (B[j] == 1 ∧ C[i] == A[i])
Broadcast(A, B, C, i, j) holds if: 1) A[i] matches B[j], yielding the same size for the i-th dimension of C; 2) A[i] is 1, in which case the i-th dimension of C is taken from the j-th dimension of B; or 3) B[j] is 1, in which case the i-th dimension of C is taken from A[i]. These three cases reproduce the semantics of broadcasting a pair of matching dimensions (A[i] and B[j]).
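The Broadcast helper can be encoded directly as a disjunction over the three cases. In the z3 sketch below, a_i, b_j and c_i stand for the dimension-size variables A[i], B[j] and C[i]:

```python
from z3 import Or, And

def broadcast(a_i, b_j, c_i):
    """Broadcast(A, B, C, i, j): constraint between A[i], B[j] and the output C[i]."""
    return Or(
        And(a_i == b_j, c_i == a_i),   # matching sizes: C takes the common size
        And(a_i == 1,   c_i == b_j),   # A[i] is 1: C[i] comes from B[j]
        And(b_j == 1,   c_i == a_i),   # B[j] is 1: C[i] comes from A[i]
    )
```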
The constraint list for C = softmax_cross_entropy_with_logits(A, B) is given as follows. When rank(A) or rank(B) is symbolic, only constraints on the first and last dimensions of the input and output tensors are introduced:
(rank(A) == rank(B) ∧ Broadcast(A, B, C, 0, 0) ∧ Broadcast(A, B, C, -1, -1))
∨ (rank(A) > rank(B) ∧ C[0] == A[0] ∧ Broadcast(A, B, C, -1, -1))
∨ (rank(A) < rank(B) ∧ C[0] == B[0] ∧ Broadcast(B, A, C, -1, -1))
In the case rank(A) == rank(B), the constraint is applied to the first and last dimensions of all three tensors A, B and C (Broadcast(A, B, C, 0, 0) ∧ Broadcast(A, B, C, -1, -1)). In the other two cases, the constraint is applied to the last dimension of the three tensors, and the output tensor C derives its first dimension size from the higher-rank tensor (e.g., C[0] == A[0] when rank(A) > rank(B)).
When rank(A) and rank(B) are both constants (i.e., A and B have been concretized), C can be concretized as follows:
rank(A) == rank(B): Concretize(C, rank(A)) ∧ (∧_{0 ≤ i < rank(A)} Broadcast(A, B, C, i, i))
rank(A) > rank(B): Concretize(C, rank(A)) ∧ (∧_{0 ≤ i < rank(A)−rank(B)} C[i] == A[i]) ∧ (∧_{rank(A)−rank(B) ≤ i < rank(A)} Broadcast(A, B, C, i, i − rank(A) + rank(B)))
and symmetrically when rank(B) > rank(A). In the case rank(A) == rank(B), the output tensor C is concretized and each of its dimensions is defined by the Broadcast rule. Otherwise, C is concretized with the higher rank (e.g., rank(A) when rank(A) > rank(B)): the leading dimensions are copied directly from the higher-rank tensor, and the matched trailing dimensions are broadcast.
Similarly, appropriate constraints can be introduced for other operators, such as conv2d and matmul. Finally, the constraints of each shape flow graph are fed into a constraint solver. For example, as described above, from the constraint batch_size × 32 == batch_size == 1 it follows for example 1 that no shape error occurs only if batch_size == 1; and from the respective constraints of the two matmul operators in example 2, which cannot be satisfied simultaneously, it follows that example 2 has a shape error.
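As an illustration of this final solving step (the concrete constraints of example 2 are not reproduced here), the well-known matmul requirement that the inner dimensions match can be checked with a solver; two chained matmuls with incompatible inner dimensions yield an unsatisfiable system. The concrete sizes below are illustrative assumptions, not taken from example 2:

```python
from z3 import Int, Solver, unsat

s = Solver()
a0, a1 = Int("A[0]"), Int("A[1]")        # first input tensor A
b0, b1 = Int("B[0]"), Int("B[1]")        # weight tensor B of the first matmul
c0, c1 = Int("C[0]"), Int("C[1]")        # C = matmul(A, B)
d0, d1 = Int("D[0]"), Int("D[1]")        # weight tensor D of the second matmul

# matmul constraints: inner dimensions must match, output copies the outer ones.
s.add(a1 == b0, c0 == a0, c1 == b1)      # C = matmul(A, B)
s.add(c1 == d0)                          # matmul(C, D) requires C[-1] == D[0]

# Illustrative concrete shapes: B is 784x10, D is 100x10.
s.add(b0 == 784, b1 == 10, d0 == 100, d1 == 10)

print(s.check() == unsat)                # True: the two matmuls cannot both be satisfied
```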
If the constraint solver of the constraint determining module cannot solve a given set of constraints, the reporting module issues an error (or a warning, when the conflict only constrains the user input to particular constant values). To report the error location accurately, the reporting module searches for the operator that introduced the unsatisfiable constraints. Specifically, this can be accomplished by deleting each operator one by one (more precisely, by deleting the constraints each operator introduces), in reverse order of their addition to the underlying data flow graph, until the constraints become satisfiable. In practice, this process can be accelerated using binary search.
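A sketch of this localization step is given below. It assumes that ops lists the operators in the order they were added and that constraints_of(op) returns the constraints each operator contributed (both hypothetical helpers), and that the full constraint set is already known to be unsatisfiable:

```python
from z3 import Solver, sat

def locate_faulty_operator(ops, constraints_of):
    """Binary-search the shortest operator prefix whose constraints are unsatisfiable."""
    def satisfiable(prefix_len):
        s = Solver()
        for op in ops[:prefix_len]:
            s.add(constraints_of(op))
        return s.check() == sat

    lo, hi = 1, len(ops)                  # the answer (shortest unsat prefix) lies in [lo, hi]
    while lo < hi:
        mid = (lo + hi) // 2
        if satisfiable(mid):
            lo = mid + 1                  # conflict appears later than this prefix
        else:
            hi = mid                      # conflict already present in this prefix
    return ops[lo - 1]                    # operator whose constraints first break satisfiability
```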
The practical operation of the invention has been described above in connection with examples 1-3 and figs. 7-8. Further, the invention can also be implemented as a neural network computing method. FIG. 9 illustrates a schematic diagram of the components of a neural network computing platform according to one embodiment of the present invention. The computing platform 900 may include a tensor shape tracker implemented as part of a compiler, as previously described with reference to fig. 6, or as a separate tensor shape tracking device. The tracker or apparatus 910 includes: a building module 911 configured to build a shape flow graph based on a computation graph intermediate representation (IR) of a neural network, the shape flow graph including nodes performing neural network computation operations and edges representing dependencies between the nodes, the edges being labeled with the shape information of the tensors to flow along them; a constraint determining module 912 configured to extract constraint conditions on tensor shape information from node operations and newly introduced operands of the shape flow graph, and to determine whether there is an error or limitation in the shape information of a tensor based on the constraint conditions; and a reporting module 913 configured to report when it is determined that there is an error or limitation in the shape information of a tensor.
The computing platform 900 may also include a neural network training module 920 configured to perform neural network training computation based on the computation graph intermediate representation from which errors or limitations have been removed according to the operation of the tensor shape tracking module. Although not shown in the figure, the computing platform may further include a training data acquisition module configured to acquire training data and training labels, the training data serving as the initial tensors flowing into the computation graph and the training labels being used for backward parameter adjustment based on the classification results output by the computation graph; the neural network training module loads the training data and the training labels to perform neural network training computation in response to the tensor shape tracking module determining that no error or limitation exists. In other words, the shape tracking scheme of the present invention may perform the loading of training data and the actual training computation only after it is confirmed that there is no shape error.
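A minimal sketch of this gating logic is shown below; the module interfaces and method names are illustrative assumptions, not the platform's actual API:

```python
def run_training_pipeline(computation_graph_ir, tracker, trainer, data_loader):
    """Only load training data and start training after the shape check passes."""
    report = tracker.check(computation_graph_ir)      # build shape flow graphs, solve constraints
    if report.has_error_or_limitation():
        for finding in report.findings:
            print(f"shape issue at {finding.operator}: {finding.message}")
        return None                                   # do not load data or train
    train_data, train_labels = data_loader.load()     # initial tensors and labels
    return trainer.train(computation_graph_ir, train_data, train_labels)
```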
Thus, the neural network computing platform of the present invention traverses the paths of the computation graph, builds a shape flow graph (an abstract data flow computation graph) for each path, solves the shape-related constraints of each shape flow graph (introduced by shape operators) using a constraint solver, and reports an error when the solver cannot find a feasible solution. Unlike prior approaches based on static analysis, the constraint-based shape error determination scheme of the present invention can detect subtle shape-related errors even when the rank (number of dimensions) or dimension (size of a dimension) of a shape is completely unknown.
It should be understood that the computing platform shown in FIG. 9 may be a distributed computing platform composed of large servers and including powerful CPUs or even dedicated GPUs, and in some embodiments it may be implemented by a sufficiently capable local device. Further, since the constraint-based shape error determination scheme of the present invention does not need to run while neural network computations are actually performed, it can be implemented as part of the pre-check function of the computing platform.
FIG. 10 is a schematic diagram of a computing device that can be used to implement the shape tracking method described above according to an embodiment of the invention.
Referring to fig. 10, the computing device 1000 includes a memory 1010 and a processor 1020.
The processor 1020 may be a multi-core processor or may include multiple processors. In some embodiments, processor 1020 may include a general-purpose host processor and one or more special purpose coprocessors such as a Graphics Processor (GPU), Digital Signal Processor (DSP), or the like. In some embodiments, the processor 1020 may be implemented using custom circuitry, such as an Application Specific Integrated Circuit (ASIC) or a Field Programmable Gate Array (FPGA).
The memory 1010 may include various types of storage units, such as system memory, read-only memory (ROM), and permanent storage. The ROM may store static data or instructions needed by the processor 1020 or other modules of the computer. The persistent storage device may be a read-write storage device. The persistent storage may be a non-volatile storage device that does not lose stored instructions and data even after the computer is powered off. In some embodiments, a mass storage device (e.g., a magnetic or optical disk, or flash memory) is employed as the persistent storage. In other embodiments, the permanent storage may be a removable storage device (e.g., a floppy disk or an optical drive). The system memory may be a read-write memory device or a volatile read-write memory device, such as dynamic random access memory. The system memory may store instructions and data that some or all of the processors require at runtime. Further, the memory 1010 may include any combination of computer-readable storage media, including various types of semiconductor memory chips (DRAM, SRAM, SDRAM, flash memory, programmable read-only memory), magnetic disks, and/or optical disks. In some embodiments, memory 1010 may include a readable and/or writable removable storage device, such as a compact disc (CD), a read-only digital versatile disc (e.g., DVD-ROM, dual-layer DVD-ROM), a read-only Blu-ray disc, an ultra-density disc, a flash memory card (e.g., SD card, mini-SD card, Micro-SD card, etc.), a magnetic floppy disk, or the like. Computer-readable storage media do not contain carrier waves or transitory electronic signals transmitted by wireless or wired means.
The memory 1010 has stored thereon executable code that, when processed by the processor 1020, may cause the processor 1020 to perform the shape tracking methods described above.
The neural network computing platform and method according to the present invention have been described in detail above with reference to the accompanying drawings. The shape tracker of the present invention traverses program paths and constructs a shape flow graph (an abstract data flow computation graph) for each path, solving shape-related constraints (introduced by shape operators) for each shape flow graph using a constraint solver. If the constraint solver cannot find a feasible solution, an error is reported, and if the user input is constrained, a suggestion is provided as a warning.
Furthermore, the method according to the invention may also be implemented as a computer program or computer program product comprising computer program code instructions for carrying out the above-mentioned steps defined in the above-mentioned method of the invention.
Alternatively, the invention may also be embodied as a non-transitory machine-readable storage medium (or computer-readable storage medium, or machine-readable storage medium) having stored thereon executable code (or a computer program, or computer instruction code) which, when executed by a processor of an electronic device (or computing device, server, etc.), causes the processor to perform the steps of the above-described method according to the invention.
Those of skill would further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the disclosure herein may be implemented as electronic hardware, computer software, or combinations of both.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems and methods according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Having described embodiments of the present invention, the foregoing description is intended to be exemplary, not exhaustive, and not limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein is chosen in order to best explain the principles of the embodiments, the practical application, or improvements made to the technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (10)

1. A neural network tensor shape tracking method, comprising:
constructing a shape flow graph based on a computational graph Intermediate Representation (IR) of a neural network, the shape flow graph including nodes performing neural network computational operations and edges representing dependencies between the nodes, the edges being labeled with shape information of tensors to flow along the edges;
extracting constraints for shape information of a tensor from node operations and newly introduced operands of the shape flow graph;
determining whether there is an error or limitation in the shape information of the tensor based on the constraint condition; and
reporting is performed when it is determined that there is an error or limitation in the shape information of the tensor.
2. The method of claim 1, wherein unknown information exists in the labeled shape information of a corresponding tensor, and
extracting constraints for tensor shape information from node operations and newly introduced operands of the shape flow graph comprises:
listing a plurality of constraints for the unknown information based on node operations and operand relationships of the shape flow graph to the unknown information,
solving for the unknown information based on the plurality of constraints: if the unknown information has no solution, determining that a shape error exists; and if only a particular solution exists for the unknown information, determining that a shape limitation exists.
3. The method of claim 1, wherein said constructing a shape flow graph comprises:
slicing backward from the output tensor of the computational graph intermediate representation to construct the shape flow graph, and
collecting the operands required for node operations during the backward slicing, the operands comprising constants and scalar variables.
4. The method of claim 3, wherein said constructing a shape flow graph comprises:
adding, during the backward slicing, the return value of a called function to the shape flow graph as a new node, and continuing the backward slicing from the newly added node.
5. The method of claim 3, wherein said constructing a shape flow graph comprises:
upon encountering a junction node having n incoming values, copying all previous slices n times; and
selecting a different one of the n incoming values for each copied prior slice to continue slicing, so as to construct a shape flow graph for each path in the computation graph.
6. The method of claim 1, wherein the reporting when it is determined that there is an error or limitation in the shape information of the tensor comprises:
deleting node operations, and the constraint conditions introduced by them, one by one; and
locating the node operation that caused the error or limitation when the remaining constraints become satisfiable.
7. The tensor shape tracking method of claim 1, wherein the determining is performed without actual feature values flowing through the computation graph.
8. A neural network computing platform, comprising:
a tensor shape tracking module comprising:
a building sub-module for building a shape flow graph based on a computation graph Intermediate Representation (IR) of a neural network, the shape flow graph including nodes performing neural network computation operations and edges representing dependencies between the nodes, the edges being labeled with shape information of tensors to flow along the edges;
a constraint judging submodule for extracting a constraint condition for tensor shape information from a node operation and a newly introduced operand of the shape flow graph and judging whether there is an error or a limitation in the shape information of the tensor based on the constraint condition; and
a reporting sub-module for reporting when it is determined that there is an error or a limitation in the shape information of the tensor; and
a neural network training module for performing a neural network training calculation based on the computation graph intermediate representation from which the error or limitation has been eliminated according to the operation of the tensor shape tracking module.
9. The computing platform of claim 8, further comprising:
a training data acquisition module configured to acquire training data and training labels, wherein the training data serves as the initial tensors flowing in the computation graph, and the training labels are used for backward parameter adjustment based on the classification results output by the computation graph, wherein,
responsive to the tensor shape tracking module determining that there are no errors or limitations, the neural network training module loads the training data and the training labels to perform a neural network training calculation.
10. A non-transitory machine-readable storage medium having stored thereon executable code, which when executed by a processor of an electronic device, causes the processor to perform the method of any one of claims 1-7.
CN202111353668.0A 2021-11-16 2021-11-16 Neural network tensor shape tracking method and computing platform Pending CN114492772A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111353668.0A CN114492772A (en) 2021-11-16 2021-11-16 Neural network tensor shape tracking method and computing platform

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111353668.0A CN114492772A (en) 2021-11-16 2021-11-16 Neural network tensor shape tracking method and computing platform

Publications (1)

Publication Number Publication Date
CN114492772A true CN114492772A (en) 2022-05-13

Family

ID=81492849

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111353668.0A Pending CN114492772A (en) 2021-11-16 2021-11-16 Neural network tensor shape tracking method and computing platform

Country Status (1)

Country Link
CN (1) CN114492772A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110852331A (en) * 2019-10-25 2020-02-28 中电科大数据研究院有限公司 Image description generation method combined with BERT model
CN110852331B (en) * 2019-10-25 2023-09-08 中电科大数据研究院有限公司 Image description generation method combined with BERT model
CN115268877A (en) * 2022-09-27 2022-11-01 之江实验室 Intermediate representation method and device for parallel execution of graph computation
US11782723B1 (en) 2022-09-27 2023-10-10 Zhejiang Lab Intermediate representation method and apparatus for parallel execution of graph computation
WO2024065866A1 (en) * 2022-09-27 2024-04-04 之江实验室 Intermediate representation method and apparatus for computational graph compilation

Similar Documents

Publication Publication Date Title
Subramanian Deep Learning with PyTorch: A practical approach to building neural network models using PyTorch
US11544630B2 (en) Automatic feature subset selection using feature ranking and scalable automatic search
Zafar et al. Hands-on convolutional neural networks with TensorFlow: Solve computer vision problems with modeling in TensorFlow and Python
CN114492772A (en) Neural network tensor shape tracking method and computing platform
US20180260709A1 (en) Calculating device and method for a sparsely connected artificial neural network
US11500959B2 (en) Multiple output fusion for operations performed in a multi-dimensional array of processing units
WO2018129327A1 (en) Loop and library fusion
US10901715B1 (en) Lazy compilation and kernel fusion in dynamic computation graphs
Khan et al. MIOpen: An open source library for deep learning primitives
El-Amir et al. Deep learning pipeline: building a deep learning model with TensorFlow
CN115543639A (en) Optimization method for distributed execution of deep learning task and distributed system
WO2020092020A1 (en) Learning property graph representations edge-by-edge
Bhagwat et al. Applied deep learning with keras: Solve complex real-life problems with the simplicity of keras
US20200257982A1 (en) Categorical feature encoding for property graphs by vertex proximity
US20210295158A1 (en) End-to-end optimization
Raza et al. A parallel rough set based dependency calculation method for efficient feature selection
El-Amir et al. Deep learning pipeline
US20210294784A1 (en) Method and apparatus with softmax approximation
US11481604B2 (en) Apparatus and method for neural network processing
CN114746868A (en) Method and apparatus for compiling neural network model
Holdroyd TensorFlow 2.0 Quick Start Guide: Get up to speed with the newly introduced features of TensorFlow 2.0
El-Amir et al. A tour through the deep learning pipeline
Ferguson et al. A standardized PMML format for representing convolutional neural networks with application to defect detection
Suri et al. Project# 2 cnns and pneumonia detection from chest x-rays
Gad et al. TensorFlow recognition application

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination