CN113269303A - Data processing method and data processing device for deep learning inference framework

Info

Publication number
CN113269303A
Authority
CN
China
Prior art keywords
data
inference
operator
arrangement mode
output
Prior art date
Legal status
Pending
Application number
CN202110539151.4A
Other languages
Chinese (zh)
Inventor
鹿馨
王哲
孙增增
田永震
Current Assignee
Samsung China Semiconductor Co Ltd
Samsung Electronics Co Ltd
Original Assignee
Samsung China Semiconductor Co Ltd
Samsung Electronics Co Ltd
Priority date
Filing date
Publication date
Application filed by Samsung China Semiconductor Co Ltd, Samsung Electronics Co Ltd filed Critical Samsung China Semiconductor Co Ltd
Priority to CN202110539151.4A
Publication of CN113269303A
Priority to KR1020220042253A (published as KR20220156435A)
Priority to US17/744,150 (published as US20220374677A1)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/04 Inference or reasoning models

Abstract

Disclosed are a data processing method and a data processing apparatus for a deep learning inference framework. The data processing method includes the following steps: in response to the inference framework not supporting the data arrangement mode of an inference model, determining a data arrangement mode conversion strategy for the input data and the output data of an inference operator according to the dimension of the input data received by the inference operator, the dimension of the output data correspondingly output, and the correlation between the inference operator and the data arrangement mode; and converting the data arrangement mode of the input data of the inference operator and/or converting the data arrangement mode of the output data of the inference operator according to the determined conversion strategy. With the technical solution of the present disclosure, the inference performance of a deep learning inference framework on deep learning models with different Layouts can be improved.

Description

Data processing method and data processing device for deep learning inference framework
Technical Field
The present disclosure relates to the field of deep learning technologies, and in particular, to a data processing method and a data processing apparatus for a deep learning inference framework.
Background
With the wide application of deep learning technology, neural network models with better performance, neural network model training frameworks, and inference frameworks suitable for different scenarios continue to emerge. Referring to fig. 1, the deployment of a neural network model can be divided into two phases: in the first stage, a neural network model is trained based on a training framework using the powerful computing power of a server; in the second stage, at a mobile terminal or a server terminal, an inference process is performed on the neural network model (original model) trained in the first stage based on an inference framework, so as to achieve the corresponding task objective.
The neural network model may include a plurality of operators. For example, referring to fig. 2, the neural network model includes an input layer (Input), an output layer (Output), a convolutional layer (Conv), a concatenation layer (Concat), a depthwise separable convolutional layer (DepthwiseConv), a reconstruction layer (Reshape), and a long short-term memory layer (LSTM). Operators can be divided into two categories: 1) operators whose implementation is related to the data arrangement (Layout), which can be divided into the two data arrangements NHWC and NCHW, where N stands for number, C for channel, H for height, and W for width; and 2) operators whose implementation is independent of the Layout of the data, i.e., Layout-irrelevant operators.
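For illustration only (this sketch is not part of the original filing, and the shapes are hypothetical), the following minimal Python/NumPy example shows how the same 4-dimensional tensor is rearranged between the NHWC and NCHW Layouts:

```python
# Illustrative sketch: converting a 4-D tensor between the NHWC and
# NCHW data arrangements with NumPy (hypothetical shapes).
import numpy as np

x_nhwc = np.zeros((1, 224, 224, 3))          # N=1, H=224, W=224, C=3

# NHWC -> NCHW: move the channel axis from position 3 to position 1.
x_nchw = np.transpose(x_nhwc, (0, 3, 1, 2))
assert x_nchw.shape == (1, 3, 224, 224)

# NCHW -> NHWC: the inverse permutation.
x_back = np.transpose(x_nchw, (0, 2, 3, 1))
assert x_back.shape == (1, 224, 224, 3)
```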
At present, mainstream training frameworks for neural network models support different Layouts due to different software and hardware optimization schemes, for example, referring to fig. 1, the NCHW arrangement represented by Caffe and PyTorch and the NHWC arrangement represented by TensorFlow. As a result, the operators of a trained neural network model have different Layout properties.
In the process of executing inference based on the original model, the inference framework needs to create corresponding inference operators for the different operators in the original model. At present, limited by the performance of hardware devices and the cost of software optimization, the underlying implementation of an inference operator of an inference framework generally supports only one Layout. Therefore, when the Layout of an operator of the neural network model is different from the Layout of the corresponding inference operator of the inference framework, an additional data conversion operation needs to be added.
In the related art, the data conversion schemes adopted in the inference stage mainly include two kinds: i) performing data conversion in each operator related to Layout; and ii) traversing the topological structure of the original model, segmenting the original model into sub-blocks (Blocks) with different Layouts, and then inserting Layout conversion operators between the Blocks. However, the performance loss in the inference execution phase remains relatively large under both data conversion schemes.
Therefore, how to provide a scheme capable of reducing the performance loss caused by data conversion is a problem to be solved urgently.
Disclosure of Invention
The present disclosure provides a data processing method for a deep learning inference framework and a corresponding data processing apparatus to solve at least the problems in the related art described above, and may not solve any of the problems described above.
According to an aspect of exemplary embodiments of the present disclosure, there is provided a data processing method for a deep learning inference framework, the data processing method including: in response to the inference framework not supporting the data arrangement mode of an inference model, determining a data arrangement mode conversion strategy for the input data and the output data of an inference operator according to the dimension of the input data received by the inference operator, the dimension of the output data correspondingly output, and the correlation between the inference operator and the data arrangement mode; and converting the data arrangement mode of the input data of the inference operator and/or converting the data arrangement mode of the output data of the inference operator according to the determined conversion strategy. The present application thus provides a novel method that determines the data arrangement mode conversion strategy of an inference operator based on the dimensions of the input and output data of the inference operator.
Optionally, the method further includes: performing preprocessing on the input data according to the dimension of the input data before the input data are input to the first-layer inference operator of the inference framework, wherein the preprocessing includes: in response to the dimension of the input data being a preset dimension, converting the data arrangement mode of the input data into the data arrangement mode supported by the inference framework, wherein the preset dimension is determined according to the data arrangement mode supported by the inference framework and the data arrangement mode of the inference model.
Optionally, the method further includes: performing post-processing on the data output from the last-layer inference operator of the inference framework according to the dimension of that data, wherein the post-processing includes: in response to the dimension of the data output from the last-layer inference operator of the inference framework being the preset dimension, converting the data arrangement mode of that data into the data arrangement mode supported by the inference model.
Optionally, the step of determining the data arrangement mode conversion strategy of the input data and the output data of the inference operator includes: if the parameters of the inference operator are related to the data arrangement mode, the implementation of the inference operator is not related to the data arrangement mode, and the dimension of the input data received by the inference operator and the dimension of the output data correspondingly output include only the following four cases: receiving input data of the preset dimension and correspondingly outputting output data of the preset dimension; receiving input data of a non-preset dimension and correspondingly outputting output data of a non-preset dimension; receiving input data of the preset dimension and correspondingly outputting output data of a non-preset dimension; and receiving input data of a non-preset dimension and correspondingly outputting output data of the preset dimension, determining the conversion strategy of the inference operator as: for the case of receiving input data of the preset dimension and correspondingly outputting output data of a non-preset dimension, converting the data arrangement mode of the input data input to the inference operator into the data arrangement mode of the inference model; and for the case of receiving input data of a non-preset dimension and correspondingly outputting output data of the preset dimension, converting the data arrangement mode of the output data of the inference operator into the data arrangement mode supported by the inference framework. With this conversion strategy, determined based on the dimensions of the input and output data of the inference operator, data conversion operations need to be performed only in operators in which the data Rank changes, and the number of operators in which the data Rank changes is far smaller than the number of Layout-related operators in a typical neural network model. On this basis, data conversion operations within the model can be significantly reduced in the inference stage, thereby improving the inference performance of the deep learning inference framework on models with different Layouts.
Optionally, the step of determining the data arrangement mode conversion strategy of the input data and the output data of the inference operator further includes: if the parameters of the inference operator are related to the data arrangement mode, the implementation of the inference operator is not related to the data arrangement mode, and the dimension of the input data received by the inference operator and the dimension of the output data correspondingly output include only the following two cases: receiving input data of the preset dimension and correspondingly outputting output data of the preset dimension, and receiving input data of a non-preset dimension and correspondingly outputting output data of a non-preset dimension, determining the conversion strategy of the inference operator as: not converting the data arrangement mode of the input data and the output data of the inference operator, and adjusting the parameters of the inference operator for the case of receiving input data of the preset dimension and correspondingly outputting output data of the preset dimension.
Optionally, the step of determining a data arrangement conversion strategy of the input data and the output data of the inference operator includes: when the inference operator is executed, determining a data arrangement mode conversion strategy of input data and output data of the inference operator; or determining a data arrangement conversion strategy of input data and output data of the inference operator before executing the inference operator.
Optionally, the preset dimension is 4, wherein the data arrangement mode of the inference model is NHWC and the data arrangement mode supported by the inference framework is NCHW, or the data arrangement mode of the inference model is NCHW and the data arrangement mode supported by the inference framework is NHWC.
According to another aspect of exemplary embodiments of the present disclosure, there is provided a data processing apparatus for a deep learning inference framework, the data processing apparatus including: a conversion strategy determination unit configured to: in response to the inference framework not supporting the data arrangement mode of an inference model, determine a data arrangement mode conversion strategy for the input data and the output data of an inference operator according to the dimension of the input data received by the inference operator, the dimension of the output data correspondingly output, and the correlation between the inference operator and the data arrangement mode; and an execution unit configured to: convert the data arrangement mode of the input data of the inference operator and/or convert the data arrangement mode of the output data of the inference operator according to the determined conversion strategy. The present application thus provides a novel apparatus that determines the data arrangement mode conversion strategy of an inference operator based on the dimensions of the input and output data of the inference operator.
Optionally, the data processing apparatus further includes a preprocessing unit configured to: before the input data are input to the first-layer inference operator of the inference framework, in response to the dimension of the input data being a preset dimension, convert the data arrangement mode of the input data into the data arrangement mode supported by the inference framework, wherein the preset dimension is determined according to the data arrangement mode supported by the inference framework and the data arrangement mode of the inference model.
Optionally, the data processing apparatus further includes a post-processing unit configured to: in response to the dimension of the data output from the last-layer inference operator of the inference framework being the preset dimension, convert the data arrangement mode of the data output from the last-layer inference operator of the inference framework into the data arrangement mode supported by the inference model.
Optionally, the conversion strategy determination unit is configured to: if the parameters of the inference operator are related to the data arrangement mode, the implementation of the inference operator is not related to the data arrangement mode, and the dimension of the input data received by the inference operator and the dimension of the output data correspondingly output include only the following four cases: receiving input data of the preset dimension and correspondingly outputting output data of the preset dimension; receiving input data of a non-preset dimension and correspondingly outputting output data of a non-preset dimension; receiving input data of the preset dimension and correspondingly outputting output data of a non-preset dimension; and receiving input data of a non-preset dimension and correspondingly outputting output data of the preset dimension, determine the conversion strategy of the inference operator as: for the case of receiving input data of the preset dimension and correspondingly outputting output data of a non-preset dimension, converting the data arrangement mode of the input data input to the inference operator into the data arrangement mode of the inference model; and for the case of receiving input data of a non-preset dimension and correspondingly outputting output data of the preset dimension, converting the data arrangement mode of the output data of the inference operator into the data arrangement mode supported by the inference framework.
Optionally, the conversion strategy determination unit is further configured to: if the parameters of the inference operator are related to the data arrangement mode, the implementation of the inference operator is not related to the data arrangement mode, and the dimension of the input data received by the inference operator and the dimension of the output data correspondingly output include only the following two cases: receiving input data of the preset dimension and correspondingly outputting output data of the preset dimension, and receiving input data of a non-preset dimension and correspondingly outputting output data of a non-preset dimension, determine the conversion strategy of the inference operator as: not converting the data arrangement mode of the input data and the output data of the inference operator, and adjusting the parameters of the inference operator for the case of receiving input data of the preset dimension and correspondingly outputting output data of the preset dimension. With this conversion strategy, determined based on the dimensions of the input and output data of the inference operator, data conversion operations need to be performed only in operators in which the data Rank changes, and the number of such operators is far smaller than the number of Layout-related operators in a typical neural network model. On this basis, data conversion operations within the model can be significantly reduced in the inference stage, thereby improving the inference performance of the deep learning inference framework on models with different Layouts.
Optionally, the conversion strategy determination unit is configured to: determine the data arrangement mode conversion strategy of the input data and the output data of the inference operator when the inference operator is executed; or determine the data arrangement mode conversion strategy of the input data and the output data of the inference operator before the inference operator is executed.
Optionally, the preset dimension is 4, wherein the data arrangement mode of the inference model is NHWC and the data arrangement mode supported by the inference framework is NCHW, or the data arrangement mode of the inference model is NCHW and the data arrangement mode supported by the inference framework is NHWC.
According to still another aspect of exemplary embodiments of the present disclosure, there is provided a computer-readable storage medium storing a computer program, characterized in that when the computer program is executed by a processor, the data processing method as described above is implemented.
According to still another aspect of exemplary embodiments of the present disclosure, there is provided an electronic device including: at least one processor; at least one memory storing computer-executable instructions, wherein the computer-executable instructions, when executed by the at least one processor, cause the at least one processor to perform the data processing method as described above.
Drawings
The above and other objects and features of exemplary embodiments of the present disclosure will become more apparent from the following description taken in conjunction with the accompanying drawings which illustrate exemplary embodiments, wherein:
fig. 1 is a flowchart showing an example of deep learning task deployment in the related art;
fig. 2 is a schematic diagram showing an example of a neural network model in the related art;
FIG. 3 is a schematic diagram showing a first data conversion scheme employed during the inference phase;
FIG. 4 is a schematic diagram showing a second data conversion scheme employed during the inference phase;
FIG. 5 is a flow diagram illustrating a data processing method for a deep learning inference framework according to an embodiment of the present disclosure;
FIG. 6 is a block diagram illustrating a data processing apparatus for a deep learning inference framework in accordance with an embodiment of the present disclosure;
FIG. 7 is a diagram illustrating an example of a structure for a deep learning inference framework, according to an embodiment of the present disclosure;
fig. 8 is a flowchart illustrating a method of performing data arrangement conversion according to an embodiment of the present disclosure;
fig. 9 is a schematic diagram illustrating an example of a data processing method for a deep learning inference framework according to an embodiment of the present disclosure.
Detailed Description
The following detailed description is provided to assist the reader in obtaining a thorough understanding of the methods, devices, and/or systems described herein. However, various changes, modifications, and equivalents of the methods, apparatus, and/or systems described herein will be apparent to those skilled in the art after reviewing the disclosure of the present application. For example, the order of operations described herein is merely an example, and is not limited to those set forth herein, but may be changed as will become apparent after understanding the disclosure of the present application, except to the extent that operations must occur in a particular order. Moreover, descriptions of features known in the art may be omitted for clarity and conciseness.
The features described herein may be embodied in different forms and should not be construed as limited to the examples described herein. Rather, the examples described herein have been provided to illustrate only some of the many possible ways to implement the methods, devices, and/or systems described herein, which will be apparent after understanding the disclosure of the present application.
The terminology used herein is for the purpose of describing various examples only and is not intended to be limiting of the disclosure. The singular is also intended to include the plural unless the context clearly indicates otherwise. The terms "comprises," "comprising," and "having" specify the presence of stated features, quantities, operations, elements, components, and/or combinations thereof, but do not preclude the presence or addition of one or more other features, quantities, operations, components, elements, and/or combinations thereof.
Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs after understanding the present disclosure. Unless explicitly defined as such herein, terms (such as those defined in general dictionaries) should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and the present disclosure, and should not be interpreted in an idealized or overly formal sense.
Further, in the description of the examples, when it is considered that detailed description of well-known related structures or functions will cause a vague explanation of the present disclosure, such detailed description will be omitted.
In order to solve the technical problems mentioned in the background section, the inventors of the present disclosure have conducted repeated studies on the related art and found the causes of the technical problems as follows:
Scheme i) of the data conversion operation employed in the related art:
Referring to fig. 3, the Layout supported by the inference operators of the inference framework is NCHW, while the Layout supported by the original model (neural network model) is NHWC. Data conversion needs to be performed in each operator related to Layout.
When the inference framework performs inference on an original model, operators need to be created based on the structural information of the original model in order to perform the inference computation. In the inference execution stage, the Layout of the data transferred between the created inference operators needs to be consistent with the Layout supported by the original model (i.e., the neural network model); therefore, if the Layout supported by an inference operator's internal computation is inconsistent with the Layout supported by the original model, the data need to be rearranged for the internal computation of the inference operator.
Referring to fig. 3, where the Layout-related operators include the convolutional layer (Conv) and the depthwise separable convolutional layer (DepthwiseConv), each such inference operator needs to perform three steps during execution: 1) converting the Layout of the input data from NHWC to NCHW; 2) performing the calculation; and 3) converting the Layout of the output data from NCHW to NHWC for passing to the next-layer operator.
Since scheme i) performs data conversion for each operator related to Layout, based on the above analysis, when the original model contains N operators related to Layout, 2×N data conversions need to be performed in the inference stage. Operators of this type usually account for a high proportion of a neural network model, so the number of data conversions increases linearly with the depth of the original model, which results in a large performance loss for the execution of the original model in the inference stage.
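The cost described above can be sketched as follows. This is a minimal illustration assuming an NCHW framework and an NHWC model, with hypothetical function names; it is not an implementation from the patent:

```python
# Hypothetical sketch of scheme i): every Layout-related inference
# operator performs two conversions around its NCHW kernel, so a model
# with N such operators pays 2*N transposes per inference.
import numpy as np

def nhwc_to_nchw(t):
    return np.transpose(t, (0, 3, 1, 2))

def nchw_to_nhwc(t):
    return np.transpose(t, (0, 2, 3, 1))

def run_layout_related_op(nchw_kernel, x_nhwc):
    x_nchw = nhwc_to_nchw(x_nhwc)    # step 1: convert input NHWC -> NCHW
    y_nchw = nchw_kernel(x_nchw)     # step 2: compute with the NCHW kernel
    return nchw_to_nhwc(y_nchw)      # step 3: convert output NCHW -> NHWC
```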
Scheme ii) of the data conversion operation employed in the related art:
Referring to fig. 4, the Layout supported by the inference operators of the inference framework is NCHW, while the Layout supported by the original model (neural network model) is NHWC, so data conversion also needs to be performed for the operators related to Layout in the inference phase.
However, scheme ii) differs from scheme i) in that: before the inference framework performs inference on the original model, the topological structure of the original model is traversed, the original model is segmented into sub-blocks (Blocks) with different Layouts, and Layout conversion operators are then inserted between the Blocks. Specifically, referring to fig. 4, the original model is divided into Block0 and Block1, and a conversion operator is inserted between Block0 and Block1 to convert the data Layout from NCHW to NHWC. Between the input (Input) and Block0, a conversion operator is inserted to convert the Layout of the data from NHWC to NCHW.
Scheme ii) can obviously reduce the number of data conversions, thereby reducing part of the performance loss of the model inference process. However, analysis of scheme ii) shows that, before the inference framework performs inference on the original model, the topological structure of the original model must be traversed and the Layout attribute of each operator and the related operators must be judged, which requires complex software implementation logic and affects the maintainability and flexibility of the software. In addition, as the topology of neural network models becomes increasingly complex, the time complexity and the space complexity of graph traversal increase, which brings additional performance loss and power consumption burden to the inference execution process.
For the above reasons, the inventors considered that, in the process of inference performed by the inference framework on the original model, the number of data conversions should be reduced while avoiding the introduction of graph traversal, so that the performance loss in the inference phase can be reduced to some extent. Based on this idea, the inventors found through repeated research that:
Operators related to Layout in a neural network model have the characteristic that the dimension (Rank) of their input and output data is 4. Therefore, the Rank information can be taken as the main basis for judging where data arrangement mode conversion is needed. Specifically, data with Rank 4 can be given the Layout attribute supported by the inference operators of the inference framework (for example, NCHW or NHWC), while data whose Rank is not 4 keep the same Layout attribute as the original model. Consequently, data conversion operations need to be performed only in operators in which the data Rank changes, and the number of such operators is far smaller than the number of operators related to Layout. On this basis, data conversion operations within the model can be significantly reduced in the inference stage, thereby improving the inference performance of the deep learning inference framework on models with different Layouts. In addition, no graph traversal or graph segmentation operations are needed, and the judgment logic for data conversion is significantly simplified, thereby saving software development and maintenance costs.
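The rule found by the inventors can be stated compactly. The following sketch is an illustration under the assumption that the framework computes in NCHW and the model is NHWC; it is not code from the patent:

```python
# Sketch of the Rank-based invariant: Rank-4 data travel in the
# framework Layout, all other data keep the model Layout, so a
# conversion is needed only where an operator changes whether the
# data Rank equals 4. Assumes framework Layout NCHW, model Layout NHWC.
def layout_of(rank: int) -> str:
    return "NCHW" if rank == 4 else "NHWC"

def conversion_needed(in_rank: int, out_rank: int) -> bool:
    return layout_of(in_rank) != layout_of(out_rank)
```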
In view of the above, according to an aspect of the exemplary embodiments of the present disclosure, a data processing method for a deep learning inference framework is provided. Fig. 5 is a flow diagram illustrating a data processing method for a deep learning inference framework according to an embodiment of the present disclosure. Referring to fig. 5, the data processing method includes steps S501 to S502.
In step S501, in response to that the inference framework does not support the data arrangement of the inference model, a data arrangement conversion strategy of the input data and the output data of the inference operator is determined according to the dimension of the input data received by the inference operator, the dimension of the output data correspondingly output, and the correlation between the inference operator and the data arrangement.
In step S502, the data arrangement manner of the input data of the inference operator is converted according to the determined conversion policy, and/or the data arrangement manner of the output data of the inference operator is converted.
As an example, before the input data are input to the first-layer inference operator of the inference framework, if the inference framework does not support the data arrangement mode of the inference model, preprocessing is performed on the input data according to the dimension of the input data, wherein the preprocessing includes: in response to the dimension of the input data being the preset dimension, converting the data arrangement mode of the input data into the data arrangement mode supported by the inference framework.
As an example, the preset dimension is determined according to a data arrangement mode supported by the inference framework and a data arrangement mode of the inference model.
Taking the case that the inference framework supports NCHW and the inference model supports NHWC as an example: since the Rank of both NHWC and NCHW is 4, the dimension of data in the NHWC and NCHW formats is 4, and the preset dimension is thus determined to be 4. The preprocessing step then includes: before the data are input to the first-layer inference operator, in response to the dimension of the input data being 4, converting the data arrangement mode NHWC of the input data into the data arrangement mode NCHW supported by the inference framework. If the dimension of the input data is not 4, no conversion of the arrangement mode of the input data is performed in the preprocessing step.
As an example, the data output from the last-layer inference operator of the inference framework may be post-processed according to the dimension of that data, wherein the post-processing includes: in response to the dimension of the data output from the last-layer inference operator of the inference framework being the preset dimension, converting the data arrangement mode of that data into the data arrangement mode supported by the inference model.
Specifically, taking the case that the inference framework supports NCHW and the inference model supports NHWC as an example, the post-processing step includes: in response to the dimension of the data output from the last-layer inference operator of the inference framework being 4, converting the data arrangement mode NCHW of that data into the data arrangement mode NHWC supported by the inference model. If the dimension of the data output from the last-layer inference operator of the inference framework is not 4, the data arrangement mode of the output data does not need to be converted.
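A minimal sketch of these pre- and post-processing steps follows, assuming (as above) framework Layout NCHW, model Layout NHWC, and preset dimension 4; the function names are illustrative:

```python
import numpy as np

def preprocess(x):
    # Before the first-layer inference operator: only 4-D input is
    # converted into the framework-supported arrangement (NHWC -> NCHW).
    return np.transpose(x, (0, 3, 1, 2)) if x.ndim == 4 else x

def postprocess(y):
    # After the last-layer inference operator: only 4-D output is
    # converted back into the model-supported arrangement (NCHW -> NHWC).
    return np.transpose(y, (0, 2, 3, 1)) if y.ndim == 4 else y
```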
As an example, the step of determining the data arrangement mode conversion strategy of the input data and the output data of the inference operator includes: if the parameters of the inference operator are related to the data arrangement mode, the implementation of the inference operator is not related to the data arrangement mode, and the dimension of the input data received by the inference operator and the dimension of the output data correspondingly output include only the following four cases:
1. receiving input data with a preset dimension and correspondingly outputting output data with the preset dimension;
2. receiving input data with a non-preset dimension and correspondingly outputting output data with the non-preset dimension;
3. receiving input data with a preset dimension and correspondingly outputting output data with a non-preset dimension;
4. receiving input data with a non-preset dimension and correspondingly outputting output data with a preset dimension;
the conversion strategy of the inference operator is determined as:
for cases 1 and 2, the data arrangement of the input data and the output data is not converted.
For case 3, converting the data arrangement mode of the input data input to the inference operator into the data arrangement mode of the inference model;
for case 4, the data arrangement of the output data of the inference operator is converted into a data arrangement supported by the inference framework.
The step of determining the data arrangement mode conversion strategy of the input data and the output data of the inference operator further includes: if the parameters of the inference operator are related to the data arrangement mode, the implementation of the inference operator is not related to the data arrangement mode, and the dimension of the input data received by the inference operator and the dimension of the output data correspondingly output include only the following two cases:
Case 1: receiving input data of the preset dimension and correspondingly outputting output data of the preset dimension;
Case 2: receiving input data of a non-preset dimension and correspondingly outputting output data of a non-preset dimension;
the conversion strategy of the inference operator is determined as:
in case 1, there is no need to convert the arrangement of the input data and the output data, and there is no need to adjust the parameters of the inference operator.
In case 2, the data arrangement of the input data and the output data of the inference operator is not converted, and the parameters of the inference operator are adjusted.
Specifically, taking the case that the inference framework supports NCHW and the inference model supports NHWC as an example:

If the implementation of the inference operator (i.e., the software implementation of the inference operator), the parameters of the inference operator, and the data of the inference model corresponding to the inference operator are all related to Layout, and the inference operator can only receive 4-dimensional input data and correspondingly output 4-dimensional output data, then the conversion strategy of such inference operators (hereinafter referred to as class A operators) is determined as: the data arrangement mode of the input data and the output data is not converted.

If the parameters of the inference operator are related to the data arrangement mode, the implementation of the inference operator is not related to the data arrangement mode, and the input data received by the inference operator and the output data correspondingly output include only the following two cases: receiving 4-dimensional input data and correspondingly outputting 4-dimensional output data (hereinafter referred to as case B1), and receiving non-4-dimensional input data and correspondingly outputting non-4-dimensional output data (hereinafter referred to as case B2), then the conversion strategy of such inference operators (hereinafter referred to as class B operators) is determined as: the data arrangement mode of the input data and the output data is not converted, but for case B1 the parameters of the inference operator need to be adjusted.

If the implementation of the inference operator, the parameters of the inference operator, and the data of the inference model corresponding to the inference operator are all unrelated to Layout, and the inference operator can only receive non-4-dimensional input data and correspondingly output non-4-dimensional output data, then the conversion strategy of such inference operators (hereinafter referred to as class C operators) is determined as: the data arrangement mode of the input data and the output data is not converted, and the parameters of the inference operator are not adjusted.

If the parameters of the inference operator are related to the data arrangement mode, the implementation of the inference operator is not related to the data arrangement mode, and the input data received by the inference operator and the output data correspondingly output include only the following four cases: receiving 4-dimensional input data and correspondingly outputting 4-dimensional output data (hereinafter referred to as case D1), receiving non-4-dimensional input data and correspondingly outputting non-4-dimensional output data (hereinafter referred to as case D2), receiving 4-dimensional input data and correspondingly outputting non-4-dimensional output data (hereinafter referred to as case D3), and receiving non-4-dimensional input data and correspondingly outputting 4-dimensional output data (hereinafter referred to as case D4), then the conversion strategy of such inference operators (hereinafter referred to as class D operators) is determined as: for cases D1 and D2, the data arrangement mode of the input data and the output data is not converted, and the parameters of the inference operator are not adjusted; for case D3, the data arrangement mode NCHW of the input data input to the inference operator is converted into the data arrangement mode NHWC of the inference model; and for case D4, the data arrangement mode NHWC of the output data of the inference operator is converted into the data arrangement mode NCHW supported by the inference framework.

As an example, the data arrangement mode conversion strategy of the input data and the output data of the inference operator may be determined when the inference operator is executed. That is, when a certain inference operator is executed, the conversion strategy of that inference operator is determined, and the arrangement mode of the data is then converted based on the conversion strategy. For example, the conversion strategy of a layer of inference operators may be determined each time that layer of inference operators is executed.
As an example, the data arrangement mode conversion strategy of the input data and the output data of the inference operator may be determined before the inference operator is executed. That is, the conversion strategies of the inference operators may be determined in advance, before any inference operator is executed; for example, the conversion strategy of each inference operator may be determined when the inference model is parsed.
It will be understood by those skilled in the art that the embodiments herein, which take the case where the inference framework supports NCHW and the inference model supports NHWC as an example, are for illustration purposes only and do not limit the present disclosure.
In addition, it should be noted that the neural network model may be trained by a training framework: an initial neural network model is subjected to a preset number of supervised training iterations based on a training data set so as to optimize the model parameters, thereby obtaining the final neural network model.
As an example, during the initialization phase of the inference model, the dimension of the input data received by each inference operator and the dimension of the output data correspondingly output can be obtained, and the correlation between the inference operator and the data arrangement mode (i.e., the correlation of the parameters of the operator, the implementation of the operator, and the data of the inference model corresponding to the inference operator with the Layout of the data) can be obtained.
It will be understood by those skilled in the art that the operator parameters described herein refer to the parameter information required in the operator calculation process, such as the weights of a convolution, PAD, and the like; the implementation of an operator refers to the software implementation of the operator, for example, a convolution may be implemented by GEMM, DIRECT, or the like; and the data of an operator refers to the input data and output data processed by the operator.
For example, as an alternative embodiment, the class of the inference operator may be determined based on the dimensionality of the input data and output data of the inference operator and/or the correlation of the inference operator with Layout, and then the conversion strategy of the inference operator may be determined based on the class of the operator.
As an example, the operators may be classified into class A operators, class B operators, class C operators, and class D operators according to the classification rules described above, and the conversion strategy of the data arrangement mode of an operator is then determined according to the class of the operator. Table 1 shows the Layout correlation characteristics and examples of the different classes of operators. In table 1, "input data dimension and output data dimension of operator" refers to the data Rank, where M and N are positive integers.
TABLE 1 correlation characteristics and examples of different classes of operators
(Table 1 is reproduced as an image in the original publication; its textual content is not available.)
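Since the table is only available as an image, the classification rules can be summarized in the following sketch, reconstructed from the description above; the predicate names and the discriminator between classes B and D are the assumptions of this illustration:

```python
PRESET = 4  # the preset dimension (data Rank) discussed above

def classify_operator(impl_layout_related: bool,
                      params_layout_related: bool,
                      rank_pairs) -> str:
    """rank_pairs: the (input Rank, output Rank) pairs the operator supports."""
    if impl_layout_related:
        return "A"   # e.g. Conv, DepthwiseConv: implementation is Layout-related
    if not params_layout_related:
        return "C"   # e.g. LSTM: independent of Layout in every respect
    if any(i != o for i, o in rank_pairs):
        return "D"   # e.g. Reshape: the data Rank may change
    return "B"       # e.g. Concat: Rank preserved, only parameters adjusted
```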
As can be understood, because the training framework directly produces the trained neural network model, when the neural network model is sent to the inference framework, the model parameters are the parameters of the operators of each layer. If inference is to be performed, the inference framework is required to create corresponding inference operators based on the parameters of the operators of the various layers. For example, if an operator in the neural network model performs an addition, the inference framework can create a corresponding inference operator based on the parameters of the addition operator, and this inference operator can specifically perform the step of the addition operation. Alternatively, referring to fig. 7, which shows a schematic structural diagram of the deep learning inference framework adopted by the present disclosure, the above process of creating inference operators of the corresponding classes for each operator may be performed in the initialization stage. In addition, in the initialization stage, besides the above model parsing and inference operator creation, operations such as memory allocation and constant data conversion may also be performed.
Optionally, the data processing method of the present disclosure may further include: performing model conversion on neural network models obtained from training with different training frameworks. Specifically, referring to fig. 7, before initialization, model conversion is performed on the acquired neural network model; the model conversion may be, for example, adjusting the model parameters to improve the inference performance of the inference framework. The different training frameworks may be, for example, Caffe, PyTorch, or TensorFlow.
With continued reference to FIG. 7, the inference framework may implement the inference calculations using differently configured hardware, such as an NPU, GPU, DSP, or CPU, as the case may be. Specifically, the inference phase can be divided into a "preprocessing" stage, an "execution" stage, and a "post-processing" stage. The preprocessing stage processes the input data according to the preprocessing step described above; in the execution stage, the inference operators convert the input data or the output data according to their respective conversion strategies; and the post-processing stage processes the data output by the last-layer operator according to the post-processing step described above.
As can be understood, for the case that the inference framework supports NCHW and the inference model supports NHWC, based on the description of the inventive concept in the foregoing section, an operator related to Layout in the network model must have the characteristic that the dimension (Rank) of its input and output data is 4. Therefore, the Rank information can be taken as the main basis for judging where data conversion is performed: specifically, data with Rank 4 can be given the Layout attribute supported by the inference operators of the inference framework (e.g., NCHW or NHWC), while data with Rank other than 4 keep the same Layout attribute as the original model. Consequently, data conversion operations need to be performed only in operators in which the data Rank changes, and in a typical neural network model the number of such operators is far smaller than the number of operators related to Layout, so data conversion operations within the model can be significantly reduced in the inference stage, thereby improving the inference performance of the deep learning inference framework on deep learning models with different Layouts. In addition, graph traversal and graph segmentation operations are not needed, and the judgment logic for data conversion is significantly simplified, thereby saving software development and maintenance costs.
According to the embodiments of the present disclosure, before an operator executes, the data arrangement mode of its input data has already been converted into the preset data arrangement mode supported by the inference framework, either by the preprocessing step (in the case of the first-layer operator) or by the previous operator.
For ease of understanding, referring to table 1 and fig. 8, assume that the Layout supported by the inference operators of the inference framework is the NCHW arrangement and the data arrangement supported by the inference model is the NHWC arrangement:
for the case of B1, the data dimension of the input data and the output data is 4, and the preprocessing only needs to adjust the parameters of the inference operator and then perform the inference calculation.
As an example, for case B1, the parameters that need to be adjusted may depend on the specific operator: for the inference operator corresponding to a concatenation layer (Concat) operator, the relevant parameter to adjust may be the axis under the NCHW arrangement; for the inference operator corresponding to a convolution layer operator, the relevant parameter to adjust may be the weight.
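For example (an illustrative reconstruction only; the patent does not give the mapping explicitly), the axis parameter of a Concat operator defined on NHWC data can be remapped once its Rank-4 input travels in NCHW:

```python
# B1 parameter adjustment sketch: remap a Concat "axis" parameter from
# NHWC coordinates (N=0, H=1, W=2, C=3) to NCHW coordinates
# (N=0, C=1, H=2, W=3).
NHWC_TO_NCHW_AXIS = {0: 0, 1: 2, 2: 3, 3: 1}

def adjust_concat_axis(axis_nhwc: int) -> int:
    return NHWC_TO_NCHW_AXIS[axis_nhwc]

assert adjust_concat_axis(3) == 1  # channel concatenation: axis 3 -> axis 1
```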
For case B2, no parameter adjustment is required, and the inference calculation is performed directly.
For class C inference operators, no conversion needs to be performed on the arrangement mode of the data.
For cases D1 and D2, no conversion of the data arrangement mode is required.
For case D3, in response to the data dimension of the received input data being 4 (and the output data dimension not being 4), the received input data are converted from the NCHW arrangement to the NHWC arrangement; for case D4, in response to the data dimension of the received input data not being 4 (and the output data dimension being 4), the output data computed by the inference operator are converted from the NHWC arrangement to the NCHW arrangement.
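The handling of cases D3 and D4 around a Layout-independent kernel can be sketched as follows; this is illustrative only, under the same NCHW-framework/NHWC-model assumption:

```python
import numpy as np

def run_class_d_op(layout_independent_kernel, x, out_rank):
    if x.ndim == 4 and out_rank != 4:          # case D3
        x = np.transpose(x, (0, 2, 3, 1))      # input NCHW -> NHWC
    y = layout_independent_kernel(x)
    if x.ndim != 4 and y.ndim == 4:            # case D4
        y = np.transpose(y, (0, 3, 1, 2))      # output NHWC -> NCHW
    return y
```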
As can be appreciated, performing a conversion on the received input data corresponds to case D3 of the class D operators described above; that is, for case D3, the data received from the previous-layer inference operator are converted and then input to the current-layer inference operator for computation. Converting the output data obtained after the inference calculation corresponds to case D4 of the class D operators; that is, the data computed and output by the current-layer inference operator are converted to the other arrangement, and the converted data are then transferred to the next layer. Referring to the example of fig. 8, the Layout supported by the inference operators of the inference framework is NCHW, while the data arrangement supported by the inference model is NHWC. In the preprocessing stage, it can be judged whether the Rank of the input data is equal to 4; if so, the data Layout is converted from the NHWC arrangement to the NCHW arrangement; if not, no conversion is performed.
In the "execution" stage, operator 0 to operator 3 may all be inference operators, where operator 1 may be an inference operator corresponding to a class B operator, and operator 3 may be an inference operator corresponding to a class D operator.
In the "post-processing" phase, it can be determined whether the data dimension of the output data of the inference model is equal to 4, and if so, a data transformation is performed to transform the Layout of the output data from the NCHW permutation to the NHWC permutation. If not, no conversion is carried out.
The working process of the data processing method of the present disclosure is exemplarily described below with reference to fig. 9:
referring to FIG. 9, the inference operator of the inference framework supports a Layout of NCHW, while the inference model supports a Layout of NWHC.
In the initialization stage, the inference framework can determine the classes of the operators in the original model (neural network model) and the supported input and output data Ranks based on parsing the original model, and can then correspondingly create the inference operators and allocate memory, thereby establishing the inference model. Constant data in the original model can also be converted from the NHWC arrangement to the NCHW arrangement, where the constant data include the weight data of Conv and DepthwiseConv; this saves part of the conversion overhead in the inference process.
In the preprocessing stage, since the Rank of the input data is 4, the Layout of the input data is converted from the NHWC arrangement to the NCHW arrangement.
In the execution stage, the inference operators corresponding to class A operators (e.g., Conv, DepthwiseConv) do not need to perform data conversion and compute directly. The inference operator corresponding to the class B operator (e.g., Concat) only needs to adjust its parameters and then perform the inference calculation. The Reshape inference operator, corresponding to a class D operator, needs to perform data conversion, specifically converting the Layout of its input data from NCHW to NHWC, because its input data Rank is 4 and its output data Rank is 3, i.e., the Ranks of its input and output data differ. The inference operator corresponding to the class C operator LSTM is implemented independently of the data Layout and its input and output data Ranks are 3, so the inference calculation can be executed directly.
In the post-processing stage, since the Rank of the output data is 3, no conversion is required and the data are output directly.
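The fig. 9 example can be traced with a small script; the operator list and Ranks follow the description above, while everything else is an illustrative reconstruction:

```python
# Trace of the fig. 9 example (framework: NCHW, model: NHWC).
# Each entry: (operator, input Rank, output Rank, class).
ops = [
    ("Conv",          4, 4, "A"),  # computes directly in NCHW
    ("DepthwiseConv", 4, 4, "A"),  # computes directly in NCHW
    ("Concat",        4, 4, "B"),  # adjusts its axis parameter only
    ("Reshape",       4, 3, "D"),  # Rank changes: convert input NCHW -> NHWC
    ("LSTM",          3, 3, "C"),  # Layout-independent: computes directly
]

in_model_conversions = sum(1 for _, i, o, _ in ops if (i == 4) != (o == 4))
print(in_model_conversions)  # 1 -- only Reshape needs a Layout conversion
# Plus one conversion in preprocessing (the model input has Rank 4) and
# none in post-processing (the model output has Rank 3).
```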
When the Layout supported by the inference operators of the inference framework is NHWC and the inference model supports NCHW, the opposite conversion logic can be adopted; that is, the operations converting NCHW into NHWC become operations converting NHWC into NCHW, and vice versa.

In summary, with the data processing method of the present disclosure, in the initialization stage, the inference framework can determine the classes of the operators in the original model (neural network model) and the supported input and output data Ranks based on parsing the original model, and can then correspondingly create the inference operators, thereby establishing the inference model; in the preprocessing stage, input data with Rank 4 are given the preset Layout attribute supported by the inference framework, while data streams whose Rank is not 4 keep the same Layout attribute as the original model; in the inference stage, data conversion operations need to be performed only in inference operators in which the data Rank changes, and data conversion operations in the inference model are significantly reduced, thereby improving the inference performance of the deep learning inference framework on deep learning models with different Layouts. Meanwhile, no graph traversal or graph segmentation operations are needed, which significantly simplifies the judgment logic of data conversion and saves software development and maintenance costs.
The data processing method for the deep learning inference framework according to the embodiment of the present disclosure is described above in detail, and the data processing apparatus for the deep learning inference framework according to the embodiment of the present disclosure will be described below in detail.
Referring to fig. 6, the data processing apparatus 600 includes a conversion strategy determination unit 610 and an execution unit 620. Those skilled in the art will appreciate that the data processing apparatus 600 described in the present disclosure may additionally include other components.
As an example, the conversion strategy determination unit 610 may be configured to: in response to the inference framework not supporting the data arrangement mode of the inference model, determine the data arrangement mode conversion strategy of the input data and the output data of the inference operator according to the dimension of the input data received by the inference operator, the dimension of the output data correspondingly output, and the correlation between the inference operator and the data arrangement mode.
As an example, the execution unit 620 may be configured to: convert the data arrangement mode of the input data of the inference operator and/or convert the data arrangement mode of the output data of the inference operator according to the determined conversion strategy.
As an example, the data processing apparatus 600 may further comprise a preprocessing unit (not shown), which may be configured to: before the input data is input to the first-layer inference operator of the inference framework, in response to the dimension of the input data being a preset dimension, convert the data arrangement mode of the input data into the data arrangement mode supported by the inference framework, where the preset dimension is determined according to the data arrangement mode supported by the inference framework and the data arrangement mode of the inference model.
As an example, the data processing apparatus 600 may further comprise a post-processing unit (not shown), which may be configured to: in response to the dimension of the data output from the last-layer inference operator of the inference framework being the preset dimension, convert the data arrangement mode of that output data into the data arrangement mode supported by the inference model.
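Taken together, the preprocessing and post-processing units reduce to a Rank check around the transpose helpers sketched earlier. The fragment below is an illustrative sketch assuming a framework Layout of NCHW, a model Layout of NHWC, a preset dimension of 4, and numpy-array tensors; nhwc_to_nchw and nchw_to_nhwc are the hypothetical helpers from the earlier sketch.

    def preprocess(x, preset_rank=4):
        # Rank-4 input is converted from the model Layout (NHWC) to the
        # framework Layout (NCHW); all other Ranks pass through unchanged.
        return nhwc_to_nchw(x) if x.ndim == preset_rank else x

    def postprocess(y, preset_rank=4):
        # Rank-4 output of the last-layer operator is converted back to the
        # model Layout; all other Ranks pass through unchanged.
        return nchw_to_nhwc(y) if y.ndim == preset_rank else y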
As an example, the conversion policy determination unit 610 may be configured to: if the parameters of the inference operator are related to the data arrangement mode, the implementation of the inference operator is not related to the data arrangement mode, and the dimensions of the input data received by the inference operator and of the correspondingly output data cover only the following four cases: (1) receiving input data of the preset dimension and correspondingly outputting output data of the preset dimension; (2) receiving input data of a non-preset dimension and correspondingly outputting output data of a non-preset dimension; (3) receiving input data of the preset dimension and correspondingly outputting output data of a non-preset dimension; and (4) receiving input data of a non-preset dimension and correspondingly outputting output data of the preset dimension, then determine the conversion strategy of the inference operator as follows: for case (3), convert the data arrangement mode of the input data input to the inference operator into the data arrangement mode of the inference model; and for case (4), convert the data arrangement mode of the output data of the inference operator into the data arrangement mode supported by the inference framework.
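A compact way to read cases (3) and (4) above is as a function of the input and output Ranks alone. The fragment below is a sketch of that decision, not the patent's implementation; the returned strings are illustrative labels.

    def conversion_strategy(in_rank, out_rank, preset_rank=4):
        if in_rank == preset_rank and out_rank != preset_rank:
            # Case (3): convert the input to the inference model's Layout.
            return "convert_input_to_model_layout"
        if in_rank != preset_rank and out_rank == preset_rank:
            # Case (4): convert the output to the framework-supported Layout.
            return "convert_output_to_framework_layout"
        # Cases (1) and (2): Rank classes match on both sides, no conversion.
        return "no_conversion"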
As an example, the conversion policy determination unit 610 may be further configured to: if the parameters of the inference operator are related to the data arrangement mode, the implementation of the inference operator is not related to the data arrangement mode, and the dimensions of the input data received by the inference operator and of the correspondingly output data cover only the following two cases: receiving input data of the preset dimension and correspondingly outputting output data of the preset dimension, and receiving input data of a non-preset dimension and correspondingly outputting output data of a non-preset dimension, then determine the conversion strategy of the inference operator as follows: do not convert the data arrangement mode of the input data and output data of the inference operator, and adjust the parameters of the inference operator for the case of receiving input data of the preset dimension and correspondingly outputting output data of the preset dimension (only data of the preset dimension carries the framework Layout rather than the model Layout).
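For the parameter adjustment itself, a typical example is remapping a Concat axis that was defined against the model Layout. The sketch below assumes NHWC as the model Layout and NCHW as the framework Layout; remap_axis is a hypothetical helper, not a name from the patent.

    NHWC_TO_NCHW = (0, 3, 1, 2)

    def remap_axis(model_axis, perm=NHWC_TO_NCHW):
        # The dimension that sat at position model_axis in NHWC now sits at
        # the position where the permutation placed it.
        return perm.index(model_axis)

    assert remap_axis(3) == 1  # channel axis: NHWC dim 3 -> NCHW dim 1
    assert remap_axis(1) == 2  # height axis:  NHWC dim 1 -> NCHW dim 2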
As an example, the conversion policy determination unit 610 may be configured to determine the data arrangement mode conversion strategy for the input data and output data of the inference operator either when the inference operator is executed or before the inference operator is executed.
As an example, the preset dimension is 4, where the data arrangement mode of the inference model is NHWC and the data arrangement mode supported by the inference framework is NCHW, or the data arrangement mode of the inference model is NCHW and the data arrangement mode supported by the inference framework is NHWC.
It should be understood that the respective units/modules in the data processing apparatus according to the exemplary embodiments of the present disclosure may be implemented as hardware components and/or software components. For example, those skilled in the art may implement the respective units/modules using a Field Programmable Gate Array (FPGA) or an Application Specific Integrated Circuit (ASIC), according to the processing performed by each unit/module as described herein.
According to still another aspect of exemplary embodiments of the present disclosure, there is provided a computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements a data processing method according to the present disclosure.
In particular, the data processing method according to the exemplary embodiments of the present disclosure may be written as a computer program, code segments, instructions, or any combination thereof, and recorded, stored, or fixed in or on one or more non-transitory computer-readable storage media. The computer readable storage medium is any data storage device that can store data which can be read by a computer system. Examples of computer-readable storage media include: read-only memory, random access memory, read-only optical disks, magnetic tapes, floppy disks, optical data storage devices, and carrier waves (such as data transmission through the internet via wired or wireless transmission paths).
According to still another aspect of exemplary embodiments of the present disclosure, there is provided an electronic apparatus, wherein the electronic apparatus includes: at least one processor; at least one memory storing computer-executable instructions, wherein the computer-executable instructions, when executed by the at least one processor, cause the at least one processor to perform the data processing method of the present disclosure.
In particular, the electronic device may broadly be a tablet, a smartphone, a smartwatch, or any other electronic device having the necessary computing and/or processing capabilities. In one embodiment, the electronic device may include a processor, memory, a network interface, a communication interface, etc., connected by a system bus. The processor of the electronic device may be used to provide the necessary computing, processing and/or control capabilities. The memory of the electronic device may include a non-volatile storage medium and an internal memory. An operating system, a computer program, and the like may be stored in or on the non-volatile storage medium. The internal memory may provide an environment for the operating system and the computer programs in the non-volatile storage medium to run. The network interface and the communication interface of the electronic device can be used for connecting and communicating with an external device through a network.
In summary, with the data processing method or apparatus of the present disclosure, in the initialization stage the inference framework determines, by parsing the original model (the neural network model), the category of each operator and the input and output data Ranks it supports, and then creates the corresponding inference operators, thereby establishing the inference model. In the preprocessing stage, input data with Rank 4 is given the preset Layout attribute supported by the inference framework, while data streams whose Rank is not 4 keep the same Layout attribute as the original model. In the inference stage, data conversion operations are needed only in the inference operators whose data Rank changes, which markedly reduces the number of data conversion operations in the inference model and thus improves the inference performance of the deep learning inference framework on original models with different Layouts. Moreover, the present disclosure requires no graph traversal or graph partitioning, which significantly simplifies the decision logic for data conversion and saves software development and maintenance cost.
Although a few exemplary embodiments of the present disclosure have been shown and described, it would be appreciated by those skilled in the art that changes may be made in these embodiments without departing from the principles and spirit of the disclosure, the scope of which is defined in the claims and their equivalents.

Claims (10)

1. A data processing method for a deep learning inference framework, the data processing method comprising:
in response to the inference framework not supporting a data arrangement mode of an inference model, determining a data arrangement mode conversion strategy for input data and output data of an inference operator according to the dimension of the input data received by the inference operator, the dimension of the output data correspondingly output, and the correlation between the inference operator and the data arrangement mode; and
converting the data arrangement mode of the input data of the inference operator and/or converting the data arrangement mode of the output data of the inference operator according to the determined conversion strategy.
2. The method of claim 1, further comprising: preprocessing the input data according to the dimension of the input data before the input data is input to a first-layer inference operator of the inference framework,
wherein the preprocessing comprises: in response to the dimension of the input data being a preset dimension, converting the data arrangement mode of the input data into a data arrangement mode supported by the inference framework, wherein the preset dimension is determined according to the data arrangement mode supported by the inference framework and the data arrangement mode of the inference model.
3. The method of claim 2, further comprising:
post-processing the data output from the last-layer inference operator of the inference framework according to the dimension of that data,
wherein the post-processing comprises: in response to the dimension of the data output from the last-layer inference operator of the inference framework being the preset dimension, converting the data arrangement mode of that data into the data arrangement mode supported by the inference model.
4. The method of claim 3, wherein the step of determining the data arrangement mode conversion strategy for the input data and output data of the inference operator comprises:
if the parameters of the inference operator are related to the data arrangement mode, the implementation of the inference operator is not related to the data arrangement mode, and the dimensions of the input data received by the inference operator and of the correspondingly output data comprise only the following four cases: receiving input data of the preset dimension and correspondingly outputting output data of the preset dimension, receiving input data of a non-preset dimension and correspondingly outputting output data of a non-preset dimension, receiving input data of the preset dimension and correspondingly outputting output data of a non-preset dimension, and receiving input data of a non-preset dimension and correspondingly outputting output data of the preset dimension, determining the conversion strategy of the inference operator as follows:
for the case of receiving input data of the preset dimension and correspondingly outputting output data of a non-preset dimension, converting the data arrangement mode of the input data input to the inference operator into the data arrangement mode of the inference model;
and for the case of receiving input data of a non-preset dimension and correspondingly outputting output data of the preset dimension, converting the data arrangement mode of the output data of the inference operator into the data arrangement mode supported by the inference framework.
5. The method of claim 4, wherein the step of determining the data arrangement mode conversion strategy for the input data and output data of the inference operator further comprises: if the parameters of the inference operator are related to the data arrangement mode, the implementation of the inference operator is not related to the data arrangement mode, and the dimensions of the input data received by the inference operator and of the correspondingly output data comprise only the following two cases: receiving input data of the preset dimension and correspondingly outputting output data of the preset dimension, and receiving input data of a non-preset dimension and correspondingly outputting output data of a non-preset dimension, determining the conversion strategy of the inference operator as follows: not converting the data arrangement mode of the input data and output data of the inference operator, and adjusting the parameters of the inference operator for the case of receiving input data of the preset dimension and correspondingly outputting output data of the preset dimension.
6. The method of claim 1, wherein the step of determining the data arrangement mode conversion strategy for the input data and output data of the inference operator comprises:
determining the data arrangement mode conversion strategy for the input data and output data of the inference operator when the inference operator is executed;
or determining the data arrangement mode conversion strategy for the input data and output data of the inference operator before the inference operator is executed.
7. The method according to any one of claims 2 to 6,
wherein the preset dimension is 4,
the data arrangement mode of the inference model is NHWC and the data arrangement mode supported by the inference framework is NCHW, or the data arrangement mode of the inference model is NCHW and the data arrangement mode supported by the inference framework is NHWC.
8. A data processing apparatus for a deep learning inference framework, the data processing apparatus comprising:
a conversion policy determination unit configured to: in response to the inference framework not supporting a data arrangement mode of an inference model, determine a data arrangement mode conversion strategy for input data and output data of an inference operator according to the dimension of the input data received by the inference operator, the dimension of the output data correspondingly output, and the correlation between the inference operator and the data arrangement mode; and
an execution unit configured to: convert the data arrangement mode of the input data of the inference operator and/or convert the data arrangement mode of the output data of the inference operator according to the determined conversion strategy.
9. A computer-readable storage medium in which a computer program is stored, wherein the computer program, when executed by a processor, implements the data processing method of any one of claims 1 to 7.
10. An electronic device, comprising:
at least one processor;
at least one memory storing computer-executable instructions,
wherein the computer-executable instructions, when executed by the at least one processor, cause the at least one processor to perform the data processing method of any of claims 1 to 7.
CN202110539151.4A 2021-05-18 2021-05-18 Data processing method and data processing device for deep learning inference framework Pending CN113269303A (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN202110539151.4A CN113269303A (en) 2021-05-18 2021-05-18 Data processing method and data processing device for deep learning inference framework
KR1020220042253A KR20220156435A (en) 2021-05-18 2022-04-05 Data processing apparatus and method for deep learning inference framework
US17/744,150 US20220374677A1 (en) 2021-05-18 2022-05-13 Data processing apparatus and method for deep learning inference framework


Publications (1)

Publication Number Publication Date
CN113269303A true CN113269303A (en) 2021-08-17

Family

ID=77231600


Country Status (2)

Country Link
KR (1) KR20220156435A (en)
CN (1) CN113269303A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023160644A1 (en) * 2022-02-28 2023-08-31 International Business Machines Corporation Inference model on restrained gpu memory

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111401510A (en) * 2019-09-24 2020-07-10 上海寒武纪信息科技有限公司 Data processing method and device, computer equipment and storage medium
US20200349424A1 (en) * 2019-04-30 2020-11-05 Baidu Usa Llc Memory layouts and conversion to improve neural network inference performance
CN112328674A (en) * 2020-11-17 2021-02-05 深圳力维智联技术有限公司 Cross-data-format model conversion acceleration method and device
CN112465123A (en) * 2020-12-09 2021-03-09 安徽寒武纪信息科技有限公司 Processing device and method for optimizing neural network model
CN112465122A (en) * 2020-12-09 2021-03-09 安徽寒武纪信息科技有限公司 Device and method for optimizing original dimension operator in neural network model


Also Published As

Publication number Publication date
KR20220156435A (en) 2022-11-25


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination