WO2020253117A1 - A data processing method and device - Google Patents

A data processing method and device

Info

Publication number
WO2020253117A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
input data
functional layer
type
input
Prior art date
Application number
PCT/CN2019/121358
Other languages
English (en)
French (fr)
Inventor
吴金进
Original Assignee
深圳云天励飞技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳云天励飞技术有限公司 filed Critical 深圳云天励飞技术有限公司
Publication of WO2020253117A1 publication Critical patent/WO2020253117A1/zh

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F 12/02 Addressing or allocation; Relocation
    • G06F 12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F 12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks

Definitions

  • the present invention relates to the field of computer technology, in particular to a data processing method and device.
  • TCM: tightly coupled memory
  • the embodiments of the present invention provide a data processing method and device for improving versatility.
  • the first aspect of the embodiments of the present invention provides a data processing method, including:
  • CNN convolutional neural network
  • because the resource allocation method corresponding to the functional layer type is used to allocate TCM space for the input data, the method is applicable to all data and does not require users to pre-allocate space for different data and write it into the DSP. Therefore, the versatility of resource allocation can be improved.
  • the method further includes:
  • the allocating the first TCM space for the input data using the resource allocation manner corresponding to the functional layer type includes:
  • the cutting the input data using the data cutting method corresponding to the functional layer type includes:
  • the input data processed and cut in the first TCM space using the CNN operator corresponding to the functional layer type includes:
  • the method further includes:
  • the allocating the first TCM space for the input data using the resource allocation manner corresponding to the functional layer includes:
  • the size of the TCM space allocated for the input data is determined according to the input data type. Therefore, an appropriate TCM space size can be allocated for the input data: resources are not wasted and normal processing of the data is not affected, which improves the accuracy of resource allocation.
  • the cutting the input data using the data cutting method corresponding to the functional layer includes:
  • the input data is cut using the data cutting method corresponding to the functional layer according to the size of the space required by the input data.
  • the TCM space that can be allocated for the input data is limited. Therefore, data cutting can divide large data into multiple small pieces that are processed in turn, which neither interrupts data processing nor occupies too many resources, so normal processing of the data can be ensured.
  • the parsing the input information to obtain input data attributes includes:
  • the method also includes:
  • the processing result of the input data is stored in the second TCM space.
  • a TCM space is specially allocated for the output data, so the processing results of the input data can be stored and retrieved later.
  • the method further includes:
  • the input data processed and cut in the first TCM space using the CNN operator corresponding to the functional layer includes:
  • the CNN operator is determined according to the function layer type, input data type and output data type, rather than arbitrarily determined. Therefore, a suitable CNN operator can be used to process the input data, which can improve the accuracy and efficiency of data processing.
  • the parsing of the input information to obtain input data attributes and output data attributes includes:
  • the method also includes:
  • the input data processed and cut in the first TCM space using the CNN operator corresponding to the functional layer includes:
  • the CNN operator is determined according to the functional layer type, input data type, and weight type, rather than arbitrarily. Therefore, a suitable CNN operator can be used to process the input data, which can improve the accuracy and efficiency of data processing.
  • the acquiring the CNN operator corresponding to the functional layer type, the input data type, and the output data type includes:
  • the type of the CNN operator is determined by the functional layer type, and the data input type of the CNN operator is determined by the input data type and the output data type. A suitable CNN operator can therefore be determined from the functional layer type, the input data type, and the output data type, which can improve the accuracy and efficiency of data processing.
  • a second aspect of the embodiments of the present invention provides a data processing device, including a unit for executing the data processing method provided in the first aspect or any one of the first aspects.
  • a third aspect of the embodiments of the present invention provides a data processing device, including a processor and a memory, the processor and the memory are connected to each other, wherein the memory is used to store a computer program, and the computer program includes program instructions,
  • the processor is configured to invoke the program instructions to execute the first aspect or the data processing method provided in any embodiment of the first aspect.
  • a fourth aspect provides a readable storage medium, the readable storage medium stores a computer program, and the computer program includes program instructions that, when executed by a processor, cause the processor to execute the data processing method provided by the first aspect or any embodiment of the first aspect.
  • the fifth aspect provides an application program, which is used to execute the data processing method provided by the first aspect or any one of the embodiments of the first aspect at runtime.
  • FIG. 1 is a schematic flowchart of a data processing method provided by an embodiment of the present invention
  • FIG. 2 is a schematic flowchart of another data processing method provided by an embodiment of the present invention.
  • FIG. 3 is a schematic structural diagram of a data processing device provided by an embodiment of the present invention.
  • FIG. 4 is a schematic structural diagram of another data processing device provided by an embodiment of the present invention.
  • FIG. 1 is a schematic flowchart of a data processing method according to an embodiment of the present invention. According to different needs, some steps in the flowchart shown in FIG. 1 can be split into several steps, and several steps can also be combined into one step. The data processing method is applied to a DSP (digital signal processor). As shown in FIG. 1, the data processing method may include the following steps.
  • DSP: digital signal processor
  • the input information may include the input data and the function layer type corresponding to the input data.
  • the input data is the data to be processed, which can be convolutional layer data, pooling layer data, fully connected (FC) layer data, deconvolution data, depthwise (DW) convolution data, batch normalization (BN) data, PReLU (Parametric Rectified Linear Unit) data, L2 normalization (L2N) data, ReLU (Rectified Linear Unit) data, sigmoid data, or other data that needs CNN (convolutional neural network) processing.
  • each input data can be one layer of data, for example one layer of convolutional layer data, one layer of pooling layer data, one layer of fully connected layer data, or one layer of another type of layer data.
  • the functional layer type can be convolutional layer, pooling layer, FC layer, deconvolution, DW convolution, BN, PReLU, L2N, ReLU, sigmoid, or another type.
  • the input data can have a one-to-one correspondence with the functional layer type, that is, one type of input data uniquely corresponds to one functional layer type; for example, the functional layer corresponding to convolutional layer data is the convolutional layer. Alternatively, multiple types of input data may correspond to one functional layer type.
  • the resource allocation method corresponding to the functional layer type can be used to allocate the first TCM space for the input data.
  • One functional layer type can correspond to one resource allocation method, and one resource allocation method can correspond to one or more functional layer types.
  • the resource allocation method may be a convolutional layer method, a pooling layer method, an FC layer method, an overall method, or other methods.
  • the data cutting method corresponding to the functional layer type can be used to cut the input data.
  • One functional layer type can correspond to one data cutting method
  • one data cutting method can correspond to one or more functional layer types.
  • the data cutting method may be a convolutional layer method, a pooling layer method, an FC layer method, an integral method, or other methods.
  • the input data can be cut according to the size of the space required for the input data, using the data cutting method corresponding to the functional layer type. That is, the size of the data cut from the input data each time can be equal to the size of the space required by the input data, until the size of the uncut input data is less than or equal to the size of the space required by the input data.
  • the CNN operator corresponding to the functional layer type can be used to process the cut input data in the first TCM space.
  • the first data can be moved to the first TCM space first, and then the CNN operator corresponding to the functional layer type can be used in the first TCM space to process the first data.
  • the second data can be moved to the first TCM space, and then the CNN operator corresponding to the functional layer type can be used in the first TCM space to process the second data until all the cut input data is processed.
  • the first data and the second data are data fragments in the cut input data.
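As a rough sketch (not part of the patent), the cut-and-process loop described above, in which pieces of the cut input data are moved into the first TCM space and processed in turn by the CNN operator, might look as follows. All names (`process_in_pieces`, `move_fn`, `op_fn`) are illustrative assumptions:

```python
def process_in_pieces(total_rows, piece_rows, move_fn, op_fn):
    """Cut `total_rows` of input into pieces of at most `piece_rows`
    rows (the size that fits the first TCM space) and process them one
    piece at a time. `move_fn` stands in for moving a piece into TCM,
    `op_fn` for running the CNN operator on it."""
    results = []
    start = 0
    while start < total_rows:
        rows = min(piece_rows, total_rows - start)  # last piece may be smaller
        piece = move_fn(start, rows)                # move this piece into TCM
        results.append(op_fn(piece))                # process it in TCM
        start += rows
    return results
```

For example, cutting 10 rows into pieces of 4 produces pieces of 4, 4, and 2 rows, processed in order.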
  • because the resource allocation method corresponding to the functional layer type is used to allocate the TCM space for the input data, the method is applicable to all data and does not require the user to pre-allocate space for different data and write it into the DSP. Therefore, the versatility of resource allocation can be improved.
  • FIG. 2 is a schematic flowchart of another data processing method according to an embodiment of the present invention. According to different requirements, some steps in the flowchart shown in FIG. 2 can be split into several steps, and several steps can also be combined into one step. The data processing method is applied to a DSP. As shown in FIG. 2, the data processing method may include the following steps.
  • the input information may include the number of input data tensors (Tensor) and input data Tensor information.
  • the number of input data Tensor can indicate how many Tensors are included in the input data.
  • the input data Tensor information may include the information of each input data Tensor.
  • the input data Tensor information may include the input data type, the dimension of the input data, the maximum dimension of the input data, and the input data.
  • the input data type can be the number of bytes of each input data Tensor.
  • the dimension of the input data can be one-dimensional, two-dimensional, three-dimensional, four-dimensional, or another dimension.
  • the maximum dimension of the input data is the maximum dimension allowed for the input data.
  • the input data is the data to be processed, which can be convolutional layer data, pooling layer data, FC layer data, deconvolution data, deep convolution data, or BN
  • the data can also be PReLU data, L2N data, ReLU data, Sigmoid data, or other data that needs to be processed by CNN.
  • each input data can be one layer of data, for example one layer of convolutional layer data, one layer of pooling layer data, one layer of fully connected layer data, or one layer of another type of layer data.
  • the input data Tensor information may also include area information of valid data.
  • the area information of the valid data can include left, top, width, and height, which can indicate the position of the valid data in the image: taking the point whose abscissa is the left value and whose ordinate is the top value as the starting position, the region extends by the height along the negative direction of the ordinate and by the width along the positive direction of the abscissa.
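As an illustration (not part of the patent), the valid-data region described by (left, top, width, height) can be converted to rectangle bounds as follows; the function name and the image-style coordinate convention (ordinate increasing downward) are assumptions:

```python
def valid_region(left, top, width, height):
    """Return (x0, y0, x1, y1) bounds of the valid-data rectangle:
    start at (left, top), extend `width` along the positive abscissa
    and `height` along the ordinate (downward in image coordinates)."""
    return (left, top, left + width, top + height)
```

For example, `valid_region(2, 3, 4, 5)` gives the rectangle from (2, 3) to (6, 8).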
  • the input information may include the type of the functional layer corresponding to the input data.
  • the function layer type can be convolutional layer, pooling layer, FC layer, deconvolution, DW convolution, BN, PReLU, or L2N , It can also be ReLU, Sigmoid, or other types.
  • the input information may also include the number of output data Tensor and output data Tensor information.
  • the number of output data Tensor may indicate how many Tensors are included in the output data.
  • the output data Tensor information may include the number of output data Tensor and the information of the output data Tensor.
  • the output data Tensor information may include the output data type, the dimension of the output data, and the maximum dimension of the output data.
  • the output data type can be the number of bytes of each output data Tensor.
  • the dimension of the output data may be the dimension of the output data.
  • the maximum dimension of the output data is the maximum number of dimensions allowed for the output data.
  • the input information may also include a weight parameter, and the weight parameter may include weight information.
  • the weight parameters can also include bias information and shift information.
  • the weight parameter is stored in order: weight information first, then bias (offset) information, then shift (transfer) information.
  • the input information may also include a layer parameter array and a layer parameter size, and the layer parameter size may indicate the size of the layer parameter array.
  • the functional layer corresponding to the functional layer type can be obtained from the functional layer library.
  • the input information can be parsed to obtain input data attributes and output data attributes.
  • the parsing here can be translation (for example, translating English into Chinese), conversion (for example, converting the format of the information in the input information into a format that the functional layer can process), or other similar processing.
  • the input data attributes can be parsed by the number of input data Tensor and the input data Tensor information.
  • Input data attributes may include input data dimensions and input data types. When the maximum dimension of the input data is 4 dimensions and the dimension of the input data is three dimensions, the input data dimensions may include channel, height, and width. Please refer to the above description for the input data type.
  • the output data attributes can be parsed through the number of output data Tensor and output data Tensor information.
  • the output data attributes can include output data dimensions and output data types. When the maximum dimension of the output data is 4 dimensions and the dimension of the output data is three dimensions, the output data dimensions may include channel, height, and width. Please refer to the above description for the output data type.
  • the weight attribute can be parsed from the weight parameter.
  • the weight attribute may include a weight type, and the weight type is the number of bytes of each weight data Tensor in the weight data.
  • the bias (offset) attribute and the shift (transfer) attribute can also be parsed.
  • the bias attribute may include the bias type, which is the number of bytes of each bias data Tensor in the bias data.
  • the shift attribute may include the shift type, which is the number of bytes of each shift data Tensor in the shift data.
  • the kernel information can be parsed through the layer parameter array and the layer parameter size.
  • the kernel information may include kernel dimensions, stride dimensions, and padding dimensions.
  • the kernel dimension, the stride dimension, and the padding dimension can each include a height and a width.
  • the first TCM space can be allocated to the input data using the resource allocation method corresponding to the functional layer according to the input data attributes, and the second TCM space can be allocated to the output data using the resource allocation method corresponding to the functional layer according to the output data attributes.
  • the size of the space required for the input data can be determined according to the amount of input data, and the size of the space required for the output data can be determined according to the amount of output data. The space required for the input data is then allocated to the input data from the free space of the TCM to obtain the first TCM space, and the space required for the output data is allocated to the output data from the free space of the TCM to obtain the second TCM space.
  • the resource allocation method may be a convolutional layer method, a pooling layer method, an FC layer method, an overall method, or other methods.
  • when the resource allocation method is the convolutional layer method, it can be divided into two stages. When the input data meets the requirements of the first stage (seg1-1), resources are allocated according to the first stage; after first-stage resource allocation fails, second-stage (seg1-2) resource allocation starts, which cuts along channels on the basis of the first stage's row-cutting method.
  • the first stage (seg1-1) resource allocation can be as follows:
  • the total channel weight data amount (weight_para1) can be calculated.
  • in_hsize can be the product of the input data channel, input data width and input data type.
  • out_hsize can be the product of the output data channel, the output data width and the output data type.
  • the weight_para1 can be the product of the input data channel, the output data channel, the kernel size and the weight type, and the kernel size can be the product of the height and width of the kernel.
  • if the weight parameter also includes bias information and shift information, weight_para1 must also add the data amounts of these two kinds of information.
  • min_size1 may be the sum of the minimum data amount of input data (in_min_size1) and the minimum data amount of output data (out_min_size1); when weight data is present, min_size1 is the sum of in_min_size1, out_min_size1, and weight_para1.
  • in_min_size1 may be the product of the core height and in_hsize, and out_min_size1 may be out_hsize.
  • it is then judged whether min_size1 is less than or equal to the maximum resource (NN_POOL_SIZE), that is, whether it is less than or equal to the size of the free TCM space.
  • in the case of judging that it is less than or equal, the first stage of resource calculation is used, that is, the data amount of each piece of input data and the data amount of each piece of output data are calculated.
  • the resource allocatable amount (remain_res) of input data and output data can be the difference between NN_POOL_SIZE and weight_para1.
  • the reserved amount of input data can be the product of the difference between the kernel height and the stride height and in_hsize. The number of rows of each piece of output data (seg1_size) is equal to the difference between remain_res and the reserved amount of input data, divided by deno, where deno can be the product of the stride height and in_hsize, plus out_hsize.
  • the number of segments (seg1_num) in the first stage can be the number of rows of output data divided by seg1_size.
  • the number of rows of each piece of input data can be the product of the difference between seg1_size and one and the height of the step, plus the height of the kernel.
  • the data volume of each piece of input data can be the product of the number of rows of each piece of input data and in_hsize.
  • the data amount of each piece of output data can be the product of the number of rows of each piece of output data and out_hsize, and the weight data amount can be weight_para1.
  • the space of the data amount of each piece of input data in the TCM can be allocated to the input data to obtain the first TCM space, the space of the data amount of each piece of output data can be allocated to the output data to obtain the second TCM space, and the space of the weight data amount can be allocated to the weight data to obtain the third TCM space.
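As a rough sketch (not part of the patent), the first-stage (seg1-1) convolutional-layer formulas above might be combined as follows; the function and parameter names are illustrative assumptions, and bias/shift data amounts are omitted for brevity:

```python
import math

def conv_seg1_alloc(in_ch, in_w, in_type, out_ch, out_w, out_rows, out_type,
                    k_h, k_w, stride_h, w_type, pool_size):
    """Sketch of first-stage convolutional-layer TCM allocation."""
    in_hsize = in_ch * in_w * in_type        # one input row, all channels
    out_hsize = out_ch * out_w * out_type    # one output row, all channels
    weight_para1 = in_ch * out_ch * k_h * k_w * w_type
    min_size1 = k_h * in_hsize + out_hsize + weight_para1
    if min_size1 > pool_size:                # NN_POOL_SIZE exceeded:
        return None                          # fall back to stage seg1-2
    remain_res = pool_size - weight_para1
    reserve = (k_h - stride_h) * in_hsize    # row overlap kept between pieces
    deno = stride_h * in_hsize + out_hsize
    seg1_size = (remain_res - reserve) // deno       # output rows per piece
    seg1_num = math.ceil(out_rows / seg1_size)       # number of pieces
    in_piece = ((seg1_size - 1) * stride_h + k_h) * in_hsize
    out_piece = seg1_size * out_hsize
    return {"seg1_size": seg1_size, "seg1_num": seg1_num,
            "in_piece": in_piece, "out_piece": out_piece,
            "weight": weight_para1}
```

The three returned data amounts correspond to the first, second, and third TCM spaces.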
  • the number of segments (seg2_num) of the second stage is 1, and the number of channels (seg2_size) of each segment is the number of channels of output data.
  • the second stage (seg1-2) resource allocation can be as follows:
  • the minimum amount of input data can be in_min_size1.
  • the data amount (out_wsize) of the output data in each row of a single channel can be the product of the output data width and the output data type, and the minimum data amount (out_min_size2) of the output data can be out_wsize.
  • the weight data amount (weight_para2) under the output single channel can be the product of the input data channel, kernel size, and weight type.
  • the minimum data amount (min_size2) of the second stage can be the sum of in_min_size2, out_min_size2, and weight_para2. Afterwards, it is judged whether min_size2 is less than or equal to NN_POOL_SIZE.
  • in the case of judging that it is less than or equal, the second stage resource calculation can be performed; in the case of judging that it is larger, a warning can be output.
  • the resource allocatable amount (remain_res) of input data and output data in the second stage can be the difference between NN_POOL_SIZE and in_min_size2.
  • the number of channels (seg2_size) of each piece of output data can be remain_res divided by deno, where deno can be the sum of out_wsize and weight_para2.
  • the number of segments (seg2_num) in the second stage can be the output data channel divided by seg2_size.
  • the data amount of each piece of input data can be in_min_size2.
  • the data amount of each piece of output data can be the product of seg2_size and out_wsize.
  • the amount of weight data may be the product of seg2_size and weight_para2.
  • the space of the data amount of each piece of input data in the TCM can be allocated to the input data to obtain the first TCM space, the space of the data amount of each piece of output data can be allocated to the output data to obtain the second TCM space, and the space of the weight data amount can be allocated to the weight data to obtain the third TCM space.
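Similarly as a sketch (not part of the patent), the second-stage (seg1-2) convolutional-layer formulas, which cut along output channels, might be combined as follows; names are illustrative, and the per-piece weight amount of seg2_size × weight_para2 is an assumption:

```python
import math

def conv_seg2_alloc(in_ch, in_hsize, k_h, k_w, out_ch, out_w, out_type,
                    w_type, pool_size):
    """Sketch of second-stage convolutional-layer TCM allocation."""
    in_min_size2 = k_h * in_hsize            # whole rows, all input channels
    out_wsize = out_w * out_type             # one output row of one channel
    weight_para2 = in_ch * k_h * k_w * w_type  # weights for one output channel
    min_size2 = in_min_size2 + out_wsize + weight_para2
    if min_size2 > pool_size:                # NN_POOL_SIZE exceeded:
        return None                          # allocation fails, warn
    remain_res = pool_size - in_min_size2
    deno = out_wsize + weight_para2
    seg2_size = remain_res // deno           # output channels per piece
    seg2_num = math.ceil(out_ch / seg2_size)
    return {"seg2_size": seg2_size, "seg2_num": seg2_num,
            "in_piece": in_min_size2,
            "out_piece": seg2_size * out_wsize,
            "weight_piece": seg2_size * weight_para2}
```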
  • the number of segments (seg1_num) of the first stage is the number of rows of output data
  • the number of rows (seg1_size) of each segment is 1.
  • when the resource allocation method is the pooling layer method, it can be divided into two stages. When the input data meets the requirements of the first stage (seg2-1), resources are allocated according to the first-stage cutting; after first-stage resource allocation fails, second-stage (seg2-2) resource allocation starts, which performs further cutting on the basis of the first-stage cutting method to allocate resources.
  • the first stage (seg2-1) resource allocation can be as follows:
  • in_plane_size can be the product of the number of rows of input data, the width of the input data, and the input data type.
  • out_plane_size can be the product of the number of rows of output data, the width of the output data, and the output data type.
  • weight_inch_size can be the product of the kernel size and the weight type. If the weight parameter also includes bias information and shift information, weight_inch_size must also add the data amounts of these two kinds of information.
  • min_size1 may be the sum of the minimum data amount of input data (in_min_size1), the minimum data amount of output data (out_min_size1), and the minimum data amount of weight (weight_para1).
  • in_min_size1 can be in_plane_size
  • out_min_size1 can be out_plane_size
  • weight_para1 can be weight_inch_size.
  • it is then judged whether min_size1 is less than or equal to the maximum resource value (NN_POOL_SIZE), that is, whether it is less than or equal to the size of the free TCM space.
  • in the case of judging that it is less than or equal, the first stage of resource calculation is used, that is, the data amount of each piece of input data and the data amount of each piece of output data are calculated.
  • the resource allocable amount of input data and output data can be NN_POOL_SIZE.
  • the number of channels (seg1_size) of each segment of output data can be NN_POOL_SIZE divided by deno, and deno can be min_size1.
  • the number of segments in the first stage can be the output data channel divided by seg1_size, and the data amount of each segment of input data can be the product of seg1_size and in_plane_size.
  • the data amount of each piece of output data can be the product of seg1_size and out_plane_size, and the weight data amount can be the product of seg1_size and weight_inch_size.
  • the space of the data amount of each piece of input data in the TCM can be allocated to the input data to obtain the first TCM space, the space of the data amount of each piece of output data can be allocated to the output data to obtain the second TCM space, and the space of the weight data amount can be allocated to the weight data to obtain the third TCM space.
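As a sketch (not part of the patent), the first-stage (seg2-1) pooling-layer formulas above, which keep whole planes and cut along channels, might be combined as follows; names are illustrative and bias/shift amounts are omitted:

```python
import math

def pool_seg1_alloc(in_rows, in_w, in_type, out_rows, out_w, out_type,
                    out_ch, k_h, k_w, w_type, pool_size):
    """Sketch of first-stage pooling-layer TCM allocation."""
    in_plane = in_rows * in_w * in_type      # one full input channel plane
    out_plane = out_rows * out_w * out_type  # one full output channel plane
    weight_inch = k_h * k_w * w_type         # per-channel weight data
    min_size1 = in_plane + out_plane + weight_inch
    if min_size1 > pool_size:                # NN_POOL_SIZE exceeded:
        return None                          # fall back to stage seg2-2
    seg1_size = pool_size // min_size1       # channels per piece (deno = min_size1)
    seg1_num = math.ceil(out_ch / seg1_size)
    return {"seg1_size": seg1_size, "seg1_num": seg1_num,
            "in_piece": seg1_size * in_plane,
            "out_piece": seg1_size * out_plane,
            "weight_piece": seg1_size * weight_inch}
```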
  • the number of segments (seg2_num) of the second stage is 1, and the number of channels (seg2_size) of each segment is the number of rows of output data.
  • the second stage (seg2-2) resource allocation can be as follows:
  • the data amount (in_wsize) of each row of input data can be the product of the width of the input data (including padding) and the input data type, and the minimum data amount of the input data (in_min_size2) can be the product of the core height and in_wsize.
  • the data amount (out_wsize) of each row of output data may be the product of the width of the output data and the output data type, and the minimum data amount (out_min_size2) of the output data may be out_wsize.
  • the weight data amount (weight_para2) may be weight_inch_size.
  • the minimum data amount (min_size2) of the second stage can be the sum of in_min_size2, out_min_size2, and weight_para2.
  • it is then judged whether min_size2 is less than or equal to NN_POOL_SIZE.
  • in the case of judging that it is less than or equal, the second stage resource calculation can be performed; in the case of judging that it is larger, a warning can be output.
  • the resource allocatable amount (remain_res) of input data and output data in the second stage can be the difference between NN_POOL_SIZE and weight_para2 minus the reserved amount of input data.
  • the reserved amount of input data can be the difference between the height of the core and the height of the step, multiplied by in_wsize.
  • the number of rows (seg2_size) of each piece of output data can be remain_res divided by deno, where deno can be the product of the stride height and in_wsize, plus out_wsize.
  • the number of segments in the second stage can be the number of rows of output data divided by seg2_size, and the number of rows of each segment of input data can be the difference between seg2_size and one multiplied by the height of the step, plus the height of the core.
  • the data volume of each piece of input data can be the product of the number of rows of each piece of input data and in_wsize.
  • the data amount of each piece of output data can be the product of the number of rows of each piece of output data and out_wsize.
  • the weight data amount can be weight_para2. After that, the space of the data amount of each piece of input data in the TCM can be allocated to the input data to obtain the first TCM space, the space of the data amount of each piece of output data can be allocated to the output data to obtain the second TCM space, and the space of the weight data amount can be allocated to the weight data to obtain the third TCM space.
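As a sketch (not part of the patent), the second-stage (seg2-2) pooling-layer formulas, which cut a single channel along rows, might be combined as follows; names are illustrative:

```python
import math

def pool_seg2_alloc(in_w, in_type, out_w, out_rows, out_type,
                    k_h, k_w, stride_h, w_type, pool_size):
    """Sketch of second-stage pooling-layer TCM allocation."""
    in_wsize = in_w * in_type                # one (padded) input row
    out_wsize = out_w * out_type             # one output row
    weight_para2 = k_h * k_w * w_type        # weight_inch_size
    min_size2 = k_h * in_wsize + out_wsize + weight_para2
    if min_size2 > pool_size:                # NN_POOL_SIZE exceeded:
        return None                          # allocation fails, warn
    reserve = (k_h - stride_h) * in_wsize    # row overlap between pieces
    remain_res = pool_size - weight_para2 - reserve
    deno = stride_h * in_wsize + out_wsize
    seg2_size = remain_res // deno           # output rows per piece
    seg2_num = math.ceil(out_rows / seg2_size)
    in_piece_rows = (seg2_size - 1) * stride_h + k_h
    return {"seg2_size": seg2_size, "seg2_num": seg2_num,
            "in_piece": in_piece_rows * in_wsize,
            "out_piece": seg2_size * out_wsize,
            "weight_piece": weight_para2}
```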
  • the number of segments (seg1_num) of the first stage is the number of channels of output data
  • the number of channels (seg1_size) of each segment is 1.
  • when the resource allocation method is the FC layer method, it can be divided into two stages. When the input data meets the requirements of the first stage (seg3-1), resources are allocated according to the first stage; after the first-stage resource allocation fails, the second-stage (seg3-2) resource allocation starts, which performs further cutting on the basis of the first-stage cutting method to allocate resources.
  • the first stage (seg3-1) resource allocation can be as follows:
  • in_plane_size can be the product of the number of rows of input data, the width of the input data, and the type of input data.
  • out_plane_size can be the product of the number of rows of output data, the width of output data, and the type of output data.
  • the weight_inch_size can be the product of the input data channel, the number of input data rows, the input data width, and the input data type. In the case that the weight parameter includes offset information and transfer information, weight_inch_size should also add the data amount of these two kinds of information.
  • min_size1 may be the sum of the minimum data amount of input data (in_min_size1), the minimum data amount of output data (out_min_size1), and the minimum data amount of weight (weight_para1).
  • in_min_size1 can be the product of the input data channel and in_plane_size
  • out_min_size1 can be out_plane_size
  • weight_para1 can be weight_inch_size.
  • it is then judged whether min_size1 is less than or equal to the maximum resource (NN_POOL_SIZE), that is, whether it is less than or equal to the size of the free TCM space.
  • in the case of judging that it is less than or equal, the first stage of resource calculation is used, that is, the data amount of each piece of input data and the data amount of each piece of output data are calculated.
  • the resource allocatable amount (remain_res) of input data and output data in the first stage can be the difference between NN_POOL_SIZE and in_min_size1.
  • the number of channels (seg1_size) of each segment of output data can be NN_POOL_SIZE divided by deno, which can be the sum of out_min_size1 and weight_para1.
  • the number of segments (seg1_num) in the first stage can be the channel of the output data divided by seg1_size.
  • the data amount of each piece of input data can be in_min_size1.
  • the data amount of each piece of output data can be the product of seg1_size and out_plane_size.
  • the amount of weight data may be the product of seg1_size and weight_inch_size.
  • the space of the data amount of each piece of input data in the TCM can be allocated to the input data to obtain the first TCM space
  • the space of the data amount of each piece of output data in the TCM can be allocated to the output data to obtain the second TCM space
  • the space of the weight data amount in the TCM can be allocated to the weight data to obtain the third TCM space.
  • the number of segments (seg1_num) of the first stage is the number of rows of output data
  • the number of rows (seg1_size) of each segment is 1.
  • the second stage (seg3-2) resource allocation can be as follows:
  • the minimum data amount (min_size2) of the second stage can be the sum of out_min_size1, input data type and weight type. Afterwards, it is judged whether min_size2 is less than or equal to NN_POOL_SIZE. In the case of judging that it is less than or equal to NN_POOL_SIZE, the second stage resource calculation can be performed. In the case of judging that it is larger than NN_POOL_SIZE, a warning can be output.
  • the resource allocatable amount (remain_res) of the input data and output data of the second stage can be the difference between NN_POOL_SIZE and out_min_size1, and the data amount of each segment (seg2_size) can be remain_res divided by deno; deno can be the sum of the input data type and the weight type.
  • the number of segments (seg2_num) in the second stage can be the ratio of in_min_size1 to the input data type (in_cn) divided by seg2_size.
  • the data amount of each piece of input data can be seg2_size times in_cn.
  • the data amount of each piece of output data can be out_min_size1, and the weight data amount can be seg2_size times the weight type (weight_cn).
  • the space of the data amount of each piece of input data in the TCM can be allocated to the input data to obtain the first TCM space
  • the space of the data amount of each piece of output data in the TCM can be allocated to the output data to obtain the second TCM space
  • the space of the weight data amount in the TCM can be allocated to the weight data to obtain the third TCM space.
  • the number of segments (seg1_num) of the first stage is the number of channels of output data
  • the number of channels (seg1_size) of each segment is 1.
  • in the case of a 32-byte processor, the addresses in the allocated TCM space can be aligned to 4 bytes
  • in the case of a 64-byte processor, the addresses in the allocated TCM space can be aligned to 8 bytes. If both seg1 and seg2 resources fail to be allocated, it indicates that the amount of data has exceeded the resource allocation capacity; an exception can be reported without performing the functional layer operation.
  • the resource allocatable amount of input data and output data can be NN_POOL_SIZE, and the data amount of each segment of input or output data (seg0_size) can be remain_res divided by deno; deno can be the sum of the input data type (in_cn) and the output data type (out_cn).
  • the number of segments can be the ratio of the minimum amount of input data (in_min_size0) to in_cn divided by seg0_size, and in_min_size0 can be in_cn.
  • the data amount of each piece of input data can be seg0_size times in_cn.
  • the data volume of each segment of output data can be seg0_size multiplied by out_cn. Then, the space of the data amount of each piece of input data in the TCM can be allocated to the input data to obtain the first TCM space, and the space of the data amount of each piece of output data in the TCM can be allocated to the output data to obtain the second TCM space.
  • in the case of a 32-byte processor, the addresses in the allocated TCM space can be aligned to 4 bytes
  • in the case of a 64-byte processor, the addresses in the allocated TCM space can be aligned to 8 bytes.
  • the input data is cut, according to the size of the space required for the input data, using the data cutting method corresponding to the functional layer.
  • Data cutting can be divided into two levels. If the amount of data is relatively small, data can be cut according to level 1, and if the amount of data is relatively large, data can be cut according to level 2.
  • when the first TCM space uses the first stage for resource allocation, level 1 can be used for data cutting
  • when the first TCM space uses the second stage for resource allocation, level 2 can be used for data cutting.
  • the data cutting method may be a convolutional layer method, a pooling layer method, an FC layer method, an overall cutting method, or other methods.
  • the data cutting method is the convolutional layer method
  • the cutting of level 1 and level 2 can be realized by using direct memory access (DMA) multi-channel movement.
  • the data cutting method is the pooling layer method
  • Level 1 cutting can use DMA continuous movement to cut multi-channel data
  • level 2 can use DMA continuous movement to cut multiple rows of data under a single channel. Regardless of level 1 or level 2, the weight data can only be cut according to the channel.
  • Level 1 can cut the output data and weight data according to the output data channel mode, and the input data is moved as a whole without cutting.
  • Level 2 cuts the input data and weight data according to the overall cutting method under the single channel of output data.
  • the data cutting method is the overall method
  • the data can be divided and moved according to the overall data volume.
  • the size of the input data for each cut in the above four methods is the size of the first TCM space.
  • the number of rows cut and moved needs to be a few more than the number of rows in each segment; the number of extra rows is the difference between the kernel height and the stride height.
  • the resource allocation method and data cutting method can be the convolution layer method.
  • in the case that the input data includes a kernel, the number of rows of the input data needs to be calculated
  • in the case that the input data does not include a kernel, the input data and output data are cut equally.
  • the resource allocation method and data cutting method can be the pooling layer method.
  • the resource allocation method and data cutting method can be the pooling layer method.
  • the resource allocation method and data cutting method can be a fully connected layer method.
  • the resource allocation method and the data cutting method can be the overall method.
  • L2 is a regularization method.
  • the CNN operator corresponding to the function layer type and the input data type can be obtained.
  • the CNN operators corresponding to the functional layer type, input data type, and output data type can be obtained. The operator type of the CNN operator can first be determined according to the functional layer type, and the data input type of the CNN operator according to the input data type and output data type; the CNN operator corresponding to the operator type and the data input type can then be obtained, that is, selected from the CNN included in the functional layer.
  • the CNN operators corresponding to the functional layer type, input data type, and weight type can be obtained. The operator type of the CNN operator can first be determined according to the functional layer type, and the data input type according to the input data type and weight type; the CNN operator corresponding to the operator type and the data input type can then be obtained, that is, selected from the CNN included in the functional layer.
  • the data cutting method corresponding to the functional layer cuts the input data, and after obtaining the CNN operator corresponding to the functional layer type and the input data type, the obtained CNN operator can be used to process the cut input data in the first TCM space.
  • the cut input data can be moved to the first TCM space first, and then the CNN operator can be used to process the input data in the first TCM space.
  • the first data can be moved to the first TCM space first, and then the CNN operator corresponding to the functional layer type can be used in the first TCM space to process the first data.
  • the second data can be moved to the first TCM space, and then the CNN operator corresponding to the functional layer type can be used in the first TCM space to process the second data until all the cut input data is processed.
  • the first data and the second data are data fragments in the cut input data.
  • since the resource allocation method corresponding to the functional layer type is used to allocate TCM space for the input data, it is applicable to all data, without the user needing to pre-allocate space and write it into the DSP for different data; therefore, the versatility of resource allocation can be improved.
  • FIG. 3 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present invention.
  • the data processing device may include:
  • the receiving unit 301 is configured to receive input information including input data and a function layer type corresponding to the input data;
  • the allocating unit 302 is configured to allocate the first TCM space for the input data using the resource allocation method corresponding to the function layer type;
  • the cutting unit 303 is used to cut the input data using the data cutting method corresponding to the function layer type;
  • the processing unit 304 is configured to use the CNN operator corresponding to the functional layer type to process the cut input data in the first TCM space.
  • the data processing apparatus may further include:
  • the obtaining unit 305 is configured to obtain the functional layer corresponding to the functional layer type from the functional layer library;
  • the allocation unit 302 is specifically configured to allocate the first TCM space for input data using a resource allocation method corresponding to the functional layer;
  • the cutting unit 303 is specifically configured to cut the input data using the data cutting method corresponding to the functional layer;
  • the processing unit 304 is specifically configured to use the CNN operator corresponding to the functional layer to process the cut input data in the first TCM space.
  • the data processing apparatus may further include:
  • the parsing unit 306 is used to parse input information to obtain input data attributes, the input data attributes including the input data type;
  • the allocation unit 302 is specifically used for:
  • the cutting unit 303 is specifically configured to use the data cutting method corresponding to the functional layer to cut the input data according to the size of the space required for the input data.
  • the parsing unit 306 is specifically configured to analyze input information to obtain input data attributes and output data attributes, and the output data attributes include output data types;
  • the allocation unit 302 is also configured to use the resource allocation method corresponding to the functional layer to calculate the data volume of the output data according to the output data type, determine the size of the space required for the output data according to the data volume of the output data, and the space required for the output data in the TCM The size of the space is allocated to the output data to obtain the second TCM space;
  • the data processing device may also include:
  • the storage unit 307 is configured to store the processing result of the input data in the second TCM space.
  • the acquiring unit 305 is also used to acquire the CNN operator corresponding to the functional layer type, input data type, and output data type, or acquire the CNN operator corresponding to the functional layer type, input data type, and weight type;
  • the processing unit 304 is specifically configured to use the acquired CNN operator to process the cut input data in the first TCM space.
  • FIG. 4 is a schematic structural diagram of another data processing apparatus provided by an embodiment of the present invention.
  • the data processing apparatus may include a processor 401, a memory 402, and a bus 403.
  • the processor 401 may be a general-purpose central processing unit (CPU) or multiple CPUs, a single or multiple graphics processing units (GPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits used to control the execution of the program of the present invention.
  • the memory 402 may be a read-only memory (ROM) or another type of static storage device that can store static information and instructions, a random access memory (RAM) or another type of dynamic storage device that can store information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, etc.), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and can be accessed by a computer, but is not limited thereto.
  • the memory 402 may exist independently, and the bus 403 is connected to the processor 401.
  • the memory 402 may also be integrated with the processor 401.
  • the bus 403 transfers information between the aforementioned components.
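The address-alignment rule referenced in the allocation steps above can be sketched as follows. This is a minimal illustration: `align_up` is an assumed helper name, and the mapping (4-byte alignment for a 32-byte processor, 8-byte for a 64-byte one) follows the description, not a confirmed implementation detail.

```python
def align_up(addr, processor_bytes):
    # Round a TCM address up to the required boundary: 4-byte alignment on a
    # 32-byte processor, 8-byte alignment on a 64-byte processor, as described.
    boundary = 4 if processor_bytes == 32 else 8
    return (addr + boundary - 1) & ~(boundary - 1)
```

Aligning every allocated base address this way keeps later accesses by the processor within its natural access granularity.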


Abstract

A data processing method and device. The method includes: receiving input information including input data and a functional layer type corresponding to the input data (101); allocating a first tightly coupled memory (TCM) space for the input data using a resource allocation method corresponding to the functional layer type (102); cutting the input data using a data cutting method corresponding to the functional layer type (103); and processing the cut input data in the first TCM space using a convolutional neural network (CNN) operator corresponding to the functional layer type (104). The method and device can improve versatility.

Description

Data Processing Method and Device. Technical Field
The present invention relates to the field of computer technology, and in particular to a data processing method and device.
This application claims priority to the Chinese patent application filed with the China National Intellectual Property Administration on June 19, 2019, with application number 201910530760.6 and invention title "Data processing method and device", the entire content of which is incorporated herein by reference.
Background
With the continuous spread of artificial intelligence, more and more algorithms are used to implement it. These algorithms generally process data in a digital signal processor (DSP); therefore, different tightly coupled memories (TCM) first need to be allocated to different data so that the data can be processed. At present, how much TCM space to allocate to different data is generally pre-allocated by the user and then written into the DSP. Therefore, the user must allocate TCM space in advance and write it into the DSP for each batch of data to be processed, which results in poor versatility.
Summary of the Invention
Technical Problem
Solution to the Problem
Technical Solution
Embodiments of the present invention provide a data processing method and device for improving versatility.
A first aspect of the embodiments of the present invention provides a data processing method, including:
receiving input information including input data and a functional layer type corresponding to the input data;
allocating a first TCM space for the input data using a resource allocation method corresponding to the functional layer type;
cutting the input data using a data cutting method corresponding to the functional layer type;
processing the cut input data in the first TCM space using a convolutional neural networks (CNN) operator corresponding to the functional layer type.
Since the resource allocation method corresponding to the functional layer type is used to allocate TCM space for the input data, the approach is applicable to all data without requiring the user to pre-allocate space and write it into the DSP for different data; therefore, the versatility of resource allocation can be improved.
As a possible implementation, the method further includes:
obtaining the functional layer corresponding to the functional layer type from a functional layer library;
the allocating the first TCM space for the input data using the resource allocation method corresponding to the functional layer type includes:
allocating the first TCM space for the input data using the resource allocation method corresponding to the functional layer;
the cutting the input data using the data cutting method corresponding to the functional layer type includes:
cutting the input data using the data cutting method corresponding to the functional layer;
the processing the cut input data in the first TCM space using the CNN operator corresponding to the functional layer type includes:
processing the cut input data in the first TCM space using the CNN operator corresponding to the functional layer.
It can be seen that the functional layer library includes multiple functional layers; different functional layers can be selected according to different functional layer types, and a suitable resource allocation method, data cutting method, and CNN operator can be obtained, so that data processing efficiency can be improved.
As a possible implementation, the method further includes:
parsing the input information to obtain input data attributes, the input data attributes including an input data type;
the allocating the first TCM space for the input data using the resource allocation method corresponding to the functional layer includes:
calculating the data amount of the input data according to the input data type using the resource allocation method corresponding to the functional layer;
determining the size of the space required for the input data according to the data amount of the input data;
allocating a space of the size required for the input data in the TCM to the input data to obtain the first TCM space.
It can be seen that the size of the TCM space allocated to the input data is determined according to the input data type; therefore, a suitable TCM space size can be allocated to the input data, neither wasting resources nor affecting normal data processing, so that the precision of resource allocation can be improved.
As a possible implementation, the cutting the input data using the data cutting method corresponding to the functional layer includes:
cutting the input data using the data cutting method corresponding to the functional layer according to the size of the space required for the input data.
Since the input data is large and the TCM space that can be allocated for it is limited, large data can be cut into multiple pieces of small data to be processed in sequence; this neither affects data processing nor occupies too many resources, so that normal data processing can be guaranteed.
As a possible implementation, the parsing the input information to obtain input data attributes includes:
parsing the input information to obtain input data attributes and output data attributes, the output data attributes including an output data type;
the method further includes:
calculating the data amount of the output data according to the output data type using the resource allocation method corresponding to the functional layer;
determining the size of the space required for the output data according to the data amount of the output data;
allocating a space of the size required for the output data in the TCM to the output data to obtain a second TCM space;
storing the processing result of the input data in the second TCM space.
It can be seen that TCM space is specially allocated for the output data and can store the processing result of the input data, so that the processing result can be called later.
As a possible implementation, the method further includes:
obtaining the CNN operator corresponding to the functional layer type, the input data type, and the output data type;
the processing the cut input data in the first TCM space using the CNN operator corresponding to the functional layer includes:
processing the cut input data in the first TCM space using the obtained CNN operator.
It can be seen that the CNN operator is determined according to the functional layer type, the input data type, and the output data type rather than arbitrarily; therefore, a suitable CNN operator can be used to process the input data, which can improve data processing precision and efficiency.
As a possible implementation, the parsing the input information to obtain input data attributes and output data attributes includes:
parsing the input information to obtain input data attributes, output data attributes, and weight attributes, the weight attributes including a weight type;
the method further includes:
obtaining the CNN operator corresponding to the functional layer type, the input data type, and the weight type;
the processing the cut input data in the first TCM space using the CNN operator corresponding to the functional layer includes:
processing the cut input data in the first TCM space using the obtained CNN operator.
It can be seen that, in the case that the input information includes weight information, the CNN operator is determined according to the functional layer type, the input data type, and the weight type rather than arbitrarily; therefore, a suitable CNN operator can be used to process the input data, which can improve data processing precision and efficiency.
As a possible implementation, the obtaining the CNN operator corresponding to the functional layer type, the input data type, and the output data type includes:
determining an operator type of the CNN operator according to the functional layer type;
determining a data input type of the CNN operator according to the input data type and the output data type;
obtaining the CNN operator corresponding to the operator type and the data input type.
It can be seen that the type of the CNN operator is determined by the functional layer type, and the data input type of the CNN operator is determined by the input data type and the output data type; a suitable CNN operator can thus be determined from the functional layer type, the input data type, and the output data type, which can improve data processing precision and efficiency.
A second aspect of the embodiments of the present invention provides a data processing device, including units for executing the data processing method provided by the first aspect or any embodiment of the first aspect.
A third aspect of the embodiments of the present invention provides a data processing device, including a processor and a memory connected to each other, where the memory is used to store a computer program including program instructions, and the processor is used to call the program instructions to execute the data processing method provided by the first aspect or any embodiment of the first aspect.
A fourth aspect provides a readable storage medium storing a computer program including program instructions which, when executed by a processor, cause the processor to execute the data processing method provided by the first aspect or any embodiment of the first aspect.
A fifth aspect provides an application program which, when run, executes the data processing method provided by the first aspect or any embodiment of the first aspect.
Beneficial Effects of the Invention
Brief Description of the Drawings
Description of the Drawings
In order to explain the technical solutions in the embodiments of the present invention more clearly, the drawings required in the embodiments are briefly introduced below.
FIG. 1 is a schematic flowchart of a data processing method provided by an embodiment of the present invention;
FIG. 2 is a schematic flowchart of another data processing method provided by an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a data processing device provided by an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of another data processing device provided by an embodiment of the present invention.
Embodiments of the Invention
Mode for Carrying Out the Invention
Please refer to FIG. 1, which is a schematic flowchart of a data processing method provided by an embodiment of the present invention. According to different requirements, some steps in the flowchart shown in FIG. 1 can be split into several steps, and several steps can also be combined into one step. The data processing method is applied to a DSP (digital signal processor). As shown in FIG. 1, the data processing method may include the following steps.
101. Receive input information.
The input information may include input data and a functional layer type corresponding to the input data. The input data is data to be processed, and may be convolution layer data, pooling layer data, fully connected layers (FC) data, deconvolution data, depthwise (DW) convolution data, batch normalization (BN) data, PReLU (Parametric Rectified Linear Unit) data, L2 normalization (L2N) data, ReLU (Rectified Linear Unit) data, Sigmoid data, or other data that needs to be processed by a CNN (convolutional neural networks). Each batch of input data may be one layer of data: one convolution layer, one pooling layer, one fully connected layer, or one other layer of data.
The functional layer type may be convolution layer, pooling layer, FC layer, deconvolution, DW convolution, BN, PReLU, L2N, ReLU, Sigmoid, or another type. The input data may correspond to the functional layer type one to one, that is, one type of input data uniquely corresponds to one functional layer type; for example, the functional layer corresponding to convolution layer data is a convolution layer. Multiple types of input data may also correspond to one functional layer type.
102. Allocate a first TCM (tightly coupled memories) space for the input data using the resource allocation method corresponding to the functional layer type.
Since different functional layer types may correspond to different resource allocation methods, after receiving the input information, the first TCM space can be allocated for the input data using the resource allocation method corresponding to the functional layer type. One functional layer type may correspond to one resource allocation method, and one resource allocation method may correspond to one or more functional layer types. The resource allocation method may be the convolution layer method, the pooling layer method, the FC layer method, the overall method, or another method.
103. Cut the input data using the data cutting method corresponding to the functional layer type.
Since different functional layer types may correspond to different data cutting methods, after the first TCM space is allocated for the input data, the input data can be cut using the data cutting method corresponding to the functional layer type. One functional layer type may correspond to one data cutting method, and one data cutting method may correspond to one or more functional layer types. The data cutting method may be the convolution layer method, the pooling layer method, the FC layer method, the overall method, or another method. The input data can be cut, according to the size of the space required for the input data, using the data cutting method corresponding to the functional layer type: the size of the data first cut from the input data may equal the size of the space required for the input data, and the size of each piece subsequently cut from the as-yet-uncut input data may also equal that size, until the remaining uncut input data is smaller than or equal to the size of the space required for the input data.
104. Process the cut input data in the first TCM space using the CNN operator corresponding to the functional layer type.
Since different functional layer types may correspond to different CNN operators, after the input data is cut, the cut input data can be processed in the first TCM space using the CNN operator corresponding to the functional layer type. First data can first be moved to the first TCM space, and then processed there with the CNN operator corresponding to the functional layer type. After the first data is processed, second data can be moved to the first TCM space and processed in the same way, until all the cut input data is processed; the first data and the second data are data fragments of the cut input data.
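The per-segment loop described in step 104 can be sketched as follows. This is a minimal illustration: `tcm_move` and `cnn_operator` are assumed placeholder callables standing in for the DMA move and the selected CNN operator, not the patent's actual API.

```python
def process_segments(segments, tcm_move, cnn_operator):
    # Sketch of step 104: move each cut fragment (first data, second data, ...)
    # into the first TCM space, then run the functional layer's CNN operator on it.
    results = []
    for fragment in segments:
        tcm_buf = tcm_move(fragment)      # stand-in for the DMA move into TCM
        results.append(cnn_operator(tcm_buf))
    return results
```

The same TCM buffer is reused for every fragment, which is why the allocation only needs to hold one segment at a time.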
In the data processing method described in FIG. 1, since the resource allocation method corresponding to the functional layer type is used to allocate TCM space for the input data, the approach is applicable to all data without requiring the user to pre-allocate space and write it into the DSP for different data; therefore, the versatility of resource allocation can be improved.
Please refer to FIG. 2, which is a schematic flowchart of another data processing method provided by an embodiment of the present invention. According to different requirements, some steps in the flowchart shown in FIG. 2 can be split into several steps, and several steps can also be combined into one step. The data processing method is applied to a DSP. As shown in FIG. 2, the data processing method may include the following steps.
201. Receive input information.
The input information may include the number of input data tensors (Tensor) and input data tensor information. The number of input data tensors may indicate how many tensors the input data includes. The input data tensor information may include the information of that number of input data tensors, and may include the input data type, the dimensionality of the input data, the maximum dimensionality of the input data, and the input data. The input data type may be the number of bytes of each input data tensor. The dimensionality of the input data may be one, two, three, four, or more dimensions. The maximum dimensionality of the input data is the maximum dimensionality allowed for the input data. The input data is the data to be processed, and may be convolution layer data, pooling layer data, FC layer data, deconvolution data, depthwise convolution data, BN data, PReLU data, L2N data, ReLU data, Sigmoid data, or other data that needs CNN processing. Each batch of input data may be one layer of data: one convolution layer, one pooling layer, one fully connected layer, or one other layer of data. In the case that the input data is image data, the input data tensor information may further include region information of the valid data. The region information of the valid data may include left, top, width, and height, and may indicate the position of the valid data in the image: the point whose abscissa is left and whose ordinate is top is determined, and the region extending the given height along the negative ordinate direction and the given width along the positive abscissa direction from this starting point is the valid region.
The input information may include the functional layer type corresponding to the input data. The functional layer type may be convolution layer, pooling layer, FC layer, deconvolution, DW convolution, BN, PReLU, L2N, ReLU, Sigmoid, or another type.
The input information may further include the number of output data tensors and output data tensor information. The number of output data tensors may indicate how many tensors the output data includes. The output data tensor information may include the information of that number of output data tensors, and may include the output data type, the dimensionality of the output data, and the maximum dimensionality of the output data. The output data type may be the number of bytes of each output data tensor. The maximum dimensionality of the output data is the maximum dimensionality allowed for the output data.
The input information may further include weight parameters, and the weight parameters may include weight information. Under a fixed-point model, the weight parameters may further include bias information and shift information. In the case that the weight parameters include weight information, bias information, and shift information, the order is weight information, then bias information, then shift information.
The input information may further include a layer parameter array and a layer parameter size; the layer parameter size may indicate the size of the layer parameter array.
202. Obtain the functional layer corresponding to the functional layer type from the functional layer library.
Since the resource allocation method, data cutting method, and/or CNN operator corresponding to different functional layers may differ, after receiving the input information, the functional layer corresponding to the functional layer type can be obtained from the functional layer library.
203. Parse the input information to obtain input data attributes and output data attributes.
Since different functional layers may parse the same input information into different results, after obtaining the functional layer corresponding to the functional layer type, the input information can first be parsed to obtain the input data attributes and output data attributes. Parsing here may be translation (such as translating English into Chinese), conversion (such as converting the format of the information in the input information into a format the functional layer can process), or other similar processing.
The input data attributes can be parsed out from the number of input data tensors and the input data tensor information. The input data attributes may include the input data dimensions and the input data type. In the case that the maximum dimensionality of the input data is four and the dimensionality of the input data is three, the input data dimensions may include channel, height, and width. For the input data type, refer to the above description.
The output data attributes can be parsed out from the number of output data tensors and the output data tensor information; the output data attributes may include the output data dimensions and the output data type. In the case that the maximum dimensionality of the output data is four and the dimensionality of the output data is three, the output data dimensions may include channel, height, and width. For the output data type, refer to the above description.
Weight attributes can be parsed out from the weight parameters. The weight attributes may include a weight type, which is the number of bytes of each weight data tensor in the weight data. Under a fixed-point model, bias attributes and shift attributes can also be parsed out. The bias attributes may include a bias type, which is the number of bytes of each bias data tensor in the bias data; the shift attributes may include a shift type, which is the number of bytes of each shift data tensor in the shift data.
Optionally, kernel information can be parsed out from the layer parameter array and the layer parameter size. The kernel information may include kernel dimensions, stride dimensions, and padding dimensions, each of which may include height and width.
204. Allocate the first TCM space for the input data using the resource allocation method corresponding to the functional layer according to the input data attributes, and allocate a second TCM space for the output data using the resource allocation method corresponding to the functional layer according to the output data attributes.
After parsing out the input data attributes and output data attributes, the first TCM space can be allocated for the input data according to the input data attributes, and the second TCM space can be allocated for the output data according to the output data attributes, both using the resource allocation method corresponding to the functional layer. The data amount of the input data can first be calculated according to the input data type, and the data amount of the output data according to the output data type; the sizes of the spaces required for the input data and output data can then be determined from these data amounts; finally, a space of the required input size is allocated from the free TCM space to the input data to obtain the first TCM space, and a space of the required output size is allocated from the free TCM space to the output data to obtain the second TCM space. The resource allocation method may be the convolution layer method, the pooling layer method, the FC layer method, the overall method, or another method.
In the case that the resource allocation method is the convolution layer method, the resource allocation method can be divided into two stages. In the case that the input data meets the requirements of the first stage (seg1-1), resources are allocated according to the first-stage row cutting; after first-stage resource allocation fails, second-stage (seg1-2) resource allocation is started, and channel cutting is performed on the basis of the first-stage row cutting method to allocate resources. First-stage (seg1-1) resource allocation can be as follows:
The data amount of the input data per row across all channels (in_hsize, with padding) and the data amount of the output data per row (out_hsize) can first be calculated. In the case that the input information includes weight parameters, the all-channel weight data amount (weight_para1) also needs to be calculated. in_hsize may be the product of the input data channels, the input data width, and the input data type. out_hsize may be the product of the output data channels, the output data width, and the output data type. weight_para1 may be the product of the input data channels, the output data channels, the kernel size, and the weight type, where the kernel size may be the product of the kernel height and width. In the case that the weight parameters further include bias information and shift information, weight_para1 must also add the data amounts of these two kinds of information, each calculated as the product of the output data channels and the respective type (that is, the bias type or the shift type). The minimum data amount (min_size1) is then calculated. min_size1 may be the sum of the minimum data amount of the input data (in_min_size1) and the minimum data amount of the output data (out_min_size1); in the case that the input information includes weight parameters, min_size1 may be the sum of in_min_size1, out_min_size1, and weight_para1. in_min_size1 may be the product of the kernel height and in_hsize, and out_min_size1 may be out_hsize. It can then be judged whether min_size1 is less than or equal to the maximum resource (NN_POOL_SIZE), that is, less than or equal to the size of the free TCM space. In the case that min_size1 is less than or equal to NN_POOL_SIZE, the first-stage resource calculation is used, that is, the data amount of each segment of input data and each segment of output data is calculated. The resource allocatable amount (remain_res) for input and output data may be the difference between NN_POOL_SIZE and weight_para1. The reserved amount of input data may be the difference between the kernel height and the stride height multiplied by in_hsize; the number of rows of each segment of output data (seg1_size) equals the difference between remain_res and the reserved amount of input data, divided by deno, where deno may be the stride height multiplied by in_hsize, plus out_hsize. The number of segments of the first stage (seg1_num) may be the number of rows of output data divided by seg1_size. The number of rows of each segment of input data may be (seg1_size minus one) multiplied by the stride height, plus the kernel height. The data amount of each segment of input data may be the number of rows of each segment of input data multiplied by in_hsize. The data amount of each segment of output data may be the number of rows of each segment of output data multiplied by out_hsize, and the weight data amount may be weight_para1. After that, a space of the data amount of each segment of input data in the TCM can be allocated to the input data to obtain the first TCM space, a space of the data amount of each segment of output data can be allocated to the output data to obtain the second TCM space, and a space of the weight data amount can be allocated to the weight data to obtain a third TCM space. In this case, the number of segments of the second stage (seg2_num) is 1, and the number of channels of each segment (seg2_size) is the number of channels of the output data.
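The first-stage (seg1-1) arithmetic just described can be sketched as follows. This is a minimal illustration of the stated formulas under assumed parameter names (all sizes in bytes; the `*_type` values are bytes per element, and the ceiling division for seg1_num is an assumption), not the patented implementation.

```python
def conv_seg1_allocate(in_ch, in_w, in_type, out_ch, out_w, out_h, out_type,
                       k_h, k_w, stride_h, weight_type, nn_pool_size):
    # Sketch of the convolution-layer first-stage (seg1-1) TCM budgeting.
    in_hsize = in_ch * in_w * in_type            # input bytes per row, all channels
    out_hsize = out_ch * out_w * out_type        # output bytes per row, all channels
    weight_para1 = in_ch * out_ch * (k_h * k_w) * weight_type
    in_min_size1 = k_h * in_hsize                # minimum input: one kernel window of rows
    out_min_size1 = out_hsize                    # minimum output: one row
    min_size1 = in_min_size1 + out_min_size1 + weight_para1
    if min_size1 > nn_pool_size:
        return None                              # fall back to second stage (seg1-2)
    remain_res = nn_pool_size - weight_para1
    reserved = (k_h - stride_h) * in_hsize       # overlap rows kept between segments
    deno = stride_h * in_hsize + out_hsize
    seg1_size = (remain_res - reserved) // deno  # output rows per segment
    seg1_num = -(-out_h // seg1_size)            # ceiling over the output rows
    in_rows = (seg1_size - 1) * stride_h + k_h   # input rows needed per segment
    return {"seg1_size": seg1_size, "seg1_num": seg1_num,
            "in_bytes": in_rows * in_hsize,      # first TCM space
            "out_bytes": seg1_size * out_hsize,  # second TCM space
            "weight_bytes": weight_para1}        # third TCM space
```

The three returned byte counts together never exceed the free pool, which is the invariant the staged calculation is designed to maintain.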
In the case that min_size1 is greater than NN_POOL_SIZE, second-stage resource allocation is entered. Second-stage (seg1-2) resource allocation can be as follows:
The minimum data amount of the input data (in_min_size2) may be in_min_size1. The data amount of the output data per row under a single channel (out_wsize) may be the product of the output data width and the output data type, and the minimum data amount of the output data (out_min_size2) may be out_wsize. The weight data amount under a single output channel (weight_para2) may be the product of the input data channels, the kernel size, and the weight type. The minimum data amount of the second stage (min_size2) may be the sum of in_min_size2, out_min_size2, and weight_para2. It is then judged whether min_size2 is less than or equal to NN_POOL_SIZE; if so, the second-stage resource calculation can be performed; if it is greater, a warning can be output. The resource allocatable amount (remain_res) for input and output data of the second stage may be the difference between NN_POOL_SIZE and in_min_size2. The number of channels of each segment of output data (seg2_size) may be remain_res divided by deno, where deno may be the sum of out_wsize and weight_para2. The number of segments of the second stage (seg2_num) may be the output data channels divided by seg2_size. The data amount of each segment of input data may be in_min_size2. The data amount of each segment of output data may be the product of seg2_size and out_wsize. The weight data amount may be the product of out_wsize and weight_para2. After that, a space of the data amount of each segment of input data in the TCM can be allocated to the input data to obtain the first TCM space, a space of the data amount of each segment of output data can be allocated to the output data to obtain the second TCM space, and a space of the weight data amount can be allocated to the weight data to obtain the third TCM space. In this case, the number of segments of the first stage (seg1_num) is the number of rows of the output data, and the number of rows of each segment (seg1_size) is 1.
In the case that the resource allocation method is the pooling layer method, the resource allocation method can be divided into two stages. In the case that the input data meets the requirements of the first stage (seg2-1), resources are allocated according to the first-stage cutting; when first-stage resource allocation fails, second-stage (seg2-2) resource allocation is started, and further cutting is performed on the basis of the first-stage cutting method to allocate resources. First-stage (seg2-1) resource allocation can be as follows:
The per-channel data amount of the input data (in_plane_size, with padding), the per-channel data amount of the output data (out_plane_size), and the per-channel weight data amount (weight_inch_size) can first be calculated. in_plane_size may be the product of the number of rows of the input data, the input data width, and the input data type; out_plane_size may be the product of the number of rows of the output data, the output data width, and the output data type; weight_inch_size may be the product of the kernel size and the weight type. In the case that the weight parameters include bias information and shift information, weight_inch_size must also add the data amounts of these two kinds of information, each calculated from the respective type (that is, the bias type or the shift type, i.e. the number of bytes of each bias or shift datum). The minimum data amount (min_size1) is then calculated. min_size1 may be the sum of the minimum data amount of the input data (in_min_size1), the minimum data amount of the output data (out_min_size1), and the minimum weight data amount (weight_para1). in_min_size1 may be in_plane_size, out_min_size1 may be out_plane_size, and weight_para1 may be weight_inch_size. It can then be judged whether min_size1 is less than or equal to the maximum resource (NN_POOL_SIZE), that is, the size of the free TCM space. In the case that min_size1 is less than or equal to NN_POOL_SIZE, the first-stage resource calculation is used, that is, the data amount of each segment of input data and each segment of output data is calculated. The resource allocatable amount for input and output data may be NN_POOL_SIZE. The number of channels of each segment of output data (seg1_size) may be NN_POOL_SIZE divided by deno, where deno may be min_size1. The number of segments of the first stage (seg1_num) may be the output data channels divided by seg1_size; the data amount of each segment of input data may be the product of seg1_size and in_plane_size. The data amount of each segment of output data may be the product of seg1_size and out_plane_size, and the weight data amount may be the product of seg1_size and weight_inch_size. After that, a space of the data amount of each segment of input data in the TCM can be allocated to the input data to obtain the first TCM space, a space of the data amount of each segment of output data can be allocated to the output data to obtain the second TCM space, and a space of the weight data amount can be allocated to the weight data to obtain the third TCM space. In this case, the number of segments of the second stage (seg2_num) is 1, and the size of each segment (seg2_size) is the number of rows of the output data.
In the case that min_size1 is greater than NN_POOL_SIZE, second-stage resource allocation is entered. Second-stage (seg2-2) resource allocation can be as follows:
The data amount of each row of input data (in_wsize) may be the product of the input data width (including padding) and the input data type, and the minimum data amount of the input data (in_min_size2) may be the product of the kernel height and in_wsize. The data amount of each row of output data (out_wsize) may be the product of the output data width and the output data type, and the minimum data amount of the output data (out_min_size2) may be out_wsize. The weight data amount (weight_para2) may be weight_inch_size. The minimum data amount of the second stage (min_size2) may be the sum of in_min_size2, out_min_size2, and weight_para2. It is then judged whether min_size2 is less than or equal to NN_POOL_SIZE; if so, the second-stage resource calculation can be performed; if it is greater, a warning can be output. The resource allocatable amount (remain_res) for input and output data of the second stage may be the difference between NN_POOL_SIZE and weight_para2, minus the reserved amount of input data. The reserved amount of input data may be the difference between the kernel height and the stride height multiplied by in_wsize. The number of rows of each segment of output data (seg2_size) may be remain_res divided by deno, where deno may be the stride height multiplied by in_wsize, plus out_wsize. The number of segments of the second stage (seg2_num) may be the number of rows of output data divided by seg2_size; the number of rows of each segment of input data may be (seg2_size minus one) multiplied by the stride height, plus the kernel height. The data amount of each segment of input data may be the number of rows of each segment of input data multiplied by in_wsize. The data amount of each segment of output data may be the number of rows of each segment of output data multiplied by out_wsize. The weight data amount may be weight_para2. After that, a space of the data amount of each segment of input data in the TCM can be allocated to the input data to obtain the first TCM space, a space of the data amount of each segment of output data can be allocated to the output data to obtain the second TCM space, and a space of the weight data amount can be allocated to the weight data to obtain the third TCM space. In this case, the number of segments of the first stage (seg1_num) is the number of channels of the output data, and the number of channels of each segment (seg1_size) is 1.
In the case that the resource allocation method is the fully connected layer method, the resource allocation method can be divided into two stages. In the case that the input data meets the requirements of the first stage (seg3-1), resources are allocated according to the first-stage cutting; after first-stage resource allocation fails, second-stage (seg3-2) resource allocation is started, and further cutting is performed on the basis of the first-stage cutting method to allocate resources. First-stage (seg3-1) resource allocation can be as follows:
The per-channel data amount of the input data (in_plane_size, with padding), the per-channel data amount of the output data (out_plane_size), and the per-channel weight data amount (weight_inch_size) can first be calculated. in_plane_size may be the product of the number of rows of the input data, the input data width, and the input data type. out_plane_size may be the product of the number of rows of the output data, the output data width, and the output data type. weight_inch_size may be the product of the input data channels, the number of rows of the input data, the input data width, and the input data type. In the case that the weight parameters include bias information and shift information, weight_inch_size must also add the data amounts of these two kinds of information, each calculated from the respective data type (that is, the bias type or the shift type, i.e. the number of bytes of each datum in the bias or shift data). The minimum data amount of the first stage (min_size1) is then calculated. min_size1 may be the sum of the minimum data amount of the input data (in_min_size1), the minimum data amount of the output data (out_min_size1), and the minimum weight data amount (weight_para1). in_min_size1 may be the product of the input data channels and in_plane_size, out_min_size1 may be out_plane_size, and weight_para1 may be weight_inch_size. It can then be judged whether min_size1 is less than or equal to the maximum resource (NN_POOL_SIZE), that is, the size of the free TCM space. In the case that min_size1 is less than or equal to NN_POOL_SIZE, the first-stage resource calculation is used, that is, the data amount of each segment of input data and each segment of output data is calculated. The resource allocatable amount (remain_res) of the first stage may be the difference between NN_POOL_SIZE and in_min_size1. The number of channels of each segment of output data (seg1_size) may be NN_POOL_SIZE divided by deno, where deno may be the sum of out_min_size1 and weight_para1. The number of segments of the first stage (seg1_num) may be the output data channels divided by seg1_size. The data amount of each segment of input data may be in_min_size1. The data amount of each segment of output data may be the product of seg1_size and out_plane_size. The weight data amount may be the product of seg1_size and weight_inch_size. After that, a space of the data amount of each segment of input data in the TCM can be allocated to the input data to obtain the first TCM space, a space of the data amount of each segment of output data can be allocated to the output data to obtain the second TCM space, and a space of the weight data amount can be allocated to the weight data to obtain the third TCM space. In this case, the number of segments of the first stage (seg1_num) is the number of rows of the output data, and the number of rows of each segment (seg1_size) is 1.
In the case that min_size1 is greater than NN_POOL_SIZE, second-stage resource allocation is entered. Second-stage (seg3-2) resource allocation can be as follows:
The minimum data amount of the second stage (min_size2) may be the sum of out_min_size1, the input data type, and the weight type. It is then judged whether min_size2 is less than or equal to NN_POOL_SIZE; if so, the second-stage resource calculation can be performed; if it is greater, a warning can be output. The resource allocatable amount (remain_res) for input and output data of the second stage may be the difference between NN_POOL_SIZE and out_min_size1, and the data amount of each segment (seg2_size) may be remain_res divided by deno, where deno may be the sum of the input data type and the weight type. The number of segments of the second stage (seg2_num) may be the ratio of in_min_size1 to the input data type (in_cn), divided by seg2_size. The data amount of each segment of input data may be seg2_size multiplied by in_cn. The data amount of each segment of output data may be out_min_size1, and the weight data amount may be seg2_size multiplied by the weight type (weight_cn). After that, a space of the data amount of each segment of input data in the TCM can be allocated to the input data to obtain the first TCM space, a space of the data amount of each segment of output data can be allocated to the output data to obtain the second TCM space, and a space of the weight data amount can be allocated to the weight data to obtain the third TCM space. In this case, the number of segments of the first stage (seg1_num) is the number of channels of the output data, and the number of channels of each segment (seg1_size) is 1.
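The second-stage (seg3-2) arithmetic just described can be sketched as follows, as a minimal illustration under assumed names (in_cn and weight_cn are bytes per element; the ceiling division for seg2_num is an assumption), not the patented implementation.

```python
def fc_seg2_allocate(out_min_size1, in_cn, weight_cn, in_min_size1, nn_pool_size):
    # Sketch of the FC-layer second stage: reserve the whole output, then split
    # the remaining budget per (input element, weight element) pair.
    min_size2 = out_min_size1 + in_cn + weight_cn
    if min_size2 > nn_pool_size:
        raise RuntimeError("data amount exceeds resource allocation capacity")
    remain_res = nn_pool_size - out_min_size1
    seg2_size = remain_res // (in_cn + weight_cn)        # elements per segment
    seg2_num = -(-(in_min_size1 // in_cn) // seg2_size)  # ceil(input elems / seg)
    return {"seg2_size": seg2_size, "seg2_num": seg2_num,
            "in_bytes": seg2_size * in_cn,               # first TCM space
            "out_bytes": out_min_size1,                  # second TCM space
            "weight_bytes": seg2_size * weight_cn}       # third TCM space
```

Because the full output buffer is kept resident, each pass only streams one slice of the input and the matching slice of the weights.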
In the above three resource allocation methods, so that no problems arise when the processor later processes the data in the allocated TCM space, in the case of a 32-byte processor the addresses in the allocated TCM space can be aligned to 4 bytes, and in the case of a 64-byte processor, to 8 bytes. If both seg1 and seg2 resource allocations fail, the data amount has exceeded the resource allocation capacity; an exception can be reported and the functional layer operation is not performed.
In the case that the resource allocation method is the overall method, there is only one stage. The resource allocatable amount (remain_res) for input and output data may be NN_POOL_SIZE, and the data amount of each segment of input or output data (seg0_size) may be remain_res divided by deno, where deno may be the sum of the input data type (in_cn) and the output data type (out_cn). The number of segments (seg0_num) may be the ratio of the minimum data amount of the input data (in_min_size0) to in_cn, divided by seg0_size; in_min_size0 may be in_cn. The data amount of each segment of input data may be seg0_size multiplied by in_cn. The data amount of each segment of output data may be seg0_size multiplied by out_cn. After that, a space of the data amount of each segment of input data in the TCM can be allocated to the input data to obtain the first TCM space, and a space of the data amount of each segment of output data can be allocated to the output data to obtain the second TCM space.
For the same reason, in the overall resource allocation method, in the case of a 32-byte processor the addresses in the allocated TCM space can be aligned to 4 bytes, and in the case of a 64-byte processor, to 8 bytes.
205. Cut the input data using the data cutting method corresponding to the functional layer according to the size of the space required for the input data.
After the first TCM space is allocated for the input data and the second TCM space for the output data, the input data can be cut using the data cutting method corresponding to the functional layer according to the size of the space required for the input data. Data cutting can be divided into two levels: relatively small data amounts can be cut according to level 1, and relatively large data amounts according to level 2. In the case that the first TCM space was allocated using the first stage, level 1 can be used for data cutting; in the case that it was allocated using the second stage, level 2 can be used. The data cutting method may be the convolution layer method, the pooling layer method, the FC layer method, the overall cutting method, or another method.
In the case that the data cutting method is the convolution layer method, cutting can be performed first by row and then by channel: at level 1, the input data and output data are cut and the weight data is not cut; at level 2, the output data and weight data are cut according to the output data channels, and the input data is not cut. Both level 1 and level 2 cutting can be implemented using direct memory access (DMA) multi-channel movement.
In the case that the data cutting method is the pooling layer method, cutting can be performed first by channel and then by row: level 1 cuts by channel and level 2 cuts by row. Level 1 cutting can use DMA continuous movement to cut multi-channel data, and level 2 can use DMA continuous movement to cut multiple rows of data under a single channel. At either level, the weight data can only be cut by channel.
In the case that the data cutting method is the fully connected layer method, cutting can be performed first according to the output data channels and then according to the overall cut of the input data. Level 1 can cut the output data and weight data according to the output data channels, while the input data is moved as a whole without cutting. Level 2 cuts the input data and weight data according to the overall cutting method under a single output data channel.
In the case that the data cutting method is the overall method, the data can be divided and moved according to the overall data amount.
In the above four methods, the size of the input data for each cut is the size of the first TCM space.
In the case that there is a kernel, the number of rows cut and moved needs to be a few more than the number of rows in each segment; the number of extra rows is the difference between the kernel height and the stride height.
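The row bookkeeping just described can be illustrated numerically; `rows_to_move` is an assumed helper name for the stated relation, not an identifier from the patent.

```python
def rows_to_move(seg_out_rows, k_h, stride_h):
    # Input rows to cut and move for one segment of output rows: each output row
    # consumes stride_h fresh rows, and the kernel needs (k_h - stride_h) extra
    # overlap rows beyond the stride-sized steps.
    return (seg_out_rows - 1) * stride_h + k_h
```

For example, with a kernel height of 3 and a stride height of 1, a segment of 4 output rows requires 6 input rows, i.e. 2 rows more than the 4 stride-sized steps, matching the kernel-minus-stride difference.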
In the case that the functional layer type is convolution or deconvolution, the resource allocation method and data cutting method can be the convolution layer method. In the case that the input data includes a kernel, the number of rows of the input data needs to be calculated; in the case that it does not, the input data and output data are cut equally.
In the case that the input data includes a kernel and the functional layer type is depthwise convolution or pooling, the resource allocation method and data cutting method can be the pooling layer method. In the case that the input data does not include a kernel and the functional layer type is BN or PReLU, the resource allocation method and data cutting method can also be the pooling layer method. In the case that the input data includes a kernel, the number of rows of the input data needs to be calculated; otherwise, the input data and output data are cut equally.
In the case that the functional layer type is FC, the resource allocation method and data cutting method can be the fully connected layer method.
In the case that the functional layer type is L2N, ReLU, or Sigmoid, the resource allocation method and data cutting method can be the overall method. L2 is a regularization method.
206. Obtain the CNN operator corresponding to the functional layer type and the input data type.
The CNN operator corresponding to the functional layer type and the input data type can be obtained. In the case that no weight attributes are parsed out, the CNN operator corresponding to the functional layer type, the input data type, and the output data type can be obtained: the operator type of the CNN operator can first be determined according to the functional layer type, and the data input type of the CNN operator according to the input data type and the output data type; the CNN operator corresponding to the operator type and the data input type can then be obtained, that is, selected from the CNN included in the functional layer.
In the case that weight attributes are parsed out, the CNN operator corresponding to the functional layer type, the input data type, and the weight type can be obtained: the operator type is determined according to the functional layer type, and the data input type according to the input data type and the weight type; the CNN operator corresponding to the operator type and the data input type is then selected from the CNN included in the functional layer.
In the case that layer attributes are parsed out, the operation attributes of the CNN operator also need to be determined according to the operation attributes of the functional layer, and then the CNN operator corresponding to the operator type, the data input type, and the operation attributes of the CNN operator is obtained.
207. Process the cut input data in the first TCM space using the obtained CNN operator.
After the first TCM space is allocated for the input data, the second TCM space is allocated for the output data, the input data is cut according to the size of the space required for the input data, and the CNN operator corresponding to the functional layer type and the input data type is obtained, the obtained CNN operator can be used to process the cut input data in the first TCM space. The cut input data can first be moved to the first TCM space and then processed there with the CNN operator: first data can be moved to the first TCM space and processed with the CNN operator corresponding to the functional layer type; after the first data is processed, second data can be moved to the first TCM space and processed in the same way, until all the cut input data is processed. The first data and the second data are data fragments of the cut input data.
208. Store the processing result of the input data in the second TCM space.
In the data processing method described in FIG. 2, since the resource allocation method corresponding to the functional layer type is used to allocate TCM space for the input data, the approach is applicable to all data without requiring the user to pre-allocate space and write it into the DSP for different data; therefore, the versatility of resource allocation can be improved.
Please refer to FIG. 3, which is a schematic structural diagram of a data processing device provided by an embodiment of the present invention. As shown in FIG. 3, the data processing device may include:
a receiving unit 301, configured to receive input information including input data and a functional layer type corresponding to the input data;
an allocating unit 302, configured to allocate a first TCM space for the input data using the resource allocation method corresponding to the functional layer type;
a cutting unit 303, configured to cut the input data using the data cutting method corresponding to the functional layer type;
a processing unit 304, configured to process the cut input data in the first TCM space using the CNN operator corresponding to the functional layer type.
In one embodiment, the data processing apparatus may further include:
an obtaining unit 305, configured to obtain, from a functional layer library, the functional layer corresponding to the functional layer type;
the allocation unit 302 being specifically configured to allocate the first TCM space to the input data using the resource allocation mode corresponding to the functional layer;
the slicing unit 303 being specifically configured to slice the input data using the data slicing mode corresponding to the functional layer; and
the processing unit 304 being specifically configured to process the sliced input data in the first TCM space using the CNN operator corresponding to the functional layer.
In one embodiment, the data processing apparatus may further include:
a parsing unit 306, configured to parse the input information to obtain an input data attribute, the input data attribute including an input data type;
the allocation unit 302 being specifically configured to:
calculate the data volume of the input data according to the input data type, using the resource allocation mode corresponding to the functional layer;
determine the space size required by the input data according to the data volume of the input data; and
allocate, in the TCM, a space of the size required by the input data to the input data, obtaining the first TCM space.
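The size calculation performed by the allocation unit can be sketched as follows (Python; the data-type table, byte widths, and names are illustrative assumptions, not from the disclosure):

```python
# Sketch of the allocation unit's size calculation: the data volume follows
# from the element count and the byte width of the input data type.

DTYPE_BYTES = {"int8": 1, "int16": 2, "float32": 4}  # assumed widths

def required_space(shape, dtype):
    """Bytes needed in the TCM for a tensor of the given shape and type."""
    elements = 1
    for dim in shape:
        elements *= dim
    return elements * DTYPE_BYTES[dtype]

print(required_space((3, 32, 32), "int8"))   # 3072
print(required_space((3, 32, 32), "int16"))  # 6144
```

The same calculation, applied to the output data type and shape, would yield the size of the second TCM space.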
In one embodiment, the slicing unit 303 is specifically configured to slice the input data, according to the space size required by the input data, using the data slicing mode corresponding to the functional layer.
In one embodiment, the parsing unit 306 is specifically configured to parse the input information to obtain the input data attribute and an output data attribute, the output data attribute including an output data type;
the allocation unit 302 being further configured to calculate the data volume of the output data according to the output data type using the resource allocation mode corresponding to the functional layer, determine the space size required by the output data according to the data volume of the output data, and allocate, in the TCM, a space of the size required by the output data to the output data, obtaining a second TCM space; and
the data processing apparatus may further include:
a storage unit 307, configured to store the processing result of the input data into the second TCM space.
In one embodiment, the obtaining unit 305 is further configured to obtain the CNN operator corresponding to the functional layer type, the input data type, and the output data type, or to obtain the CNN operator corresponding to the functional layer type, the input data type, and the weight type; and
the processing unit 304 is specifically configured to process the sliced input data in the first TCM space using the obtained CNN operator.
More detailed descriptions of the receiving unit 301, the allocation unit 302, the slicing unit 303, the processing unit 304, the obtaining unit 305, the parsing unit 306, and the storage unit 307 can be found directly in the related descriptions of the method embodiments shown in FIG. 1 and FIG. 2, and are not repeated here.
Referring to FIG. 4, FIG. 4 is a schematic structural diagram of another data processing apparatus provided by an embodiment of the present invention. As shown in FIG. 4, the data processing apparatus may include a processor 401, a memory 402, and a bus 403. The processor 401 may be one general-purpose central processing unit (CPU) or multiple CPUs, one or more graphics processing units (GPUs), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling the execution of the programs of the solutions of the present invention. The memory 402 may be, but is not limited to, a read-only memory (ROM) or another type of static storage device capable of storing static information and instructions, a random access memory (RAM) or another type of dynamic storage device capable of storing information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, and the like), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. The memory 402 may exist independently and be connected to the processor 401 via the bus 403, or the memory 402 may be integrated with the processor 401. The bus 403 transfers information between the above components.

Claims (10)

  1. A data processing method, characterized in that the method comprises:
    receiving input information including input data and a functional layer type corresponding to the input data;
    allocating a first TCM space to the input data using a resource allocation mode corresponding to the functional layer type;
    slicing the input data using a data slicing mode corresponding to the functional layer type; and
    processing the sliced input data in the first TCM space using a CNN operator corresponding to the functional layer type.
  2. The method according to claim 1, characterized in that the method further comprises:
    obtaining, from a functional layer library, a functional layer corresponding to the functional layer type;
    wherein the allocating a first TCM space to the input data using a resource allocation mode corresponding to the functional layer type comprises:
    allocating the first TCM space to the input data using a resource allocation mode corresponding to the functional layer;
    the slicing the input data using a data slicing mode corresponding to the functional layer type comprises:
    slicing the input data using a data slicing mode corresponding to the functional layer; and
    the processing the sliced input data in the first TCM space using a CNN operator corresponding to the functional layer type comprises:
    processing the sliced input data in the first TCM space using a CNN operator corresponding to the functional layer.
  3. The method according to claim 2, characterized in that the method further comprises:
    parsing the input information to obtain an input data attribute, the input data attribute including an input data type;
    wherein the allocating the first TCM space to the input data using a resource allocation mode corresponding to the functional layer comprises:
    calculating a data volume of the input data according to the input data type, using the resource allocation mode corresponding to the functional layer;
    determining a space size required by the input data according to the data volume of the input data; and
    allocating, in a TCM, a space of the size required by the input data to the input data to obtain the first TCM space.
  4. The method according to claim 3, characterized in that the slicing the input data using a data slicing mode corresponding to the functional layer comprises:
    slicing the input data, according to the space size required by the input data, using the data slicing mode corresponding to the functional layer.
  5. The method according to claim 3 or 4, characterized in that the parsing the input information to obtain an input data attribute comprises:
    parsing the input information to obtain the input data attribute and an output data attribute, the output data attribute including an output data type;
    and the method further comprises:
    calculating a data volume of the output data according to the output data type, using the resource allocation mode corresponding to the functional layer;
    determining a space size required by the output data according to the data volume of the output data;
    allocating, in the TCM, a space of the size required by the output data to the output data to obtain a second TCM space; and
    storing a processing result of the input data into the second TCM space.
  6. The method according to claim 5, characterized in that the method further comprises:
    obtaining a CNN operator corresponding to the functional layer type, the input data type, and the output data type;
    wherein the processing the sliced input data in the first TCM space using a CNN operator corresponding to the functional layer comprises:
    processing the sliced input data in the first TCM space using the obtained CNN operator.
  7. The method according to claim 5, characterized in that the parsing the input information to obtain the input data attribute and an output data attribute comprises:
    parsing the input information to obtain the input data attribute, the output data attribute, and a weight attribute, the weight attribute including a weight type;
    and the method further comprises:
    obtaining a CNN operator corresponding to the functional layer type, the input data type, and the weight type;
    wherein the processing the sliced input data in the first TCM space using a CNN operator corresponding to the functional layer comprises:
    processing the sliced input data in the first TCM space using the obtained CNN operator.
  8. A data processing apparatus, characterized in that it comprises:
    a receiving unit, configured to receive input information including input data and a functional layer type corresponding to the input data;
    an allocation unit, configured to allocate a first TCM space to the input data using a resource allocation mode corresponding to the functional layer type;
    a slicing unit, configured to slice the input data using a data slicing mode corresponding to the functional layer type; and
    a processing unit, configured to process the sliced input data in the first TCM space using a CNN operator corresponding to the functional layer type.
  9. A data processing apparatus, characterized in that it comprises a processor and a memory connected to each other, wherein the memory is configured to store a computer program including program instructions, and the processor is configured to invoke the program instructions to perform the data processing method according to any one of claims 1 to 7.
  10. A storage medium, characterized in that it stores a computer program including program instructions which, when executed by a processor, cause the processor to perform the data processing method according to any one of claims 1 to 7.
PCT/CN2019/121358 2019-06-19 2019-11-27 Data processing method and apparatus WO2020253117A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910530760.6A CN110413539B (zh) 2019-06-19 2019-06-19 Data processing method and apparatus
CN201910530760.6 2019-06-19

Publications (1)

Publication Number Publication Date
WO2020253117A1 true WO2020253117A1 (zh) 2020-12-24

Family

ID=68359262

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/121358 WO2020253117A1 (zh) 2019-06-19 2019-11-27 Data processing method and apparatus

Country Status (2)

Country Link
CN (1) CN110413539B (zh)
WO (1) WO2020253117A1 (zh)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110413539B (zh) * 2019-06-19 2021-09-14 深圳云天励飞技术有限公司 Data processing method and apparatus
CN112286694B (zh) * 2020-12-24 2021-04-02 瀚博半导体(上海)有限公司 Hardware accelerator memory allocation method and system based on a deep learning computation network
CN113407338A (zh) * 2021-05-29 2021-09-17 国网辽宁省电力有限公司辽阳供电公司 Resource allocation method for an A/D conversion chip with a segmented architecture
CN115118678B (zh) * 2022-06-07 2024-03-12 南京全信传输科技股份有限公司 Multi-partition network communication system on an FC device side and communication method thereof

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107066239A (zh) * 2017-03-01 2017-08-18 智擎信息系统(上海)有限公司 一种实现卷积神经网络前向计算的硬件结构
CN108304265A (zh) * 2018-01-23 2018-07-20 腾讯科技(深圳)有限公司 内存管理方法、装置及存储介质
CN108304923A (zh) * 2017-12-06 2018-07-20 腾讯科技(深圳)有限公司 卷积运算处理方法及相关产品
US20190018019A1 (en) * 2017-07-17 2019-01-17 Bioinformatics Solutions Inc. Methods and systems for de novo peptide sequencing using deep learning
CN110413539A (zh) * 2019-06-19 2019-11-05 深圳云天励飞技术有限公司 一种数据处理方法及装置

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160379109A1 (en) * 2015-06-29 2016-12-29 Microsoft Technology Licensing, Llc Convolutional neural networks on hardware accelerators
US10157045B2 (en) * 2016-11-17 2018-12-18 The Mathworks, Inc. Systems and methods for automatically generating code for deep learning systems
CN108154229B (zh) * 2018-01-10 2022-04-08 西安电子科技大学 Image processing method based on an FPGA-accelerated convolutional neural network framework


Also Published As

Publication number Publication date
CN110413539B (zh) 2021-09-14
CN110413539A (zh) 2019-11-05


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19933301

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19933301

Country of ref document: EP

Kind code of ref document: A1