WO2022161060A1 - Data processing method and apparatus - Google Patents

Data processing method and apparatus

Info

Publication number
WO2022161060A1
WO2022161060A1 (PCT application PCT/CN2021/141402)
Authority
WO
WIPO (PCT)
Prior art keywords
processing unit
tensor
target
data
layout type
Prior art date
Application number
PCT/CN2021/141402
Other languages
French (fr)
Chinese (zh)
Inventor
罗佳
许铭仁
Original Assignee
展讯通信(上海)有限公司 (Spreadtrum Communications (Shanghai) Co., Ltd.)
Priority date
Filing date
Publication date
Application filed by 展讯通信(上海)有限公司 (Spreadtrum Communications (Shanghai) Co., Ltd.)
Publication of WO2022161060A1 publication Critical patent/WO2022161060A1/en


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • the present application relates to the field of computer technology, and in particular, to a data processing method and device.
  • A neural network is a mathematical or computational model, used in machine learning and cognitive science, that imitates the structure and function of biological neural networks.
  • the data in the neural network can be stored in a tensor (Tensor), and the arrangement (Layout) of the data in the tensor can include NHWC and NCHW.
  • the data in the neural network can be calculated by the execution unit to obtain the corresponding calculation result.
  • if the arrangement of the tensor data supported by the execution unit is incompatible with the arrangement of the tensor data supported by the neural network, the correct calculation result cannot be obtained.
  • the present application discloses a data processing method and device, which can make the input data, the tensor layout type of the hidden layer of the neural network and the tensor layout type of the execution unit compatible with each other.
  • the embodiments of the present application provide a data processing method and apparatus.
  • the method is applied to a terminal device.
  • the terminal device includes an application layer, a neural network model, and at least one processing unit.
  • the neural network model includes at least one hidden layer.
  • each processing unit corresponds to one or more hidden layers in the at least one hidden layer, and the method includes:
  • if the layout type of the input data input to the target processing unit is different from the tensor layout type of the target processing unit, the input data is transposed by a transposition operator to obtain the transposed input data; the target processing unit is any processing unit in the at least one processing unit, and the input data is the output data of the application layer or one hidden layer in the at least one hidden layer;
  • the transposed input data and the target tensor parameters in all hidden layers corresponding to the target processing unit are processed by the target processing unit to obtain the output data; the layout type of the target tensor parameters is the same as the tensor layout type of the target processing unit.
  • in one embodiment, before the input data is transposed by the transposition operator to obtain the transposed input data, the tensor layout type of the hidden layer corresponding to the target processing unit and the tensor layout type of the target processing unit are obtained; if the tensor layout type of the hidden layer corresponding to the target processing unit is different from the tensor layout type of the target processing unit, the original tensor parameters in all hidden layers corresponding to the target processing unit are transposed to obtain the target tensor parameters.
  • if the tensor layout type of the hidden layer corresponding to the target processing unit is the same as the tensor layout type of the target processing unit, the original tensor parameters in all hidden layers corresponding to the target processing unit are used as the target tensor parameters.
  • after the transposed input data and the target tensor parameters in all hidden layers corresponding to the target processing unit are processed by the target processing unit to obtain the output data, the output data is transposed by the transposition operator to obtain the transposed output data; the transposed output data is used as the input data of the next processing unit of the target processing unit.
  • the output of the application layer is the input of the first processing unit in the at least one processing unit, and the output of the first processing unit is the input of the first hidden layer of the one or more hidden layers corresponding to the first processing unit.
  • the output of the last hidden layer in the one or more hidden layers corresponding to the first processing unit is the input of the next processing unit of the first processing unit.
  • an embodiment of the present application provides a data processing apparatus, which is applied to a terminal device.
  • the terminal device includes an application layer, a neural network model, and at least one processing unit.
  • the neural network model includes at least one hidden layer, and each processing unit corresponds to one or more hidden layers in the at least one hidden layer; the apparatus includes:
  • a processing subunit, used to transpose the input data through the transposition operator to obtain the transposed input data if the layout type of the input data input to the target processing unit is different from the tensor layout type of the target processing unit, where the input data is a tensor, the target processing unit is any processing unit in the at least one processing unit, and the input data is the output data of the application layer or a hidden layer in the at least one hidden layer;
  • the processing subunit is also used to process the transposed input data and the target tensor parameters in all hidden layers corresponding to the target processing unit through the target processing unit to obtain output data; the layout type of the target tensor parameters is the same as the tensor layout type of the target processing unit.
  • an embodiment of the present application provides a data processing device, including a processor, a memory, and a communication interface, where the processor, the memory, and the communication interface are connected to each other, wherein the memory is used to store a computer program, and the computer program includes program instructions,
  • the processor is configured to invoke program instructions to execute the data processing method as described in the first aspect.
  • an embodiment of the present application provides a computer-readable storage medium, where the computer-readable storage medium stores one or more instructions, and the one or more instructions are suitable for being loaded by a processor to execute the data processing method described in the first aspect.
  • an embodiment of the present application provides a chip, the chip includes a processor and a data interface, and the processor reads an instruction stored in a memory through the data interface to execute the data processing method described in the first aspect.
  • an embodiment of the present application provides a chip module, where the chip module includes the chip of the fifth aspect.
  • the input data is transposed by the transposition operator to obtain the transposed input data
  • the input data is a tensor
  • the target processing unit is any processing unit in the at least one processing unit
  • the input data is the output data of the application layer or one hidden layer in the at least one hidden layer
  • the transposed input data and the target tensor parameters in all hidden layers corresponding to the target processing unit are processed to obtain output data; the layout type of the target tensor parameters is the same as the tensor layout type of the target processing unit.
  • FIG. 1a is a system architecture diagram including a neural network hidden layer provided by an embodiment of the application;
  • FIG. 1b is another system architecture diagram including a neural network hidden layer provided by an embodiment of the application
  • FIG. 2 is a schematic flowchart of a data processing method provided by an embodiment of the present application.
  • FIG. 3 is a schematic diagram of all hidden layers being processed only on one processing unit according to an embodiment of the present application.
  • FIG. 4 is a schematic diagram of a hidden layer being processed on multiple processing units according to an embodiment of the present application
  • FIG. 5 is a schematic diagram of a unit of a data processing apparatus provided by an embodiment of the present application.
  • FIG. 6 is a simplified schematic diagram of a physical structure of a data processing apparatus provided by an embodiment of the present application.
  • FIG. 7 is a simplified schematic diagram of a chip of a data processing apparatus provided by an embodiment of the present application.
  • Artificial Intelligence (AI) is a theory, method, technology and application system that uses digital computers, or machines controlled by digital computers, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use knowledge to obtain the best results.
  • artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new kind of intelligent machine that can respond in a similar way to human intelligence.
  • Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that machines have the functions of perception, reasoning and decision-making.
  • Artificial intelligence technology is a comprehensive discipline, involving a wide range of fields, including both hardware-level technology and software-level technology.
  • the basic technologies of artificial intelligence generally include technologies such as sensors, special artificial intelligence chips, cloud computing, distributed storage, big data processing technology, operation/interaction systems, and mechatronics.
  • Artificial intelligence software technology mainly includes computer vision technology, speech processing technology, natural language processing technology, and machine learning/deep learning.
  • A tensor is a multilinear function that can be used to represent linear relationships between vectors, scalars, and other tensors. Tensor analysis is a branch of mathematics with important applications in mechanics; the term tensor originated in mechanics, where it was first used to represent the stress state at various points in an elastic medium, and tensor theory later developed into a powerful mathematical tool in mechanics and physics. The tensor concept is a generalization of the vector concept: a vector is a first-order tensor. Tensors generalize vectors and matrices.
  • scalars can be regarded as zero-order tensors
  • vectors can be regarded as first-order tensors
  • matrices can be regarded as second-order tensors.
  • the arrangement of tensors in memory can include NHWC, NCHW, CHWN, etc.
  • NHWC indicates that tensors are arranged in memory as [batch, height, width, channels]
  • NCHW indicates that tensors are arranged in memory as [batch, channels, height, width].
  • batch represents the number of data input at one time
  • channels represents the number of channels
  • width represents the width
  • height represents the height.
  • the tensor of NCHW can be obtained by transposing the tensor of NHWC
  • the tensor of NHWC can be obtained by transposing the tensor of NCHW.
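  • As an illustrative sketch (not part of the original disclosure), the relationship between the two layouts can be shown with NumPy, where a layout change is simply a permutation of the tensor axes:

```python
import numpy as np

# Illustrative only: a layout change is an axis permutation.
x_nhwc = np.arange(24).reshape(2, 2, 2, 3)          # [batch, height, width, channels]

x_nchw = np.transpose(x_nhwc, (0, 3, 1, 2))         # NHWC -> NCHW
print(x_nchw.shape)                                 # (2, 3, 2, 2)

x_round_trip = np.transpose(x_nchw, (0, 2, 3, 1))   # NCHW -> NHWC (inverse permutation)
assert np.array_equal(x_round_trip, x_nhwc)
```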
  • Neural network: in computer science this usually refers to an artificial neural network, a mathematical model for information processing that uses a structure similar to the synaptic connections of the brain. In engineering and academia it is often simply called a "neural network" or neural-like network. It is an algorithmic mathematical model that imitates the behavioral characteristics of animal neural networks and performs distributed parallel information processing. Such a network depends on the complexity of the system and achieves the purpose of processing information by adjusting the interconnections between a large number of internal nodes.
  • Operator: a mapping from one function space to another function space. Common operators include the differential operator, gradient operator, divergence operator, Laplace operator, Hamiltonian operator, etc.
  • the operator in the narrow sense actually refers to the mapping from one function space to another function space (or itself).
  • the generalized definition of an operator extends the above spaces to general spaces: a vector space, a normed vector space, an inner product space, or, going further, a Banach space or a Hilbert space. Operators can also be divided into bounded and unbounded, linear and nonlinear, and so on.
  • Transpose (Permute): intuitively, mirror-inverting all elements of a matrix A around the 45-degree ray extending to the lower right from the element in the first row and first column yields the transpose of A.
  • Given a matrix M, turning its first row into the first column, its second row into the second column, ..., and its last row into the last column yields a new matrix N; this process is called the transpose of the matrix. That is, the rows and columns of the matrix are interchanged.
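  • For example (an illustration, not taken from the original text), transposing a 2×3 matrix M yields a 3×2 matrix N whose columns are the rows of M:

```python
import numpy as np

M = np.array([[1, 2, 3],
              [4, 5, 6]])
N = M.T  # rows of M become columns of N
# N == [[1, 4],
#       [2, 5],
#       [3, 6]]
```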
  • Graphics Processing Unit (GPU): also known as a display core, visual processor, or display chip, it is a processor specially designed for graphics work on personal computers, workstations, game consoles and some mobile devices (such as tablet computers and smart phones).
  • the Central Processing Unit (CPU), as the computing and control core of the computer system, is the final execution unit for information processing and program operation.
  • Neural-network Processing Unit (NPU): an embedded neural network processor that uses a "data-driven parallel computing" architecture and is especially good at processing massive multimedia data such as video and images.
  • FIG. 1a is a system architecture diagram including a neural network model provided by an embodiment of the present application.
  • the system architecture includes an application layer, an input layer (Input Layer), a neural network hidden layer (Hidden Layer), a processing unit and an output layer (Output Layer), and the neural network model includes an input layer, a neural network hidden layer and output layer.
  • the output of the application layer can be used as the input of the input layer of the neural network model, and the hidden layer acts on the processing unit, and the processing unit can process the data in the hidden layer.
  • the data refers to the input data of the hidden layer and the parameters and operators carried by the hidden layer itself.
  • the hidden layer of the neural network in Fig. 1a may include one hidden layer, or may include multiple hidden layers. If multiple hidden layers are included, then for a hidden layer inside multiple hidden layers, its input is the output of the previous hidden layer, and its output is the input of the next hidden layer.
  • Figure 1b shows another system architecture diagram including a neural network model.
  • the system architecture includes an application layer, an input layer (Input Layer), a first part of the neural network hidden layers (Hidden Layer), a first processing unit, a second part of the hidden layers, a second processing unit, a third part of the hidden layers, a third processing unit and an output layer (Output Layer); the neural network model includes the input layer, the three parts of the hidden layers and the output layer.
  • the first partial hidden layer acts on the first processing unit
  • the second partial hidden layer acts on the second processing unit
  • the third partial hidden layer acts on the third processing unit.
  • the first processing unit, the second processing unit and the third processing unit may be processing units such as GPU, CPU or NPU, respectively.
  • the hidden layers in the model can be grouped, and the hidden layers of different groups can be processed by different processing units.
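  • As a hypothetical sketch of this grouping (the names below are illustrative and do not come from the patent), each group can record the processing unit that executes it and that unit's tensor layout type:

```python
# Hidden layers grouped by processing unit, mirroring FIG. 1b; all names
# are assumptions made for this illustration.
model_partition = [
    {"unit": "first processing unit",  "unit_layout": "NCHW", "layers": ["hidden_1", "hidden_2"]},
    {"unit": "second processing unit", "unit_layout": "NHWC", "layers": ["hidden_3", "hidden_4"]},
    {"unit": "third processing unit",  "unit_layout": "NCHW", "layers": ["hidden_5"]},
]
```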
  • the data processing method and apparatus provided by the embodiments of the present application are described in detail below.
  • FIG. 2 provides a schematic flowchart of a data processing method according to an embodiment of the present application.
  • the data processing method includes the following operations 210 to 220 .
  • the method execution subject shown in FIG. 2 may be a terminal device, or the subject may be a chip in the terminal device.
  • the terminal device to which the method is applied may include an application layer, a neural network model, and at least one processing unit, the neural network model includes at least one hidden layer, and each processing unit corresponds to one or more hidden layers in the at least one hidden layer.
  • when the terminal device executes the process shown in FIG. 2, the following operations may be included:
  • Operation 210: if the layout type of the input data input to the target processing unit is different from the tensor layout type of the target processing unit, transpose the input data through a transposition operator to obtain the transposed input data; the input data is a tensor, the target processing unit is any processing unit in the at least one processing unit, and the input data is the output data of the application layer or one hidden layer of the at least one hidden layer.
  • the output of the application layer is the input of the first processing unit in the at least one processing unit, the output of the first processing unit is the input of the first hidden layer in the one or more hidden layers corresponding to the first processing unit, and the output of the last hidden layer in the one or more hidden layers corresponding to the first processing unit is the input of the next processing unit of the first processing unit.
  • before the input data is transposed by the transposition operator to obtain the transposed input data, the terminal device can obtain the tensor layout type of the hidden layer corresponding to the target processing unit and the tensor layout type of the target processing unit. If the terminal device determines that the tensor layout type of the hidden layer corresponding to the target processing unit is different from the tensor layout type of the target processing unit, it needs to transpose the original tensor parameters in all hidden layers corresponding to the target processing unit to obtain the above target tensor parameters.
  • the target tensor parameter is a set of transposed tensor parameters of each hidden layer in all hidden layers, and the transposed tensor parameters of each hidden layer may be different from each other.
  • FIG. 3 is a schematic diagram in which all hidden layers are processed on only one processing unit.
  • the hidden layer of the neural network is processed by only one processing unit, which may be a GPU, CPU, NPU, or other processing units that can process the hidden layer. If all hidden layers are processed on only one processing unit, the terminal device only needs to determine whether the tensor layout type of all hidden layers corresponding to the processing unit is the same as the tensor layout type of the processing unit.
  • if they are not the same, the terminal device needs to transpose the original tensor parameters in all hidden layers to obtain the target tensor parameters.
  • the layout type of the target tensor parameter is the same as the tensor layout type of the processing unit, both of which are NHWC.
  • the original tensor parameter refers to a parameter in an operator of a hidden layer, such as the Weights parameter of the Conv2d operator, which is converted, for example, from NHWC to NCHW to fit the layout constraints of the target processing unit; a sketch follows.
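  • A minimal sketch of this parameter transposition, assuming 4-D weight tensors stored under a hypothetical "weights" key (neither assumption comes from the patent):

```python
import numpy as np

NHWC_TO_NCHW = (0, 3, 1, 2)  # axis permutation applied to each weight tensor

def target_tensor_params(layers, layer_layout, unit_layout):
    """Return the target tensor parameters for one processing unit."""
    if layer_layout == unit_layout:
        # Layout types already match: use the original tensor parameters as-is.
        return [layer["weights"] for layer in layers]
    # Layout types differ: transpose each layer's weight tensor once,
    # e.g. from NHWC to NCHW, before the unit processes any input data.
    return [np.transpose(layer["weights"], NHWC_TO_NCHW) for layer in layers]
```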
  • FIG. 4 shows a schematic diagram of hidden layers being processed on multiple processing units.
  • all hidden layers in the neural network model are divided into the first part of the hidden layer, the second part of the hidden layer and the third part of the hidden layer.
  • the first part of the hidden layer is processed by the first processing unit
  • the second part of the hidden layer is processed by the second processing unit
  • the third part of the hidden layers is processed by the third processing unit.
  • the tensor layout types of the first, second and third parts of the hidden layers are all the same, and their tensor layout type is determined when the model is built.
  • however, there may be processing units whose tensor layout type is not the same as the tensor layout type of the hidden layers.
  • for example, the tensor layout type of the hidden layers is NHWC,
  • the tensor layout type of the first processing unit is NCHW,
  • the tensor layout type of the second processing unit is NHWC,
  • and the tensor layout type of the third processing unit is NCHW.
  • the terminal device can determine that the tensor layout type of the first part of the hidden layers is different from the tensor layout type of the first processing unit, that the tensor layout type of the second part of the hidden layers is the same as the tensor layout type of the second processing unit, and that the tensor layout type of the third part of the hidden layers is not the same as the tensor layout type of the third processing unit. Therefore, it is necessary to transpose the original tensor parameters of all hidden layers in the first part and in the third part of the hidden layers.
  • after the transposition, the tensor layout type of the first part of the hidden layers is the same as the tensor layout type of the first processing unit,
  • and the tensor layout type of the third part of the hidden layers is the same as the tensor layout type of the third processing unit.
  • if the tensor layout type of the hidden layer corresponding to the target processing unit is the same as the tensor layout type of the target processing unit, the original tensor parameters in all hidden layers corresponding to the target processing unit are used as the target tensor parameters.
  • once the tensor layout type of the hidden layers is the same as the tensor layout type of the corresponding processing unit,
  • if the layout type of the input data input to the target processing unit is still different from the tensor layout type of the target processing unit, the input data needs to be transposed, where the input data is a tensor.
  • the input data can be transposed by means of the transposition operator to obtain the transposed input data.
  • the layout type of the transposed input data is the same as the tensor layout type of the target processing unit.
  • the input layer may input the input data into the hidden layer corresponding to the processing unit.
  • a transpose (Permute) operator is inserted after the input layer, and the input data is transposed by the operator.
  • the terminal device will transpose the original tensor parameters in the hidden layer so that the tensor layout type of this hidden layer is the same as the tensor layout type of the processing unit. Therefore, comparing the layout type of the input data with the tensor layout type of the hidden layer is equivalent to comparing the layout type of the input data with the tensor layout type of the processing unit.
  • the input layer may input the input data to the first hidden layer in the first part of the hidden layers. If the terminal device determines that the layout type of the input data is not the same as the tensor layout type of the first hidden layer in the first part of the hidden layers, it can likewise insert a Permute operator after the input layer, transpose the input data, and input the transposed input data into the first hidden layer in the first part of the hidden layers. After the last hidden layer in the first part of the hidden layers has processed the input data from the previous hidden layer (i.e., the output data of the previous hidden layer), the output data of the last hidden layer can be transposed.
  • the method of transposition is to add a Permute operator after the last hidden layer in the first part of the hidden layer, so that the transposed output data can be obtained.
  • the layout type of the transposed output data is the same as the layout type of the input data of the first hidden layer in the first part of the hidden layers.
  • similarly, the terminal device needs to determine whether the layout type of the input data of the first hidden layer in the second part of the hidden layers is the same as the tensor layout type of that hidden layer.
  • the last hidden layer of the second part of the hidden layer and the first hidden layer of the third part of the hidden layer can also use the same method to input or output data.
  • in contrast, the original tensor parameters of the hidden layers are transposed directly, without the Permute operator.
  • Operation 220: process, by the target processing unit, the transposed input data and the target tensor parameters in all hidden layers corresponding to the target processing unit to obtain output data; the layout type of the target tensor parameters is the same as the tensor layout type of the target processing unit.
  • the target processing unit processes the input data in the corresponding hidden layer and the target tensor parameters in the hidden layer. If there are multiple hidden layers, the target processing unit will process the input data and tensor parameters in each hidden layer one by one.
  • the output data can be obtained after the processing of the target processing unit is completed. After the output data is obtained, the output data can be transposed through the Permute operator to obtain the transposed output data, and the transposed output data is used as the input data of the next processing unit of the target processing unit. In this way, data flow between different processing units is realized.
  • the terminal device first transposes the original tensor parameters of the hidden layers when the tensor layout type of the hidden layers corresponding to the target processing unit is different from the tensor layout type of the target processing unit, obtaining target tensor parameters whose layout type is the same as the tensor layout type of the target processing unit. Further, the layout type of the input data input to the target processing unit can be compared with the tensor layout type of the target processing unit; if they are not the same, the input data can be transposed by the transposition operator, so that the layout type of the transposed input data is the same as the tensor layout type of the target processing unit.
  • the input data, the tensor layout type of the hidden layer of the neural network and the tensor layout type of the execution unit can be made compatible with each other, which is beneficial for the processing unit to process the input data and the parameters in the hidden layer of the neural network.
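  • The overall flow of operations 210 and 220 can be sketched as follows; `units` and `process_fn` are hypothetical names introduced for this illustration, and each unit is assumed to already hold its target tensor parameters in its own layout:

```python
import numpy as np

PERMUTE = {("NHWC", "NCHW"): (0, 3, 1, 2),
           ("NCHW", "NHWC"): (0, 2, 3, 1)}

def run_model(data, data_layout, units):
    """units: list of (unit_layout, process_fn) pairs, one per processing unit."""
    for unit_layout, process_fn in units:
        if data_layout != unit_layout:
            # Operation 210: a Permute (transpose) operator aligns the input
            # data with the tensor layout type of the target processing unit.
            data = np.transpose(data, PERMUTE[(data_layout, unit_layout)])
            data_layout = unit_layout
        # Operation 220: the unit processes the transposed input together with
        # the target tensor parameters of its hidden layers.
        data = process_fn(data)
        # The output keeps this unit's layout; if the next unit differs, the
        # next iteration transposes again, realizing data flow between units.
    return data, data_layout
```

  • In the FIG. 4 example above, this would correspond to three units whose layouts are NCHW, NHWC and NCHW, with transposes inserted only at the boundaries where the layout changes.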
  • FIG. 5 is a schematic diagram of a unit of a data processing apparatus provided by an embodiment of the present application.
  • the data processing apparatus shown in FIG. 5 can be used to perform some or all of the functions in the method embodiment described in FIG. 2 above.
  • the device may be a terminal device, or a device in the terminal device, or a device that can be used in combination with the terminal device.
  • the logical structure of the apparatus may include: a processing subunit 510 and an obtaining subunit 520.
  • the terminal device includes an application layer, a neural network model and at least one processing unit; the neural network model includes at least one hidden layer, and each processing unit corresponds to one or more hidden layers in the at least one hidden layer, wherein:
  • the processing subunit 510 is configured to transpose the input data through a transposition operator if the layout type of the input data input to the target processing unit is different from the tensor layout type of the target processing unit to obtain the transposed input data, the input data is a tensor, the target processing unit is any processing unit in the at least one processing unit, and the input data is the output data of the application layer or one hidden layer in the at least one hidden layer;
  • the above-mentioned processing subunit 510 is also used to process the transposed input data and the target tensor parameters in all hidden layers corresponding to the target processing unit through the target processing unit to obtain output data; the layout type of the target tensor parameters is the same as the tensor layout type of the target processing unit.
  • before the input data is transposed by the transposition operator to obtain the transposed input data, the obtaining subunit 520 is used to acquire the tensor layout type of the hidden layer corresponding to the target processing unit and the tensor layout type of the target processing unit; the above-mentioned processing subunit 510 is also used to, if the tensor layout type of the hidden layer corresponding to the target processing unit is different from the tensor layout type of the target processing unit, transpose the original tensor parameters in all hidden layers corresponding to the target processing unit to obtain the target tensor parameters.
  • the processing subunit 510 is further configured to, if the tensor layout type of the hidden layer corresponding to the target processing unit is the same as the tensor layout type of the target processing unit, use the original tensor parameters in all hidden layers corresponding to the target processing unit as the target tensor parameters.
  • after the transposed input data and the target tensor parameters in all hidden layers corresponding to the target processing unit are processed by the target processing unit and the output data is obtained, the above-mentioned processing subunit 510 is further used to transpose the output data by the transposition operator to obtain the transposed output data; the transposed output data is used as the input data of the next processing unit of the target processing unit.
  • the output of the application layer is the input of the first processing unit in the at least one processing unit, the output of the first processing unit is the input of the first hidden layer in the one or more hidden layers corresponding to the first processing unit, and the output of the last hidden layer in the one or more hidden layers corresponding to the first processing unit is the input of the next processing unit of the first processing unit.
  • FIG. 6 is a simplified schematic diagram of the physical structure of a data processing apparatus according to an embodiment of the application.
  • the apparatus includes a processor 610, a memory 620 and a communication interface 630; the processor 610, the memory 620 and the communication interface 630 are connected via one or more communication buses.
  • the data processing device may be a chip, a chip module, or the like.
  • the processor 610 is configured to support the data processing apparatus in performing the functions corresponding to the method in FIG. 2 described above. It should be understood that in the embodiments of the present application, the processor 610 may be a central processing unit (CPU), and the processor may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
  • a general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
  • the memory 620 is used to store program codes and the like.
  • the memory 620 in this embodiment of the present application may be a volatile memory or a nonvolatile memory, or may include both volatile and nonvolatile memory.
  • the non-volatile memory may be read-only memory (ROM for short), programmable read-only memory (PROM for short), erasable programmable read-only memory (EPROM for short) , Electrically Erasable Programmable Read-Only Memory (electrically EPROM, EEPROM for short) or flash memory.
  • Volatile memory may be random access memory (RAM), which acts as an external cache.
  • Many forms of RAM are available, for example: static random access memory (SRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM), enhanced synchronous dynamic random access memory (ESDRAM), synchlink dynamic random access memory (SLDRAM) and direct rambus random access memory (DR RAM).
  • the communication interface 630 is used to send and receive data, information or messages, etc., and can also be described as a transceiver, a transceiver circuit, and the like.
  • the processor 610 calls the program code stored in the memory 620 to perform the following operations:
  • the processor 610 calls the program code stored in the memory 620: if the layout type of the input data input to the target processing unit is not the same as the tensor layout type of the target processing unit, the input data is transposed by the transposition operator to obtain the transposed input data; the input data is a tensor, the target processing unit is any processing unit in the at least one processing unit, and the input data is the output data of the application layer or one hidden layer in the at least one hidden layer;
  • the processor 610 calls the program code stored in the memory 620 to process, through the target processing unit, the transposed input data and the target tensor parameters in all hidden layers corresponding to the target processing unit to obtain the output data; the layout type of the target tensor parameters is the same as the tensor layout type of the target processing unit.
  • before the input data is transposed by the transposition operator to obtain the transposed input data, the processor 610 calls the program code stored in the memory 620 to obtain the tensor layout type of the hidden layer corresponding to the target processing unit and the tensor layout type of the target processing unit; if the tensor layout type of the hidden layer corresponding to the target processing unit is different from the tensor layout type of the target processing unit, the original tensor parameters in all hidden layers corresponding to the target processing unit are transposed to obtain the target tensor parameters.
  • the processor 610 calls the program code stored in the memory 620: if the tensor layout type of the hidden layer corresponding to the target processing unit is the same as the tensor layout type of the target processing unit, the original tensor parameters in all hidden layers corresponding to the target processing unit are used as the target tensor parameters.
  • after the transposed input data and the target tensor parameters in all hidden layers corresponding to the target processing unit are processed by the target processing unit and the output data is obtained, the processor 610 calls the program code stored in the memory 620 to transpose the output data through the transposition operator to obtain the transposed output data; the transposed output data is used as the input data of the next processing unit of the target processing unit.
  • the output of the application layer is the input of the first processing unit in the at least one processing unit, the output of the first processing unit is the input of the first hidden layer in the one or more hidden layers corresponding to the first processing unit, and the output of the last hidden layer in the one or more hidden layers corresponding to the first processing unit is the input of the next processing unit of the first processing unit.
  • modules/units included in the devices and products described in the foregoing embodiments may be software modules/units or hardware modules/units, or may be partly software modules/units and partly hardware modules/units.
  • For each device or product applied to or integrated in a chip, each module/unit contained therein may be implemented by hardware such as circuits, or at least some of the modules/units may be implemented by a software program that runs on a processor integrated inside the chip, with the remaining (if any) modules/units implemented by hardware such as circuits. For each device or product applied to or integrated in a chip module, the modules/units contained therein may all be implemented by hardware such as circuits, and different modules/units may be located in the same component of the chip module (such as a chip or a circuit module) or in different components; or at least some of the modules/units may be implemented by a software program that runs on a processor integrated inside the chip module, with the remaining (if any) modules/units implemented by hardware such as circuits. For each device or product applied to or integrated in a terminal, the modules/units contained therein may all be implemented by hardware such as circuits, and different modules/units may be located in the same component (such as a chip or a circuit module) or in different components of the terminal; or at least some of the modules/units may be implemented by a software program that runs on a processor integrated inside the terminal, with the remaining (if any) modules/units implemented by hardware such as circuits.
  • FIG. 7 is a simplified schematic diagram of a chip provided by an embodiment of the present application, and the chip includes a processor 710 and a data interface 720 .
  • the chip can be used to process functions corresponding to the method in FIG. 2 .
  • the chip may be included in a data processing apparatus as shown in FIG. 6 .
  • the chip may also be included in a chip module.
  • the units in the processing device in the embodiment of the present invention may be combined, divided, and deleted according to actual needs.
  • a computer program product includes one or more computer instructions.
  • the computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable device.
  • Computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from a website, computer, server or data center to another website, computer, server or data center by wired means (e.g., coaxial cable, optical fiber, digital subscriber line) or by wireless means.
  • a computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device such as a server, a data center, or the like that includes an integration of one or more available media.
  • Usable media may be magnetic media (e.g., floppy disks, hard disks, magnetic tapes), optical media (e.g., DVDs), or semiconductor media (e.g., solid state disks (SSDs)), among others.
  • Embodiments of the present application further provide a computer storage medium, where the computer storage medium stores a computer program for electronic data exchange, and the computer program causes a computer to execute some or all of the steps of any method described in the above method embodiments, where the above computer includes an electronic device.
  • Embodiments of the present application further provide a computer program product, where the computer program product includes a non-transitory computer-readable storage medium storing a computer program, and the computer program is operable to cause a computer to execute some or all of the steps of any method described in the above method embodiments.
  • the computer program product may be a software installation package, and the computer includes an electronic device.
  • the size of the sequence numbers of the above-mentioned processes does not imply an order of execution; the execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present application.
  • the disclosed method, apparatus and system may be implemented in other manners.
  • the device embodiments described above are only illustrative; for example, the division of the units is only a logical function division, and there may be other division methods in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution in this embodiment.
  • each functional unit in each embodiment of the present invention may be integrated into one processing unit, or each unit may be physically included individually, or two or more units may be integrated into one unit.
  • the above-mentioned integrated unit may be implemented in the form of hardware, or may be implemented in the form of hardware plus software functional units.
  • the above-mentioned integrated units implemented in the form of software functional units can be stored in a computer-readable storage medium.
  • the above-mentioned software functional unit is stored in a storage medium, and includes several instructions to cause a computer device (which may be a personal computer, a server, a network device, etc.) to execute some or all of the steps of the methods described in the various embodiments of the present invention.
  • the aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Neurology (AREA)
  • Document Processing Apparatus (AREA)

Abstract

Disclosed in the present application are a data processing method and apparatus. The method comprises: if the layout type of input data inputted to a target processing unit is different from the tensor layout type of the target processing unit, transposing the input data by a transposition operator to obtain transposed input data, the input data being a tensor, the target processing unit being any of at least one processing unit, and the input data being output data of an application layer or one of at least one hidden layer; and processing, by the target processing unit, the transposed input data and target tensor parameters in all the hidden layers corresponding to the target processing unit to obtain the output data, the layout type of the target tensor parameters being the same as the tensor layout type of the target processing unit. By means of the method, the input data, the tensor layout type of a neural network hidden layer, and the tensor layout type of an execution unit can be compatible with one another.

Description

Data processing method and device
Technical Field
The present application relates to the field of computer technology, and in particular, to a data processing method and device.
Background Art
A neural network is a mathematical or computational model, used in machine learning and cognitive science, that imitates the structure and function of biological neural networks. The data in a neural network can be stored in a tensor (Tensor), and the arrangement (layout) of the data in the tensor can include NHWC and NCHW. The data in the neural network can be calculated by an execution unit to obtain the corresponding calculation result. However, when the arrangement of the tensor data supported by the execution unit is incompatible with the arrangement of the tensor data supported by the neural network, a correct calculation result cannot be obtained.
Summary of the Invention
The present application discloses a data processing method and device, which can make the input data, the tensor layout type of the hidden layers of the neural network and the tensor layout type of the execution unit compatible with one another.
In a first aspect, the embodiments of the present application provide a data processing method and apparatus. The method is applied to a terminal device. The terminal device includes an application layer, a neural network model, and at least one processing unit. The neural network model includes at least one hidden layer, and each processing unit corresponds to one or more hidden layers in the at least one hidden layer. The method includes:
if the layout type of the input data input to the target processing unit is different from the tensor layout type of the target processing unit, transposing the input data through a transposition operator to obtain the transposed input data, where the input data is a tensor, the target processing unit is any processing unit in the at least one processing unit, and the input data is the output data of the application layer or of one hidden layer in the at least one hidden layer;
processing, by the target processing unit, the transposed input data and the target tensor parameters in all hidden layers corresponding to the target processing unit to obtain the output data, where the layout type of the target tensor parameters is the same as the tensor layout type of the target processing unit.
In one embodiment, before the input data is transposed by the transposition operator to obtain the transposed input data, the tensor layout type of the hidden layer corresponding to the target processing unit and the tensor layout type of the target processing unit are obtained; if the tensor layout type of the hidden layer corresponding to the target processing unit is different from the tensor layout type of the target processing unit, the original tensor parameters in all hidden layers corresponding to the target processing unit are transposed to obtain the target tensor parameters.
In one embodiment, if the tensor layout type of the hidden layer corresponding to the target processing unit is the same as the tensor layout type of the target processing unit, the original tensor parameters in all hidden layers corresponding to the target processing unit are used as the target tensor parameters.
In one embodiment, after the transposed input data and the target tensor parameters in all hidden layers corresponding to the target processing unit are processed by the target processing unit to obtain the output data, the output data is transposed by the transposition operator to obtain the transposed output data; the transposed output data is used as the input data of the next processing unit of the target processing unit.
In one embodiment, the output of the application layer is the input of the first processing unit in the at least one processing unit, the output of the first processing unit is the input of the first hidden layer in the one or more hidden layers corresponding to the first processing unit, and the output of the last hidden layer in the one or more hidden layers corresponding to the first processing unit is the input of the next processing unit of the first processing unit.
In a second aspect, an embodiment of the present application provides a data processing apparatus, which is applied to a terminal device. The terminal device includes an application layer, a neural network model, and at least one processing unit. The neural network model includes at least one hidden layer, and each processing unit corresponds to one or more hidden layers in the at least one hidden layer. The apparatus includes:
a processing subunit, configured to, if the layout type of the input data input to the target processing unit is different from the tensor layout type of the target processing unit, transpose the input data through the transposition operator to obtain the transposed input data, where the input data is a tensor, the target processing unit is any processing unit in the at least one processing unit, and the input data is the output data of the application layer or of one hidden layer in the at least one hidden layer;
the processing subunit is further configured to process, through the target processing unit, the transposed input data and the target tensor parameters in all hidden layers corresponding to the target processing unit to obtain output data, where the layout type of the target tensor parameters is the same as the tensor layout type of the target processing unit.
In a third aspect, an embodiment of the present application provides a data processing device, including a processor, a memory and a communication interface, where the processor, the memory and the communication interface are connected to each other; the memory is used to store a computer program, the computer program includes program instructions, and the processor is configured to invoke the program instructions to execute the data processing method described in the first aspect.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium, where the computer-readable storage medium stores one or more instructions, and the one or more instructions are suitable for being loaded by a processor to execute the data processing method described in the first aspect.
In a fifth aspect, an embodiment of the present application provides a chip, where the chip includes a processor and a data interface, and the processor reads instructions stored in a memory through the data interface to execute the data processing method described in the first aspect.
In a sixth aspect, an embodiment of the present application provides a chip module, where the chip module includes the chip of the fifth aspect.
In the embodiments of the present application, if the layout type of the input data input to the target processing unit is different from the tensor layout type of the target processing unit, the input data is transposed by the transposition operator to obtain the transposed input data, where the input data is a tensor, the target processing unit is any processing unit in the at least one processing unit, and the input data is the output data of the application layer or of one hidden layer in the at least one hidden layer; the transposed input data and the target tensor parameters in all hidden layers corresponding to the target processing unit are processed by the target processing unit to obtain the output data, where the layout type of the target tensor parameters is the same as the tensor layout type of the target processing unit. Through this method, the input data, the tensor layout type of the hidden layers of the neural network and the tensor layout type of the execution unit can be made compatible with one another.
Description of the Drawings
In order to explain the technical solutions of the embodiments of the present application more clearly, the following briefly introduces the accompanying drawings used in the description of the embodiments. Obviously, the drawings in the following description are some embodiments of the present application; for those of ordinary skill in the art, other drawings can also be obtained from these drawings without any creative effort.
FIG. 1a is a system architecture diagram including a neural network hidden layer provided by an embodiment of the present application;
FIG. 1b is another system architecture diagram including a neural network hidden layer provided by an embodiment of the present application;
FIG. 2 is a schematic flowchart of a data processing method provided by an embodiment of the present application;
FIG. 3 is a schematic diagram in which all hidden layers are processed on only one processing unit according to an embodiment of the present application;
FIG. 4 is a schematic diagram of hidden layers being processed on multiple processing units according to an embodiment of the present application;
FIG. 5 is a schematic diagram of the units of a data processing apparatus provided by an embodiment of the present application;
FIG. 6 is a simplified schematic diagram of the physical structure of a data processing apparatus provided by an embodiment of the present application;
FIG. 7 is a simplified schematic diagram of a chip of a data processing apparatus provided by an embodiment of the present application.
Detailed description of embodiments
The technical solutions in the embodiments of the present application are described below clearly and completely with reference to the accompanying drawings in the embodiments of the present application.
To facilitate a better understanding of the embodiments of the present application, the technical terms involved in the embodiments are introduced first:
Artificial intelligence (AI): the theory, method, technology, and application system that uses digital computers, or machines controlled by digital computers, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive branch of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can respond in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning, and decision-making. Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, including both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.
Tensor: a multilinear function that can be used to express linear relationships among vectors, scalars, and other tensors. Tensor analysis is a branch of mathematics with important applications in mechanics; the term originated in mechanics, where it was first used to describe the stress state at each point of an elastic medium, and tensor theory later developed into a powerful mathematical tool in mechanics and physics. The tensor concept generalizes the vector concept: a scalar can be regarded as a zeroth-order tensor, a vector as a first-order tensor, and a matrix as a second-order tensor. In addition, the arrangement of a tensor in memory (its layout, or memory layout) may be NHWC, NCHW, CHWN, and so on. NHWC means the tensor is arranged in memory as [batch, height, width, channels], and NCHW means it is arranged as [batch, channels, height, width], where batch is the number of data items input at one time, channels is the number of channels, width is the width, and height is the height. In the embodiments of the present application, transposing an NHWC tensor yields an NCHW tensor, and similarly, transposing an NCHW tensor yields an NHWC tensor.
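To make the NHWC/NCHW relationship above concrete, the following is a minimal illustrative sketch in Python using NumPy; the array shape and the use of NumPy are assumptions made for this example and are not part of the embodiments:

    import numpy as np

    # An input tensor in NHWC layout: batch=1, height=4, width=4, channels=3
    # (illustrative values only).
    x_nhwc = np.arange(1 * 4 * 4 * 3, dtype=np.float32).reshape(1, 4, 4, 3)

    # Reordering the axes [N, H, W, C] -> [N, C, H, W] yields the NCHW layout.
    x_nchw = np.transpose(x_nhwc, (0, 3, 1, 2))   # shape (1, 3, 4, 4)

    # Transposing again with the inverse axis order recovers the NHWC tensor,
    # matching the statement that the two layouts are mutually convertible.
    assert np.array_equal(np.transpose(x_nchw, (0, 2, 3, 1)), x_nhwc)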
Neural network: in computer science this usually refers to an artificial neural network, a mathematical model that processes information using a structure resembling the synaptic connections of the brain; in engineering and academia it is often simply called a "neural network" or neural-like network. It is an algorithmic mathematical model that imitates the behavioral characteristics of animal neural networks and performs distributed parallel information processing. Depending on the complexity of the system, such a network processes information by adjusting the interconnections among a large number of internal nodes.
Operator: a mapping from one function space to another function space. Common operators include the differential operator, gradient operator, divergence operator, Laplace operator, and Hamiltonian operator. In the narrow sense, an operator is a mapping from one function space to another function space (or to itself). In the broad sense, the spaces above can be generalized to general spaces, which may be vector spaces, normed vector spaces, inner product spaces, or, going further, Banach spaces or Hilbert spaces. Operators can also be classified as bounded or unbounded, linear or nonlinear, and so on.
Transposition (Permute): intuitively, mirroring all elements of a matrix A about the ray running 45 degrees down and to the right from the element in row 1, column 1 yields the transpose of A. Given a matrix M, turning its first row into the first column, its second row into the second column, ..., and its last row into the last column yields a new matrix N; this process is called transposing the matrix. That is, the rows and columns of the matrix are interchanged.
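A short worked example of this row-column interchange (illustrative only):

    import numpy as np

    M = np.array([[1, 2, 3],
                  [4, 5, 6]])   # 2 rows, 3 columns
    N = M.T                     # 3 rows, 2 columns; N[j, i] == M[i, j]
    # The first row [1, 2, 3] of M has become the first column of N.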
Graphics processing unit (GPU): also known as the display core, visual processor, or display chip, a microprocessor dedicated to image- and graphics-related computation on personal computers, workstations, game consoles, and some mobile devices (such as tablet computers and smartphones).

Central processing unit (CPU): as the computing and control core of a computer system, the CPU is the final execution unit for information processing and program execution.

Embedded neural-network processing unit (NPU): adopts a "data-driven parallel computing" architecture and is particularly good at processing massive multimedia data such as video and images.
To facilitate a better understanding of the embodiments of the present application, the network architectures to which the embodiments are applicable are described below.
Referring to Fig. 1a, Fig. 1a is a system architecture diagram including a neural network model according to an embodiment of the present application. As shown in Fig. 1a, the system architecture includes an application layer, an input layer, neural network hidden layers, a processing unit, and an output layer; the neural network model consists of the input layer, the hidden layers, and the output layer. The output of the application layer can serve as the input of the input layer of the neural network model; the hidden layers run on the processing unit, and the processing unit processes the data in the hidden layers. Here, the data refers to the input data of the hidden layers together with the parameters and operators carried by the hidden layers themselves. It should be noted that the neural network hidden layers in Fig. 1a may consist of one hidden layer or of multiple hidden layers. If there are multiple hidden layers, then for any hidden layer in the interior of the stack, its input is the output of the previous hidden layer and its output is the input of the next hidden layer.
Fig. 1b is another system architecture diagram including a neural network model. This architecture includes an application layer, an input layer, a first group of hidden layers, a first processing unit, a second group of hidden layers, a second processing unit, a third group of hidden layers, a third processing unit, and an output layer; the neural network model consists of the input layer, the three groups of hidden layers, and the output layer. The first group of hidden layers runs on the first processing unit, the second group on the second processing unit, and the third group on the third processing unit. The first, second, and third processing units may each be a processing unit such as a GPU, CPU, or NPU. When the computation of the neural network model needs to be carried out by different processing units, the model can group its hidden layers, and different groups of hidden layers can be processed by different processing units. It should be noted that there are also input and output boundaries between the first, second, and third groups of hidden layers: for any hidden layer in the interior of the stack, its input is the output of the previous hidden layer and its output is the input of the next hidden layer.
To make the input data, the tensor layout type of the neural network, and the tensor layout type of the execution unit compatible with one another, the embodiments of the present application provide a data processing method and apparatus, which are described in detail below.
Referring to Fig. 2, Fig. 2 is a schematic flowchart of a data processing method according to an embodiment of the present application. The data processing method includes operations 210 and 220 below. The method shown in Fig. 2 may be executed by a terminal device, or by a chip within the terminal device. The terminal device to which the method applies may include an application layer, a neural network model, and at least one processing unit, wherein the neural network model includes at least one hidden layer and each processing unit corresponds to one or more hidden layers among the at least one hidden layer. When the terminal device executes the flow shown in Fig. 2, the following steps may be included:
Operation 210: if the layout type of the input data input to a target processing unit differs from the tensor layout type of the target processing unit, transpose the input data by a transposition operator to obtain transposed input data, wherein the input data is a tensor, the target processing unit is any one of the at least one processing unit, and the input data is the output data of the application layer or of one hidden layer among the at least one hidden layer.

Here, the output of the application layer is the input of a first processing unit among the at least one processing unit; the output of the first processing unit is the input of the first hidden layer among the one or more hidden layers corresponding to the first processing unit; and the output of the last hidden layer among the one or more hidden layers corresponding to the first processing unit is the input of the processing unit following the first processing unit.
In a possible implementation, before the input data is transposed by the transposition operator to obtain the transposed input data, the terminal device may obtain the tensor layout type of the hidden layers corresponding to the target processing unit and the tensor layout type of the target processing unit. If the terminal device determines that these two tensor layout types differ, it needs to transpose the original tensor parameters in all hidden layers corresponding to the target processing unit to obtain the above target tensor parameters. The target tensor parameters are the set of transposed tensor parameters of each of those hidden layers, and the transposed tensor parameters of the individual hidden layers may differ from one another.
Optionally, Fig. 3 is a schematic diagram in which all hidden layers are processed on a single processing unit. In Fig. 3, the neural network hidden layers are processed by only one processing unit, which may be a GPU, CPU, NPU, or another processing unit capable of processing the hidden layers. If all hidden layers are processed on a single processing unit, the terminal device only needs to determine whether the tensor layout type of all hidden layers corresponding to that processing unit is the same as the tensor layout type of the processing unit. For example, if the tensor layout type of all corresponding hidden layers is NHWC while the tensor layout type of the processing unit is NCHW, the terminal device needs to transpose the original tensor parameters in all hidden layers to obtain the target tensor parameters. The layout type of the target tensor parameters is then consistent with the tensor layout type of the processing unit, namely NCHW. The original tensor parameters refer to the parameters in the operators of the hidden layers, for example the Weights parameter of a Conv2d operator, which is converted from NHWC to NCHW to fit the layout constraint of the target computing unit.
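As a hedged illustration of the parameter transposition just described, the sketch below treats a Conv2d Weights tensor as stored with an NHWC-style axis order, per the example above; the function name, the concrete dimensions, and the use of NumPy are assumptions for this sketch only:

    import numpy as np

    def transpose_params_nhwc_to_nchw(weights):
        # Reorder the weight axes [N, H, W, C] -> [N, C, H, W] so that the
        # target tensor parameters match the processing unit's NCHW layout.
        return np.transpose(weights, (0, 3, 1, 2))

    # Hypothetical Conv2d Weights: 8 filters, 5x5 kernel, 3 input channels.
    w_nhwc = np.zeros((8, 5, 5, 3), dtype=np.float32)
    w_nchw = transpose_params_nhwc_to_nchw(w_nhwc)   # shape (8, 3, 5, 5)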
Optionally, Fig. 4 is a schematic diagram in which the hidden layers are processed on multiple processing units. In Fig. 4, all hidden layers of the neural network model are divided into a first group, a second group, and a third group of hidden layers. The first group is processed by the first processing unit, the second group by the second processing unit, and the third group by the third processing unit. The tensor layout types of the three groups of hidden layers are all the same and are determined when the model is built, whereas among the three processing units there may be processors whose tensor layout type differs from that of the hidden layers. For example, suppose the tensor layout type of the hidden layers is NHWC, that of the first processing unit is NCHW, that of the second processing unit is NHWC, and that of the third processing unit is NCHW. The terminal device can then determine that the tensor layout type of the first group of hidden layers differs from that of the first processing unit, that the tensor layout type of the second group matches that of the second processing unit, and that the tensor layout type of the third group differs from that of the third processing unit. Therefore, the original tensor parameters of all hidden layers in the first group and of all hidden layers in the third group need to be transposed. In this way, the tensor layout type of the first group of hidden layers can match that of the first processing unit, and the tensor layout type of the third group can match that of the third processing unit.
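The per-group decision described above can be sketched as a simple comparison of layout identifiers; the dictionary and all names below are hypothetical:

    # Layout of all hidden layers, fixed when the model is built.
    hidden_layer_layout = "NHWC"

    # Tensor layout type of each processing unit (illustrative values).
    unit_layouts = {"unit_1": "NCHW", "unit_2": "NHWC", "unit_3": "NCHW"}

    # Groups whose original tensor parameters must be transposed.
    needs_param_transpose = [
        unit for unit, layout in unit_layouts.items()
        if layout != hidden_layer_layout
    ]
    # -> ["unit_1", "unit_3"]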
In a possible implementation, if the tensor layout type of the hidden layers corresponding to the target processing unit is the same as the tensor layout type of the target processing unit, the original tensor parameters in all hidden layers corresponding to the target processing unit are used as the target tensor parameters.

When the tensor layout type of the hidden layers matches that of the corresponding processing unit, if the layout type of the input data input to the target processing unit differs from the tensor layout type of the target processing unit, the input data, which is a tensor, needs to be transposed. Specifically, the input data can be transposed by means of a transposition operator to obtain the transposed input data, whose layout type is then the same as the tensor layout type of the target processing unit.
Optionally, as shown in Fig. 3, the input layer may feed the input data into the hidden layers corresponding to the processing unit. If the terminal device determines that the layout type of the input data differs from the tensor layout type of the hidden layers, it inserts a transposition (Permute) operator after the input layer, and the input data is transposed by this operator. It should be noted that, before the input data is fed into the hidden layers, if the tensor layout type of the hidden layers differs from that of the processing unit, the terminal device transposes the original tensor parameters in the hidden layers so that the tensor layout type of the hidden layers matches that of the processing unit. Therefore, comparing the layout type of the input data with the tensor layout type of the hidden layers is equivalent to comparing it with the tensor layout type of the processing unit.
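The conditional insertion of a Permute operator can be sketched as follows. This is a simplification under the assumption that layouts are the strings 'NHWC' and 'NCHW'; a real implementation would insert an operator node into the graph rather than transposing eagerly:

    import numpy as np

    NHWC_TO_NCHW = (0, 3, 1, 2)
    NCHW_TO_NHWC = (0, 2, 3, 1)

    def maybe_permute(x, input_layout, target_layout):
        # Equivalent to inserting a Permute operator only when the layout of
        # the input data differs from the target tensor layout type.
        if input_layout == target_layout:
            return x
        axes = NHWC_TO_NCHW if input_layout == "NHWC" else NCHW_TO_NHWC
        return np.transpose(x, axes)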
Optionally, as shown in Fig. 4, the input layer may feed the input data into the first hidden layer of the first group. If the terminal device determines that the layout type of the input data differs from the tensor layout type of that first hidden layer, a Permute operator can likewise be inserted after the input layer to transpose the input data, and the transposed input data is then fed into the first hidden layer of the first group. After the last hidden layer of the first group has processed the input data received from the previous hidden layer (that is, the output data of the previous hidden layer), the output data of that last hidden layer can be transposed by adding a Permute operator after it; the transposed output data then has the same layout type as the input data of the first hidden layer of the first group. When this transposed output data is fed as input data into the first hidden layer of the second group, the terminal device needs to determine whether the layout type of that input data is the same as the tensor layout type of the first hidden layer of the second group. If they are the same, the data can be fed into the first hidden layer of the second group directly; if they differ, the input data must first be transposed by a Permute operator. Similarly, data can be passed between the last hidden layer of the second group and the first hidden layer of the third group in the same way.
It should be noted that the original tensor parameters of the hidden layers are transposed directly, without going through a Permute operator.
Operation 220: process, by the target processing unit, the transposed input data and the target tensor parameters in all hidden layers corresponding to the target processing unit to obtain output data, wherein the layout type of the target tensor parameters is the same as the tensor layout type of the target processing unit.
The target processing unit processes the input data in the corresponding hidden layers together with the target tensor parameters in those hidden layers. If there are multiple hidden layers, the target processing unit processes the input data and tensor parameters of each hidden layer one by one, and output data is obtained once processing completes. After the output data is obtained, it can be transposed by a Permute operator to obtain transposed output data, which is then used as the input data of the processing unit following the target processing unit. In this way, data flows between different processing units.
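Putting operations 210 and 220 together with the hand-off to the next processing unit, a minimal end-to-end sketch is given below; it reuses the hypothetical maybe_permute helper above, and the unit object and its methods are assumptions, with the real processing stubbed out:

    class StubUnit:
        def __init__(self, layout):
            self.layout = layout        # the unit's tensor layout type
        def process(self, x):
            # A real unit would apply the target tensor parameters of its
            # hidden layers here; identity stands in for the sketch.
            return x

    def run_on_unit(x, input_layout, unit, next_input_layout):
        x = maybe_permute(x, input_layout, unit.layout)        # operation 210
        y = unit.process(x)                                    # operation 220
        # Transpose the output so it can serve as input data for the next
        # processing unit, enabling data flow between different units.
        return maybe_permute(y, unit.layout, next_input_layout)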
According to the embodiments of the present application, when the tensor layout type of the hidden layers corresponding to the target processing unit differs from that of the target processing unit, the terminal device first transposes the original tensor parameters of those hidden layers to obtain target tensor parameters whose layout type matches the tensor layout type of the target processing unit. It can then compare the layout type of the input data input to the target processing unit with the tensor layout type of the target processing unit; if they differ, the input data is transposed by a transposition operator so that the transposed input data has the same layout type as the tensor layout type of the target processing unit. With this method, the input data, the tensor layout type of the hidden layers of the neural network, and the tensor layout type of the execution unit are made compatible with one another, which facilitates the processing unit's handling of the input data and of the parameters in the hidden layers.
Referring to Fig. 5, Fig. 5 is a schematic unit diagram of a data processing apparatus according to an embodiment of the present application. The data processing apparatus shown in Fig. 5 can be used to perform some or all of the functions of the method embodiment described above with reference to Fig. 2. The apparatus may be a terminal device, a component within a terminal device, or a device that can be used in combination with a terminal device.

The logical structure of the apparatus may include a processing subunit 510 and an obtaining subunit 520. When the apparatus is applied to a terminal device, the terminal device includes an application layer, a neural network model, and at least one processing unit, wherein the neural network model includes at least one hidden layer and each processing unit corresponds to one or more hidden layers among the at least one hidden layer. In this case:
The processing subunit 510 is configured to: if the layout type of the input data input to a target processing unit differs from the tensor layout type of the target processing unit, transpose the input data by a transposition operator to obtain transposed input data, wherein the input data is a tensor, the target processing unit is any one of the at least one processing unit, and the input data is the output data of the application layer or of one hidden layer among the at least one hidden layer.

The processing subunit 510 is further configured to process, by the target processing unit, the transposed input data and the target tensor parameters in all hidden layers corresponding to the target processing unit to obtain output data, wherein the layout type of the target tensor parameters is the same as the tensor layout type of the target processing unit.

In a possible implementation, before the input data is transposed by the transposition operator to obtain the transposed input data, the obtaining subunit 520 is configured to obtain the tensor layout type of the hidden layers corresponding to the target processing unit and the tensor layout type of the target processing unit; the processing subunit 510 is further configured to, if these two tensor layout types differ, transpose the original tensor parameters in all hidden layers corresponding to the target processing unit to obtain the target tensor parameters.

In a possible implementation, the processing subunit 510 is further configured to, if the tensor layout type of the hidden layers corresponding to the target processing unit is the same as that of the target processing unit, use the original tensor parameters in all hidden layers corresponding to the target processing unit as the target tensor parameters.

In a possible implementation, after the transposed input data and the target tensor parameters in all hidden layers corresponding to the target processing unit are processed by the target processing unit to obtain the output data, the processing subunit 510 is further configured to transpose the output data by a transposition operator to obtain transposed output data, and to use the transposed output data as the input data of the processing unit following the target processing unit.

In a possible implementation, the output of the application layer is the input of a first processing unit among the at least one processing unit; the output of the first processing unit is the input of the first hidden layer among the one or more hidden layers corresponding to the first processing unit; and the output of the last hidden layer among those one or more hidden layers is the input of the processing unit following the first processing unit.
Referring to Fig. 6, Fig. 6 is a simplified schematic diagram of the physical structure of a data processing apparatus according to an embodiment of the present application. The apparatus includes a processor 610, a memory 620, and a communication interface 630, which are connected by one or more communication buses. The data processing apparatus may be a chip, a chip module, or the like.

The processor 610 is configured to support the data processing apparatus in performing the functions corresponding to the method in Fig. 2 above. It should be understood that in the embodiments of the present application, the processor 610 may be a central processing unit (CPU), or may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.

The memory 620 is configured to store program code and the like. The memory 620 in the embodiments of the present application may be volatile memory or non-volatile memory, or may include both volatile and non-volatile memory. The non-volatile memory may be read-only memory (ROM), programmable ROM (PROM), erasable PROM (EPROM), electrically erasable PROM (EEPROM), or flash memory. The volatile memory may be random access memory (RAM), which is used as an external cache. By way of example and not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct rambus RAM (DR RAM).

The communication interface 630 is used to send and receive data, information, messages, and the like, and may also be described as a transceiver, a transceiver circuit, or the like.
In the embodiments of the present application, the processor 610 invokes the program code stored in the memory 620 to perform the following operations:

if the layout type of the input data input to a target processing unit differs from the tensor layout type of the target processing unit, transposing the input data by a transposition operator to obtain transposed input data, wherein the input data is a tensor, the target processing unit is any one of the at least one processing unit, and the input data is the output data of the application layer or of one hidden layer among the at least one hidden layer;

processing, by the target processing unit, the transposed input data and the target tensor parameters in all hidden layers corresponding to the target processing unit to obtain output data, wherein the layout type of the target tensor parameters is the same as the tensor layout type of the target processing unit.

In a possible implementation, before the input data is transposed by the transposition operator to obtain the transposed input data, the processor 610 invokes the program code stored in the memory 620 to obtain the tensor layout type of the hidden layers corresponding to the target processing unit and the tensor layout type of the target processing unit, and, if these two tensor layout types differ, to transpose the original tensor parameters in all hidden layers corresponding to the target processing unit to obtain the target tensor parameters.

In a possible implementation, the processor 610 invokes the program code stored in the memory 620 to, if the tensor layout type of the hidden layers corresponding to the target processing unit is the same as that of the target processing unit, use the original tensor parameters in all hidden layers corresponding to the target processing unit as the target tensor parameters.

In a possible implementation, after the transposed input data and the target tensor parameters in all hidden layers corresponding to the target processing unit are processed by the target processing unit to obtain the output data, the processor 610 invokes the program code stored in the memory 620 to transpose the output data by a transposition operator to obtain transposed output data, and to use the transposed output data as the input data of the processing unit following the target processing unit.

In a possible implementation, the output of the application layer is the input of a first processing unit among the at least one processing unit; the output of the first processing unit is the input of the first hidden layer among the one or more hidden layers corresponding to the first processing unit; and the output of the last hidden layer among those one or more hidden layers is the input of the processing unit following the first processing unit.
Regarding the modules/units included in the apparatuses and products described in the foregoing embodiments, they may be software modules/units, hardware modules/units, or partly software and partly hardware modules/units. For example, for an apparatus or product applied to or integrated in a chip, all of its modules/units may be implemented by hardware such as circuits; alternatively, at least some modules/units may be implemented by a software program running on a processor integrated inside the chip, with the remaining modules/units (if any) implemented by hardware such as circuits. For an apparatus or product applied to or integrated in a chip module, all of its modules/units may be implemented by hardware such as circuits, and different modules/units may be located in the same component (such as a chip or a circuit module) of the chip module or in different components; alternatively, at least some modules/units may be implemented by a software program running on a processor integrated inside the chip module, with the remaining modules/units (if any) implemented by hardware such as circuits. For an apparatus or product applied to or integrated in a terminal, all of its modules/units may be implemented by hardware such as circuits, and different modules/units may be located in the same component (such as a chip or a circuit module) of the terminal or in different components; alternatively, at least some modules/units may be implemented by a software program running on a processor integrated inside the terminal, with the remaining modules/units (if any) implemented by hardware such as circuits.
Referring to Fig. 7, Fig. 7 is a simplified schematic diagram of a chip according to an embodiment of the present application. The chip includes a processor 710 and a data interface 720 and can be used to carry out the functions corresponding to the method in Fig. 2. The chip may be included in the data processing apparatus shown in Fig. 6, and may also be included in a chip module.
It should be noted that the description of each of the foregoing embodiments has its own emphasis; for parts not described in detail in one embodiment, reference may be made to the relevant descriptions of other embodiments.

The steps in the methods of the embodiments of the present invention may be reordered, combined, or deleted according to actual needs.

The units in the processing devices of the embodiments of the present invention may be combined, divided, or deleted according to actual needs.

The foregoing embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented by software, they may be implemented in whole or in part in the form of a computer program product. A computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions according to the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (such as coaxial cable, optical fiber, or digital subscriber line) or wireless means (such as infrared, radio, or microwave). The computer-readable storage medium may be any available medium accessible to a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (such as a floppy disk, a storage disk, or a magnetic tape), an optical medium (such as a DVD), or a semiconductor medium (such as a solid state disk (SSD)), among others.

An embodiment of the present application further provides a computer storage medium, wherein the computer storage medium stores a computer program for electronic data exchange, and the computer program causes a computer to perform some or all of the steps of any method described in the foregoing method embodiments; the computer includes an electronic device.

An embodiment of the present application further provides a computer program product, wherein the computer program product includes a non-transitory computer-readable storage medium storing a computer program, and the computer program is operable to cause a computer to perform some or all of the steps of any method described in the foregoing method embodiments. The computer program product may be a software installation package, and the computer includes an electronic device.
It should be understood that, in the various embodiments of the present application, the sequence numbers of the foregoing processes do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present application.

In the several embodiments provided in the present application, it should be understood that the disclosed methods, apparatuses, and systems may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; the division of the units is only a division by logical function, and other divisions are possible in actual implementation; for instance, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.

The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solutions of the embodiments.

In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may be physically included separately, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.

The integrated unit implemented in the form of a software functional unit may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform some of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

Although the present invention is disclosed as above, it is not limited thereto. Any person skilled in the art may readily conceive of changes or substitutions without departing from the spirit and scope of the present invention, and may make various alterations and modifications, including combinations of the different functions and implementation steps described above and implementations in software and in hardware, all of which fall within the protection scope of the present invention.

Claims (14)

  1. A data processing method, applied to a terminal device, the terminal device comprising an application layer, a neural network model, and at least one processing unit, the neural network model comprising at least one hidden layer, each processing unit corresponding to one or more hidden layers among the at least one hidden layer, the method comprising:
    if the layout type of input data input to a target processing unit differs from the tensor layout type of the target processing unit, transposing the input data by a transposition operator to obtain transposed input data, wherein the input data is a tensor, the target processing unit is any one of the at least one processing unit, and the input data is output data of the application layer or of one hidden layer among the at least one hidden layer;
    processing, by the target processing unit, the transposed input data and target tensor parameters in all hidden layers corresponding to the target processing unit to obtain output data, wherein the layout type of the target tensor parameters is the same as the tensor layout type of the target processing unit.
  2. The method according to claim 1, wherein before the transposing of the input data by the transposition operator to obtain the transposed input data, the method further comprises:
    obtaining the tensor layout type of the hidden layers corresponding to the target processing unit and the tensor layout type of the target processing unit;
    if the tensor layout type of the hidden layers corresponding to the target processing unit differs from the tensor layout type of the target processing unit, transposing original tensor parameters in all hidden layers corresponding to the target processing unit to obtain the target tensor parameters.
  3. The method according to claim 2, further comprising:
    if the tensor layout type of the hidden layers corresponding to the target processing unit is the same as the tensor layout type of the target processing unit, using the original tensor parameters in all hidden layers corresponding to the target processing unit as the target tensor parameters.
  4. The method according to claim 1, wherein after the processing, by the target processing unit, of the transposed input data and the target tensor parameters in all hidden layers corresponding to the target processing unit to obtain the output data, the method further comprises:
    transposing the output data by a transposition operator to obtain transposed output data;
    using the transposed output data as input data of the processing unit following the target processing unit.
  5. The method according to claim 1, wherein the output of the application layer is the input of a first processing unit among the at least one processing unit, the output of the first processing unit is the input of the first hidden layer among the one or more hidden layers corresponding to the first processing unit, and the output of the last hidden layer among the one or more hidden layers corresponding to the first processing unit is the input of the processing unit following the first processing unit.
  6. 一种数据处理装置,其特征在于,应用于终端设备,所述终端设备包括应用层、神经网络模型以及至少一个处理单元,所述神经网络模型包括至少一个隐藏层,各个所述处理单元对应所述至少一个隐藏层中的一个或多个隐藏层,所述装置包括:A data processing device, characterized in that it is applied to a terminal device, the terminal device includes an application layer, a neural network model and at least one processing unit, the neural network model includes at least one hidden layer, and each of the processing units corresponds to the one or more hidden layers of the at least one hidden layer, the apparatus comprising:
    处理子单元,用于若输入至目标处理单元的输入数据的布局类型与所述目标处理单元的张量布局类型不相同,则通过转置算子对所述输入数据进行转置,得到转置后的输入数据,所述输入数据为张量,所述目标处理单元为所述至少一个处理单元中的任一处理单元,所述输入数据为所述应用层或者所述至少一个隐藏层中的一个隐藏层的输出数据;A processing subunit, configured to transpose the input data through a transposition operator if the layout type of the input data input to the target processing unit is different from the tensor layout type of the target processing unit to obtain a transposition After input data, the input data is a tensor, the target processing unit is any processing unit in the at least one processing unit, and the input data is the application layer or the at least one hidden layer. The output data of a hidden layer;
    所述处理子单元还用于通过所述目标处理单元对所述转置后的输入数据和所述目标处理单元对应的所有隐藏层中的目标张量参数进行处理,得到输出数据,所述目标张量参数的布局类型与所述目标处理单元的张量布局类型相同。The processing subunit is further configured to process the transposed input data and target tensor parameters in all hidden layers corresponding to the target processing unit through the target processing unit to obtain output data, the target The layout type of the tensor parameter is the same as the tensor layout type of the target processing unit.
  7. 根据权利要求6所述的数据处理装置,其特征在于,通过转置算子对输入数据进行转置,得到转置后的输入数据之前,所述获取子单元,还用于获取所述目标处理单元对应的隐藏层的张量布局类型和所述目标处理单元的张量布局类型;以及,若所述目标处理单元对应的隐藏层的张量布局类型和所述目标处理单元的张量布局类型不相同,则对所述目标处理单元对应的所有隐藏层中的原始张量参数进行转置,得到所述目标张量参数。The data processing device according to claim 6, wherein the acquisition subunit is further configured to acquire the target processing before transposing the input data through a transposition operator to obtain the transposed input data The tensor layout type of the hidden layer corresponding to the unit and the tensor layout type of the target processing unit; and, if the tensor layout type of the hidden layer corresponding to the target processing unit and the tensor layout type of the target processing unit If not, the original tensor parameters in all hidden layers corresponding to the target processing unit are transposed to obtain the target tensor parameters.
  8. The data processing apparatus according to claim 7, wherein the processing subunit is further configured to: if the tensor layout type of the hidden layers corresponding to the target processing unit is the same as the tensor layout type of the target processing unit, use the original tensor parameters in all hidden layers corresponding to the target processing unit as the target tensor parameters.
  9. The data processing apparatus according to claim 6, wherein after the transposed input data and the target tensor parameters in all hidden layers corresponding to the target processing unit are processed by the target processing unit to obtain the output data, the processing subunit is further configured to: transpose the output data by means of a transposition operator to obtain transposed output data; and use the transposed output data as input data of a next processing unit following the target processing unit.
  10. The data processing apparatus according to claim 6, wherein an output of the application layer is an input of a first processing unit in the at least one processing unit, an output of the first processing unit is an input of a first hidden layer among the one or more hidden layers corresponding to the first processing unit, and an output of a last hidden layer among the one or more hidden layers corresponding to the first processing unit is an input of a next processing unit following the first processing unit.
  11. A data processing apparatus, comprising a processor, a memory, and a communication interface, the processor, the memory, and the communication interface being connected to one another, wherein the memory is configured to store a computer program, the computer program comprises program instructions, and the processor is configured to invoke the program instructions to perform the data processing method according to any one of claims 1 to 5.
  12. A computer-readable storage medium, wherein the computer-readable storage medium stores one or more instructions, and the one or more instructions are adapted to be loaded by a processor to perform the data processing method according to any one of claims 1 to 5.
  13. A chip, comprising a processor and a data interface, wherein the processor reads, through the data interface, instructions stored in a memory to perform the data processing method according to any one of claims 1 to 5.
  14. A chip module, comprising the chip according to claim 13.
PCT/CN2021/141402 2021-01-28 2021-12-25 Data processing method and apparatus WO2022161060A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110116895.5A CN112862071B (en) 2021-01-28 2021-01-28 Data processing method and device
CN202110116895.5 2021-01-28

Publications (1)

Publication Number Publication Date
WO2022161060A1 (en)

Family

ID=75987440

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/141402 WO2022161060A1 (en) 2021-01-28 2021-12-25 Data processing method and apparatus

Country Status (2)

Country Link
CN (1) CN112862071B (en)
WO (1) WO2022161060A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112862071B (en) * 2021-01-28 2023-04-28 展讯通信(上海)有限公司 Data processing method and device

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111210017A (en) * 2019-12-24 2020-05-29 北京迈格威科技有限公司 Method, device, equipment and storage medium for determining layout sequence and processing data
CN111242286A (en) * 2020-01-14 2020-06-05 Oppo广东移动通信有限公司 Data format conversion method and device and computer readable storage medium
CN111860801A (en) * 2019-04-30 2020-10-30 百度(美国)有限责任公司 Neural network method, neural network system, and computer-readable medium
CN111882038A (en) * 2020-07-24 2020-11-03 深圳力维智联技术有限公司 Model conversion method and device
US20200364047A1 (en) * 2019-05-16 2020-11-19 Facebook, Inc. High throughput neural network operations using inter-layer memory layout transformation
CN112862071A (en) * 2021-01-28 2021-05-28 展讯通信(上海)有限公司 Data processing method and device

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109597965B (en) * 2018-11-19 2023-04-18 深圳力维智联技术有限公司 Data processing method, system, terminal and medium based on deep neural network
US11494613B2 (en) * 2019-01-02 2022-11-08 The Aerospace Corporation Fusing output of artificial intelligence networks
CN111401537A (en) * 2019-09-24 2020-07-10 上海寒武纪信息科技有限公司 Data processing method and device, computer equipment and storage medium
CN111401538A (en) * 2019-09-24 2020-07-10 上海寒武纪信息科技有限公司 Data processing method and device, computer equipment and storage medium

Also Published As

Publication number Publication date
CN112862071A (en) 2021-05-28
CN112862071B (en) 2023-04-28

Legal Events

Date Code Title Description

121 Ep: the EPO has been informed by WIPO that EP was designated in this application
    Ref document number: 21922650; Country of ref document: EP; Kind code of ref document: A1

NENP Non-entry into the national phase
    Ref country code: DE

122 Ep: PCT application non-entry in European phase
    Ref document number: 21922650; Country of ref document: EP; Kind code of ref document: A1