WO2022161060A1 - Data processing method and apparatus - Google Patents

Data processing method and apparatus

Info

Publication number
WO2022161060A1
WO2022161060A1 (PCT application PCT/CN2021/141402)
Authority
WO
WIPO (PCT)
Prior art keywords
processing unit
tensor
target
data
layout type
Prior art date
Application number
PCT/CN2021/141402
Other languages
French (fr)
Chinese (zh)
Inventor
罗佳
许铭仁
Original Assignee
展讯通信(上海)有限公司 (Spreadtrum Communications (Shanghai) Co., Ltd.)
Priority date
Filing date
Publication date
Application filed by 展讯通信(上海)有限公司 (Spreadtrum Communications (Shanghai) Co., Ltd.)
Publication of WO2022161060A1 publication Critical patent/WO2022161060A1/en


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • the present application relates to the field of computer technology, and in particular, to a data processing method and device.
  • A neural network is a mathematical or computational model, used in machine learning and cognitive science, that imitates the structure and function of biological neural networks.
  • the data in the neural network can be stored in a tensor (Tensor), and the arrangement (Layout) of the data in the tensor can include NHWC and NCHW.
  • the data in the neural network can be calculated by the execution unit to obtain the corresponding calculation result.
  • if the arrangement of the tensor data supported by the execution unit is incompatible with the arrangement of the tensor data supported by the neural network, the correct calculation result cannot be obtained.
  • the present application discloses a data processing method and device, which can make the input data, the tensor layout type of the hidden layer of the neural network and the tensor layout type of the execution unit compatible with each other.
  • the embodiments of the present application provide a data processing method and apparatus.
  • the method is applied to a terminal device.
  • the terminal device includes an application layer, a neural network model, and at least one processing unit.
  • the neural network model includes at least one hidden layer.
  • each processing unit corresponds to one or more hidden layers in the at least one hidden layer, and the method includes:
  • if the layout type of the input data input to the target processing unit is different from the tensor layout type of the target processing unit, the input data is transposed by a transposition operator to obtain the transposed input data; the target processing unit is any processing unit in the at least one processing unit, and the input data is the output data of the application layer or one hidden layer in the at least one hidden layer;
  • the transposed input data and the target tensor parameters in all hidden layers corresponding to the target processing unit are processed by the target processing unit to obtain the output data; the layout type of the target tensor parameters is the same as the tensor layout type of the target processing unit.
  • in one embodiment, before the input data is transposed by the transposition operator to obtain the transposed input data, the tensor layout type of the hidden layer corresponding to the target processing unit and the tensor layout type of the target processing unit are obtained; if the tensor layout type of the hidden layer corresponding to the target processing unit is different from the tensor layout type of the target processing unit, the original tensor parameters in all hidden layers corresponding to the target processing unit are transposed to obtain the target tensor parameters.
  • if the tensor layout type of the hidden layer corresponding to the target processing unit is the same as the tensor layout type of the target processing unit, the original tensor parameters in all hidden layers corresponding to the target processing unit are used as the target tensor parameters.
  • after the transposed input data and the target tensor parameters in all hidden layers corresponding to the target processing unit are processed by the target processing unit to obtain the output data, the output data is transposed by the transposition operator to obtain the transposed output data; the transposed output data is used as the input data of the next processing unit of the target processing unit.
  • the output of the application layer is the input of the first processing unit in the at least one processing unit, and the output of the first processing unit is the input of the first hidden layer of the one or more hidden layers corresponding to the first processing unit.
  • the output of the last hidden layer in the one or more hidden layers corresponding to the first processing unit is the input of the next processing unit of the first processing unit.
  • an embodiment of the present application provides a data processing apparatus, which is applied to a terminal device.
  • the terminal device includes an application layer, a neural network model, and at least one processing unit.
  • the neural network model includes at least one hidden layer, and each processing unit corresponds to one or more hidden layers in the at least one hidden layer; the apparatus includes:
  • a processing subunit, used to transpose the input data through the transposition operator to obtain the transposed input data if the layout type of the input data input to the target processing unit is different from the tensor layout type of the target processing unit, where the input data is a tensor, the target processing unit is any processing unit in the at least one processing unit, and the input data is the output data of the application layer or a hidden layer in the at least one hidden layer;
  • the processing subunit is also used to process the transposed input data and the target tensor parameters in all hidden layers corresponding to the target processing unit through the target processing unit to obtain output data; the layout type of the target tensor parameters is the same as the tensor layout type of the target processing unit.
  • an embodiment of the present application provides a data processing device, including a processor, a memory, and a communication interface, where the processor, the memory, and the communication interface are connected to each other, wherein the memory is used to store a computer program, and the computer program includes program instructions,
  • the processor is configured to invoke program instructions to execute the data processing method as described in the first aspect.
  • an embodiment of the present application provides a computer-readable storage medium, where the computer-readable storage medium stores one or more instructions, and the one or more instructions are suitable for being loaded by a processor to execute the data processing method described in the first aspect.
  • an embodiment of the present application provides a chip, the chip includes a processor and a data interface, and the processor reads an instruction stored in a memory through the data interface to execute the data processing method described in the first aspect.
  • an embodiment of the present application provides a chip module, where the chip module includes the chip of the fifth aspect.
  • the input data is transposed by the transposition operator to obtain the transposed input data
  • the input data is a tensor
  • the target processing unit is any processing unit in the at least one processing unit
  • the input data is the output data of the application layer or one hidden layer in the at least one hidden layer
  • the transposed input data and the target tensor parameters in all hidden layers corresponding to the target processing unit are processed to obtain output data; the layout type of the target tensor parameters is the same as the tensor layout type of the target processing unit.
  • FIG. 1a is a system architecture diagram including a neural network hidden layer provided by an embodiment of the application;
  • FIG. 1b is another system architecture diagram including a neural network hidden layer provided by an embodiment of the application
  • FIG. 2 is a schematic flowchart of a data processing method provided by an embodiment of the present application.
  • FIG. 3 is a schematic diagram of all hidden layers being processed only on one processing unit according to an embodiment of the present application.
  • FIG. 4 is a schematic diagram of a hidden layer being processed on multiple processing units according to an embodiment of the present application
  • FIG. 5 is a schematic diagram of a unit of a data processing apparatus provided by an embodiment of the present application.
  • FIG. 6 is a simplified schematic diagram of a physical structure of a data processing apparatus provided by an embodiment of the present application.
  • FIG. 7 is a simplified schematic diagram of a chip of a data processing apparatus provided by an embodiment of the present application.
  • Artificial Intelligence (AI) is a theory, method, technology and application system that uses digital computers, or machines controlled by digital computers, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use knowledge to obtain the best results.
  • artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new kind of intelligent machine that can respond in a similar way to human intelligence.
  • Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that machines have the functions of perception, reasoning and decision-making.
  • Artificial intelligence technology is a comprehensive discipline, involving a wide range of fields, including both hardware-level technology and software-level technology.
  • the basic technologies of artificial intelligence generally include technologies such as sensors, special artificial intelligence chips, cloud computing, distributed storage, big data processing technology, operation/interaction systems, and mechatronics.
  • Artificial intelligence software technology mainly includes computer vision technology, speech processing technology, natural language processing technology, and machine learning/deep learning.
  • A tensor is a multilinear function that can be used to represent linear relationships between vectors, scalars, and other tensors. Tensor analysis is a branch of mathematics with important applications in mechanics; the term tensor originated in mechanics, where it was first used to represent the stress state at various points in an elastic medium, and tensor theory later developed into a powerful mathematical tool in mechanics and physics. The tensor concept is a generalization of the vector concept: a vector is a first-order tensor. Tensors generalize vectors and matrices.
  • scalars can be regarded as zero-order tensors
  • vectors can be regarded as first-order tensors
  • matrices can be regarded as second-order tensors.
  • the arrangement of tensors in memory can include NHWC, NCHW, CHWN, etc.
  • NHWC indicates that tensors are arranged in memory as [batch, height, width, channels]
  • NCHW indicates that tensors are arranged in memory as [batch, channels, height, width].
  • batch represents the number of data input at one time
  • channels represents the number of channels
  • width represents the width
  • height represents the height.
  • the tensor of NCHW can be obtained by transposing the tensor of NHWC
  • the tensor of NHWC can be obtained by transposing the tensor of NCHW.
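  • As an illustrative sketch (not part of the original disclosure), the relationship between the two layouts can be shown with NumPy, where a layout change is simply a permutation of the tensor axes:

```python
import numpy as np

# Illustrative only: a layout change is an axis permutation.
x_nhwc = np.arange(24).reshape(2, 2, 2, 3)          # [batch, height, width, channels]

x_nchw = np.transpose(x_nhwc, (0, 3, 1, 2))         # NHWC -> NCHW
print(x_nchw.shape)                                 # (2, 3, 2, 2)

x_round_trip = np.transpose(x_nchw, (0, 2, 3, 1))   # NCHW -> NHWC (inverse permutation)
assert np.array_equal(x_round_trip, x_nhwc)
```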
  • Neural network: in computer science this usually refers to an artificial neural network, a mathematical model for information processing that uses a structure similar to the synaptic connections of the brain. In engineering and academia it is often simply called a "neural network" or neural-like network. It is an algorithmic mathematical model that imitates the behavioral characteristics of animal neural networks and performs distributed parallel information processing. Such a network depends on the complexity of the system and achieves the purpose of processing information by adjusting the interconnections between a large number of internal nodes.
  • Operator: a mapping from one function space to another function space. Common operators include the differential operator, gradient operator, divergence operator, Laplace operator, Hamiltonian operator, etc.
  • the operator in the narrow sense actually refers to the mapping from one function space to another function space (or itself).
  • the generalized definition of an operator extends the above spaces to general spaces: a vector space, a normed vector space, an inner product space, or, going further, a Banach space or a Hilbert space. Operators can also be divided into bounded and unbounded, linear and nonlinear, and so on.
  • Transpose (Permute): intuitively, mirror-inverting all elements of a matrix A around the 45-degree ray extending to the lower right from the element in the first row and first column yields the transpose of A.
  • Given a matrix M, turning its first row into the first column, its second row into the second column, ..., and its last row into the last column yields a new matrix N; this process is called the transpose of the matrix. That is, the rows and columns of the matrix are interchanged.
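  • For example (an illustration, not taken from the original text), transposing a 2×3 matrix M yields a 3×2 matrix N whose columns are the rows of M:

```python
import numpy as np

M = np.array([[1, 2, 3],
              [4, 5, 6]])
N = M.T  # rows of M become columns of N
# N == [[1, 4],
#       [2, 5],
#       [3, 6]]
```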
  • Graphics Processing Unit (GPU): also known as a display core, visual processor, or display chip, it is a processor specially designed for graphics work on personal computers, workstations, game consoles and some mobile devices (such as tablet computers and smart phones).
  • the Central Processing Unit (CPU), as the computing and control core of the computer system, is the final execution unit for information processing and program operation.
  • Neural-network Processing Unit (NPU): an embedded neural network processor that uses a "data-driven parallel computing" architecture and is especially good at processing massive multimedia data such as video and images.
  • FIG. 1a is a system architecture diagram including a neural network model provided by an embodiment of the present application.
  • the system architecture includes an application layer, an input layer (Input Layer), a neural network hidden layer (Hidden Layer), a processing unit and an output layer (Output Layer), and the neural network model includes an input layer, a neural network hidden layer and output layer.
  • the output of the application layer can be used as the input of the input layer of the neural network model, and the hidden layer acts on the processing unit, and the processing unit can process the data in the hidden layer.
  • the data refers to the input data of the hidden layer and the parameters and operators carried by the hidden layer itself.
  • the hidden layer of the neural network in Fig. 1a may include one hidden layer, or may include multiple hidden layers. If multiple hidden layers are included, then for a hidden layer inside multiple hidden layers, its input is the output of the previous hidden layer, and its output is the input of the next hidden layer.
  • Figure 1b shows another system architecture diagram including a neural network model.
  • the system architecture includes an application layer, an input layer (Input Layer), a first part of the neural network hidden layers (Hidden Layer), a first processing unit, a second part of the hidden layers, a second processing unit, a third part of the hidden layers, a third processing unit and an output layer (Output Layer); the neural network model includes the input layer, the three parts of the hidden layers and the output layer.
  • the first partial hidden layer acts on the first processing unit
  • the second partial hidden layer acts on the second processing unit
  • the third partial hidden layer acts on the third processing unit.
  • the first processing unit, the second processing unit and the third processing unit may be processing units such as GPU, CPU or NPU, respectively.
  • the hidden layers in the model can be grouped, and the hidden layers of different groups can be processed by different processing units.
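  • As a hypothetical sketch of this grouping (the names below are illustrative and do not come from the patent), each group can record the processing unit that executes it and that unit's tensor layout type:

```python
# Hidden layers grouped by processing unit, mirroring FIG. 1b; all names
# are assumptions made for this illustration.
model_partition = [
    {"unit": "first processing unit",  "unit_layout": "NCHW", "layers": ["hidden_1", "hidden_2"]},
    {"unit": "second processing unit", "unit_layout": "NHWC", "layers": ["hidden_3", "hidden_4"]},
    {"unit": "third processing unit",  "unit_layout": "NCHW", "layers": ["hidden_5"]},
]
```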
  • the data processing method and apparatus provided by the embodiments of the present application are described in detail below.
  • FIG. 2 provides a schematic flowchart of a data processing method according to an embodiment of the present application.
  • the data processing method includes the following operations 210 to 220 .
  • the method execution subject shown in FIG. 2 may be a terminal device, or the subject may be a chip in the terminal device.
  • the terminal device to which the method is applied may include an application layer, a neural network model, and at least one processing unit, the neural network model includes at least one hidden layer, and each processing unit corresponds to one or more hidden layers in the at least one hidden layer.
  • when the terminal device executes the process shown in FIG. 2, the following operations may be included:
  • Operation 210: if the layout type of the input data input to the target processing unit is different from the tensor layout type of the target processing unit, transpose the input data through a transposition operator to obtain the transposed input data; the input data is a tensor, the target processing unit is any processing unit in the at least one processing unit, and the input data is the output data of the application layer or one hidden layer of the at least one hidden layer.
  • the output of the application layer is the input of the first processing unit in the at least one processing unit, the output of the first processing unit is the input of the first hidden layer in the one or more hidden layers corresponding to the first processing unit, and the output of the last hidden layer in the one or more hidden layers corresponding to the first processing unit is the input of the next processing unit of the first processing unit.
  • before the input data is transposed by the transposition operator to obtain the transposed input data, the terminal device can obtain the tensor layout type of the hidden layer corresponding to the target processing unit and the tensor layout type of the target processing unit. If the terminal device determines that the tensor layout type of the hidden layer corresponding to the target processing unit is different from the tensor layout type of the target processing unit, it needs to transpose the original tensor parameters in all hidden layers corresponding to the target processing unit to obtain the above target tensor parameters.
  • the target tensor parameter is a set of transposed tensor parameters of each hidden layer in all hidden layers, and the transposed tensor parameters of each hidden layer may be different from each other.
  • FIG. 3 is a schematic diagram in which all hidden layers are processed on only one processing unit.
  • the hidden layer of the neural network is processed by only one processing unit, which may be a GPU, CPU, NPU, or other processing units that can process the hidden layer. If all hidden layers are processed on only one processing unit, the terminal device only needs to determine whether the tensor layout type of all hidden layers corresponding to the processing unit is the same as the tensor layout type of the processing unit.
  • if they are not the same, the terminal device needs to transpose the original tensor parameters in all hidden layers to obtain the target tensor parameters.
  • the layout type of the target tensor parameter is the same as the tensor layout type of the processing unit, both of which are NHWC.
  • the original tensor parameter refers to a parameter in an operator of a hidden layer, such as the Weights parameter of the Conv2d operator, which is converted, for example, from NHWC to NCHW to fit the layout constraints of the target processing unit; a sketch follows.
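  • A minimal sketch of this parameter transposition, assuming 4-D weight tensors stored under a hypothetical "weights" key (neither assumption comes from the patent):

```python
import numpy as np

NHWC_TO_NCHW = (0, 3, 1, 2)  # axis permutation applied to each weight tensor

def target_tensor_params(layers, layer_layout, unit_layout):
    """Return the target tensor parameters for one processing unit."""
    if layer_layout == unit_layout:
        # Layout types already match: use the original tensor parameters as-is.
        return [layer["weights"] for layer in layers]
    # Layout types differ: transpose each layer's weight tensor once,
    # e.g. from NHWC to NCHW, before the unit processes any input data.
    return [np.transpose(layer["weights"], NHWC_TO_NCHW) for layer in layers]
```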
  • FIG. 4 shows a schematic diagram of hidden layers being processed on multiple processing units.
  • all hidden layers in the neural network model are divided into the first part of the hidden layer, the second part of the hidden layer and the third part of the hidden layer.
  • the first part of the hidden layer is processed by the first processing unit
  • the second part of the hidden layer is processed by the second processing unit
  • the third part of the hidden layers is processed by the third processing unit.
  • the tensor layout types of the first, second and third parts of the hidden layers are all the same, and their tensor layout type is determined when the model is built.
  • however, there may be processing units whose tensor layout type is not the same as the tensor layout type of the hidden layers.
  • for example, the tensor layout type of the hidden layers is NHWC,
  • the tensor layout type of the first processing unit is NCHW,
  • the tensor layout type of the second processing unit is NHWC,
  • and the tensor layout type of the third processing unit is NCHW.
  • the terminal device can determine that the tensor layout type of the first part of the hidden layers is different from the tensor layout type of the first processing unit, that the tensor layout type of the second part of the hidden layers is the same as the tensor layout type of the second processing unit, and that the tensor layout type of the third part of the hidden layers is not the same as the tensor layout type of the third processing unit. Therefore, it is necessary to transpose the original tensor parameters of all hidden layers in the first part and in the third part of the hidden layers.
  • after the transposition, the tensor layout type of the first part of the hidden layers is the same as the tensor layout type of the first processing unit,
  • and the tensor layout type of the third part of the hidden layers is the same as the tensor layout type of the third processing unit.
  • if the tensor layout type of the hidden layer corresponding to the target processing unit is the same as the tensor layout type of the target processing unit, the original tensor parameters in all hidden layers corresponding to the target processing unit are used as the target tensor parameters.
  • once the tensor layout type of the hidden layers is the same as the tensor layout type of the corresponding processing unit,
  • if the layout type of the input data input to the target processing unit is still different from the tensor layout type of the target processing unit, the input data needs to be transposed, where the input data is a tensor.
  • the input data can be transposed by means of the transposition operator to obtain the transposed input data.
  • the layout type of the transposed input data is the same as the tensor layout type of the target processing unit.
  • the input layer may input the input data into the hidden layer corresponding to the processing unit.
  • a transpose (Permute) operator is inserted after the input layer, and the input data is transposed by the operator.
  • the terminal device will transpose the original tensor parameters in the hidden layer so that the tensor layout type of this hidden layer is the same as the tensor layout type of the processing unit. Therefore, comparing the layout type of the input data with the tensor layout type of the hidden layer is equivalent to comparing the layout type of the input data with the tensor layout type of the processing unit.
  • the input layer may input the input data to the first hidden layer in the first part of the hidden layers. If the terminal device determines that the layout type of the input data is not the same as the tensor layout type of the first hidden layer in the first part of the hidden layers, it can likewise insert a Permute operator after the input layer, transpose the input data, and input the transposed input data into the first hidden layer in the first part of the hidden layers. After the last hidden layer in the first part of the hidden layers has processed the input data from the previous hidden layer (i.e., the output data of the previous hidden layer), the output data of the last hidden layer can be transposed.
  • the method of transposition is to add a Permute operator after the last hidden layer in the first part of the hidden layer, so that the transposed output data can be obtained.
  • the layout type of the transposed output data is the same as the layout type of the input data of the first hidden layer in the first part of the hidden layers.
  • similarly, the terminal device needs to determine whether the layout type of the input data of the first hidden layer in the second part of the hidden layers is the same as the tensor layout type of that hidden layer.
  • the last hidden layer of the second part of the hidden layer and the first hidden layer of the third part of the hidden layer can also use the same method to input or output data.
  • in contrast, the original tensor parameters of the hidden layers are transposed directly, without the Permute operator.
  • Operation 220: process, by the target processing unit, the transposed input data and the target tensor parameters in all hidden layers corresponding to the target processing unit to obtain output data; the layout type of the target tensor parameters is the same as the tensor layout type of the target processing unit.
  • the target processing unit processes the input data in the corresponding hidden layer and the target tensor parameters in the hidden layer. If there are multiple hidden layers, the target processing unit will process the input data and tensor parameters in each hidden layer one by one.
  • the output data can be obtained after the processing of the target processing unit is completed. After the output data is obtained, the output data can be transposed through the Permute operator to obtain the transposed output data, and the transposed output data is used as the input data of the next processing unit of the target processing unit. In this way, data flow between different processing units is realized.
  • the terminal device first transposes the original tensor parameters of the hidden layers when the tensor layout type of the hidden layers corresponding to the target processing unit is different from the tensor layout type of the target processing unit, obtaining target tensor parameters whose layout type is the same as the tensor layout type of the target processing unit. Further, the layout type of the input data input to the target processing unit can be compared with the tensor layout type of the target processing unit; if they are not the same, the input data can be transposed by the transposition operator, so that the layout type of the transposed input data is the same as the tensor layout type of the target processing unit.
  • the input data, the tensor layout type of the hidden layer of the neural network and the tensor layout type of the execution unit can be made compatible with each other, which is beneficial for the processing unit to process the input data and the parameters in the hidden layer of the neural network.
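  • The overall flow of operations 210 and 220 can be sketched as follows; `units` and `process_fn` are hypothetical names introduced for this illustration, and each unit is assumed to already hold its target tensor parameters in its own layout:

```python
import numpy as np

PERMUTE = {("NHWC", "NCHW"): (0, 3, 1, 2),
           ("NCHW", "NHWC"): (0, 2, 3, 1)}

def run_model(data, data_layout, units):
    """units: list of (unit_layout, process_fn) pairs, one per processing unit."""
    for unit_layout, process_fn in units:
        if data_layout != unit_layout:
            # Operation 210: a Permute (transpose) operator aligns the input
            # data with the tensor layout type of the target processing unit.
            data = np.transpose(data, PERMUTE[(data_layout, unit_layout)])
            data_layout = unit_layout
        # Operation 220: the unit processes the transposed input together with
        # the target tensor parameters of its hidden layers.
        data = process_fn(data)
        # The output keeps this unit's layout; if the next unit differs, the
        # next iteration transposes again, realizing data flow between units.
    return data, data_layout
```

  • In the FIG. 4 example above, this would correspond to three units whose layouts are NCHW, NHWC and NCHW, with transposes inserted only at the boundaries where the layout changes.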
  • FIG. 5 is a schematic diagram of a unit of a data processing apparatus provided by an embodiment of the present application.
  • the data processing apparatus shown in FIG. 5 can be used to perform some or all of the functions in the method embodiment described in FIG. 2 above.
  • the device may be a terminal device, or a device in the terminal device, or a device that can be used in combination with the terminal device.
  • the logical structure of the apparatus may include: a processing subunit 510 and an obtaining subunit 520.
  • the terminal device includes an application layer, a neural network model and at least one processing unit; the neural network model includes at least one hidden layer, and each processing unit corresponds to one or more hidden layers in the at least one hidden layer, wherein:
  • the processing subunit 510 is configured to transpose the input data through a transposition operator if the layout type of the input data input to the target processing unit is different from the tensor layout type of the target processing unit to obtain the transposed input data, the input data is a tensor, the target processing unit is any processing unit in the at least one processing unit, and the input data is the output data of the application layer or one hidden layer in the at least one hidden layer;
  • the above-mentioned processing subunit 510 is also used to process the transposed input data and the target tensor parameters in all hidden layers corresponding to the target processing unit through the target processing unit to obtain output data; the layout type of the target tensor parameters is the same as the tensor layout type of the target processing unit.
  • before the input data is transposed by the transposition operator to obtain the transposed input data, the obtaining subunit 520 is used to acquire the tensor layout type of the hidden layer corresponding to the target processing unit and the tensor layout type of the target processing unit; the above-mentioned processing subunit 510 is also used to, if the tensor layout type of the hidden layer corresponding to the target processing unit is different from the tensor layout type of the target processing unit, transpose the original tensor parameters in all hidden layers corresponding to the target processing unit to obtain the target tensor parameters.
  • the processing subunit 510 is further configured to, if the tensor layout type of the hidden layer corresponding to the target processing unit is the same as the tensor layout type of the target processing unit, use the original tensor parameters in all hidden layers corresponding to the target processing unit as the target tensor parameters.
  • after the transposed input data and the target tensor parameters in all hidden layers corresponding to the target processing unit are processed by the target processing unit and the output data is obtained, the above-mentioned processing subunit 510 is further used to transpose the output data by the transposition operator to obtain the transposed output data; the transposed output data is used as the input data of the next processing unit of the target processing unit.
  • the output of the application layer is the input of the first processing unit in the at least one processing unit, the output of the first processing unit is the input of the first hidden layer in the one or more hidden layers corresponding to the first processing unit, and the output of the last hidden layer in the one or more hidden layers corresponding to the first processing unit is the input of the next processing unit of the first processing unit.
  • FIG. 6 is a simplified schematic diagram of the physical structure of a data processing apparatus according to an embodiment of the application.
  • the apparatus includes a processor 610, a memory 620 and a communication interface 630; the processor 610, the memory 620 and the communication interface 630 are connected via one or more communication buses.
  • the data processing device may be a chip, a chip module, or the like.
  • the processor 610 is configured to support the data processing apparatus in performing the functions corresponding to the method in FIG. 2 described above. It should be understood that in the embodiments of the present application, the processor 610 may be a central processing unit (CPU), and the processor may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
  • a general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
  • the memory 620 is used to store program codes and the like.
  • the memory 620 in this embodiment of the present application may be a volatile memory or a nonvolatile memory, or may include both volatile and nonvolatile memory.
  • the non-volatile memory may be read-only memory (ROM for short), programmable read-only memory (PROM for short), erasable programmable read-only memory (EPROM for short) , Electrically Erasable Programmable Read-Only Memory (electrically EPROM, EEPROM for short) or flash memory.
  • Volatile memory may be random access memory (RAM), which acts as an external cache.
  • Many forms of RAM are available, for example: static random access memory (SRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM), enhanced synchronous dynamic random access memory (ESDRAM), synchlink dynamic random access memory (SLDRAM) and direct rambus random access memory (DR RAM).
  • the communication interface 630 is used to send and receive data, information or messages, etc., and can also be described as a transceiver, a transceiver circuit, and the like.
  • the processor 610 calls the program code stored in the memory 620 to perform the following operations:
  • the processor 610 calls the program code stored in the memory 620: if the layout type of the input data input to the target processing unit is not the same as the tensor layout type of the target processing unit, the input data is transposed by the transposition operator to obtain the transposed input data; the input data is a tensor, the target processing unit is any processing unit in the at least one processing unit, and the input data is the output data of the application layer or one hidden layer in the at least one hidden layer;
  • the processor 610 calls the program code stored in the memory 620 to process, through the target processing unit, the transposed input data and the target tensor parameters in all hidden layers corresponding to the target processing unit to obtain the output data; the layout type of the target tensor parameters is the same as the tensor layout type of the target processing unit.
  • before the input data is transposed by the transposition operator to obtain the transposed input data, the processor 610 calls the program code stored in the memory 620 to obtain the tensor layout type of the hidden layer corresponding to the target processing unit and the tensor layout type of the target processing unit; if the tensor layout type of the hidden layer corresponding to the target processing unit is different from the tensor layout type of the target processing unit, the original tensor parameters in all hidden layers corresponding to the target processing unit are transposed to obtain the target tensor parameters.
  • the processor 610 calls the program code stored in the memory 620: if the tensor layout type of the hidden layer corresponding to the target processing unit is the same as the tensor layout type of the target processing unit, the original tensor parameters in all hidden layers corresponding to the target processing unit are used as the target tensor parameters.
  • after the transposed input data and the target tensor parameters in all hidden layers corresponding to the target processing unit are processed by the target processing unit and the output data is obtained, the processor 610 calls the program code stored in the memory 620 to transpose the output data through the transposition operator to obtain the transposed output data; the transposed output data is used as the input data of the next processing unit of the target processing unit.
  • the output of the application layer is the input of the first processing unit in the at least one processing unit, the output of the first processing unit is the input of the first hidden layer in the one or more hidden layers corresponding to the first processing unit, and the output of the last hidden layer in the one or more hidden layers corresponding to the first processing unit is the input of the next processing unit of the first processing unit.
  • modules/units included in the devices and products described in the foregoing embodiments may be software modules/units or hardware modules/units, or may be partly software modules/units and partly hardware modules/units.
  • For each device or product applied to or integrated in a chip, each module/unit contained therein may be implemented by hardware such as circuits, or at least some of the modules/units may be implemented by a software program that runs on a processor integrated inside the chip, with the remaining (if any) modules/units implemented by hardware such as circuits. For each device or product applied to or integrated in a chip module, the modules/units contained therein may all be implemented by hardware such as circuits, and different modules/units may be located in the same component of the chip module (such as a chip or a circuit module) or in different components; or at least some of the modules/units may be implemented by a software program that runs on a processor integrated inside the chip module, with the remaining (if any) modules/units implemented by hardware such as circuits. For each device or product applied to or integrated in a terminal, the modules/units contained therein may all be implemented by hardware such as circuits, and different modules/units may be located in the same component (such as a chip or a circuit module) or in different components of the terminal; or at least some of the modules/units may be implemented by a software program that runs on a processor integrated inside the terminal, with the remaining (if any) modules/units implemented by hardware such as circuits.
  • FIG. 7 is a simplified schematic diagram of a chip provided by an embodiment of the present application, and the chip includes a processor 710 and a data interface 720 .
  • the chip can be used to process functions corresponding to the method in FIG. 2 .
  • the chip may be included in a data processing apparatus as shown in FIG. 6 .
  • the chip may also be included in a chip module.
  • the units in the processing device in the embodiment of the present invention may be combined, divided, and deleted according to actual needs.
  • a computer program product includes one or more computer instructions.
  • the computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable device.
  • Computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from a website, computer, server or data center to another website, computer, server or data center by wired means (e.g., coaxial cable, optical fiber, digital subscriber line) or by wireless means.
  • a computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device such as a server, a data center, or the like that includes an integration of one or more available media.
  • Usable media may be magnetic media (e.g., floppy disks, hard disks, magnetic tapes), optical media (e.g., DVDs), or semiconductor media (e.g., solid state disks (SSDs)), among others.
  • Embodiments of the present application further provide a computer storage medium, where the computer storage medium stores a computer program for electronic data exchange, and the computer program causes a computer to execute some or all of the steps of any method described in the above method embodiments, where the above computer includes an electronic device.
  • Embodiments of the present application further provide a computer program product, where the computer program product includes a non-transitory computer-readable storage medium storing a computer program, and the computer program is operable to cause a computer to execute some or all of the steps of any method described in the above method embodiments.
  • the computer program product may be a software installation package, and the computer includes an electronic device.
  • the size of the sequence numbers of the above-mentioned processes does not imply an order of execution; the execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present application.
  • the disclosed method, apparatus and system may be implemented in other manners.
  • the device embodiments described above are only illustrative; for example, the division of the units is only a logical function division, and there may be other division methods in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution in this embodiment.
  • each functional unit in each embodiment of the present invention may be integrated into one processing unit, or each unit may be physically included individually, or two or more units may be integrated into one unit.
  • the above-mentioned integrated unit may be implemented in the form of hardware, or may be implemented in the form of hardware plus software functional units.
  • the above-mentioned integrated units implemented in the form of software functional units can be stored in a computer-readable storage medium.
  • the above-mentioned software functional unit is stored in a storage medium, and includes several instructions to cause a computer device (which may be a personal computer, a server, a network device, etc.) to execute some or all of the steps of the methods described in the various embodiments of the present invention.
  • the aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Neurology (AREA)
  • Document Processing Apparatus (AREA)

Abstract

Disclosed in the present application are a data processing method and apparatus. The method comprises: if the layout type of input data inputted to a target processing unit is different from the tensor layout type of the target processing unit, transposing the input data by a transposition operator to obtain transposed input data, the input data being a tensor, the target processing unit being any of at least one processing unit, and the input data being output data of an application layer or one of at least one hidden layer; and processing, by the target processing unit, the transposed input data and target tensor parameters in all the hidden layers corresponding to the target processing unit to obtain the output data, the layout type of the target tensor parameters being the same as the tensor layout type of the target processing unit. By means of the method, the input data, the tensor layout type of a neural network hidden layer, and the tensor layout type of an execution unit can be compatible with one another.

Description

Data processing method and device
Technical Field
The present application relates to the field of computer technology, and in particular, to a data processing method and device.
Background Art
A neural network is a mathematical or computational model, used in machine learning and cognitive science, that imitates the structure and function of biological neural networks. The data in a neural network can be stored in a tensor (Tensor), and the arrangement (layout) of the data in the tensor can include NHWC and NCHW. The data in the neural network can be calculated by an execution unit to obtain the corresponding calculation result. However, when the arrangement of the tensor data supported by the execution unit is incompatible with the arrangement of the tensor data supported by the neural network, a correct calculation result cannot be obtained.
Summary of the Invention
The present application discloses a data processing method and device, which can make the input data, the tensor layout type of the hidden layers of the neural network and the tensor layout type of the execution unit compatible with one another.
In a first aspect, the embodiments of the present application provide a data processing method and apparatus. The method is applied to a terminal device. The terminal device includes an application layer, a neural network model, and at least one processing unit. The neural network model includes at least one hidden layer, and each processing unit corresponds to one or more hidden layers in the at least one hidden layer. The method includes:
if the layout type of the input data input to the target processing unit is different from the tensor layout type of the target processing unit, transposing the input data through a transposition operator to obtain the transposed input data, where the input data is a tensor, the target processing unit is any processing unit in the at least one processing unit, and the input data is the output data of the application layer or of one hidden layer in the at least one hidden layer;
processing, by the target processing unit, the transposed input data and the target tensor parameters in all hidden layers corresponding to the target processing unit to obtain the output data, where the layout type of the target tensor parameters is the same as the tensor layout type of the target processing unit.
In one embodiment, before the input data is transposed by the transposition operator to obtain the transposed input data, the tensor layout type of the hidden layer corresponding to the target processing unit and the tensor layout type of the target processing unit are obtained; if the tensor layout type of the hidden layer corresponding to the target processing unit is different from the tensor layout type of the target processing unit, the original tensor parameters in all hidden layers corresponding to the target processing unit are transposed to obtain the target tensor parameters.
In one embodiment, if the tensor layout type of the hidden layer corresponding to the target processing unit is the same as the tensor layout type of the target processing unit, the original tensor parameters in all hidden layers corresponding to the target processing unit are used as the target tensor parameters.
In one embodiment, after the transposed input data and the target tensor parameters in all hidden layers corresponding to the target processing unit are processed by the target processing unit to obtain the output data, the output data is transposed by the transposition operator to obtain the transposed output data; the transposed output data is used as the input data of the next processing unit of the target processing unit.
In one embodiment, the output of the application layer is the input of the first processing unit in the at least one processing unit, the output of the first processing unit is the input of the first hidden layer in the one or more hidden layers corresponding to the first processing unit, and the output of the last hidden layer in the one or more hidden layers corresponding to the first processing unit is the input of the next processing unit of the first processing unit.
In a second aspect, an embodiment of the present application provides a data processing apparatus, which is applied to a terminal device. The terminal device includes an application layer, a neural network model, and at least one processing unit. The neural network model includes at least one hidden layer, and each processing unit corresponds to one or more hidden layers in the at least one hidden layer. The apparatus includes:
a processing subunit, configured to, if the layout type of the input data input to the target processing unit is different from the tensor layout type of the target processing unit, transpose the input data through the transposition operator to obtain the transposed input data, where the input data is a tensor, the target processing unit is any processing unit in the at least one processing unit, and the input data is the output data of the application layer or of one hidden layer in the at least one hidden layer;
the processing subunit is further configured to process, through the target processing unit, the transposed input data and the target tensor parameters in all hidden layers corresponding to the target processing unit to obtain output data, where the layout type of the target tensor parameters is the same as the tensor layout type of the target processing unit.
In a third aspect, an embodiment of the present application provides a data processing device, including a processor, a memory and a communication interface, where the processor, the memory and the communication interface are connected to each other; the memory is used to store a computer program, the computer program includes program instructions, and the processor is configured to invoke the program instructions to execute the data processing method described in the first aspect.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium, where the computer-readable storage medium stores one or more instructions, and the one or more instructions are suitable for being loaded by a processor to execute the data processing method described in the first aspect.
In a fifth aspect, an embodiment of the present application provides a chip, where the chip includes a processor and a data interface, and the processor reads instructions stored in a memory through the data interface to execute the data processing method described in the first aspect.
In a sixth aspect, an embodiment of the present application provides a chip module, where the chip module includes the chip of the fifth aspect.
In the embodiments of the present application, if the layout type of the input data input to the target processing unit is different from the tensor layout type of the target processing unit, the input data is transposed by the transposition operator to obtain the transposed input data, where the input data is a tensor, the target processing unit is any processing unit in the at least one processing unit, and the input data is the output data of the application layer or of one hidden layer in the at least one hidden layer; the transposed input data and the target tensor parameters in all hidden layers corresponding to the target processing unit are processed by the target processing unit to obtain the output data, where the layout type of the target tensor parameters is the same as the tensor layout type of the target processing unit. Through this method, the input data, the tensor layout type of the hidden layers of the neural network and the tensor layout type of the execution unit can be made compatible with one another.
Description of the Drawings
In order to explain the technical solutions of the embodiments of the present application more clearly, the following briefly introduces the accompanying drawings used in the description of the embodiments. Obviously, the drawings in the following description are some embodiments of the present application; for those of ordinary skill in the art, other drawings can also be obtained from these drawings without any creative effort.
FIG. 1a is a system architecture diagram including a neural network hidden layer provided by an embodiment of the present application;
FIG. 1b is another system architecture diagram including a neural network hidden layer provided by an embodiment of the present application;
FIG. 2 is a schematic flowchart of a data processing method provided by an embodiment of the present application;
FIG. 3 is a schematic diagram in which all hidden layers are processed on only one processing unit according to an embodiment of the present application;
FIG. 4 is a schematic diagram of hidden layers being processed on multiple processing units according to an embodiment of the present application;
FIG. 5 is a schematic diagram of the units of a data processing apparatus provided by an embodiment of the present application;
FIG. 6 is a simplified schematic diagram of the physical structure of a data processing apparatus provided by an embodiment of the present application;
FIG. 7 is a simplified schematic diagram of a chip of a data processing apparatus provided by an embodiment of the present application.
Detailed description of embodiments
The technical solutions in the embodiments of the present application are described below clearly and completely with reference to the accompanying drawings in the embodiments of the present application.
To facilitate a better understanding of the embodiments of the present application, the technical terms involved in the embodiments are introduced first:
Artificial intelligence (AI): the theory, method, technology, and application system that uses digital computers, or machines controlled by digital computers, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive branch of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can respond in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning, and decision-making. Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, including both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.
Tensor: a multilinear function that can be used to express linear relationships among vectors, scalars, and other tensors. Tensor analysis is a branch of mathematics with important applications in mechanics; the term originated in mechanics, where it was first used to describe the stress state at each point of an elastic medium, and tensor theory later developed into a powerful mathematical tool in mechanics and physics. The tensor concept generalizes the vector concept: a scalar can be regarded as a zeroth-order tensor, a vector as a first-order tensor, and a matrix as a second-order tensor. In addition, the arrangement of a tensor in memory (its layout, or memory layout) may be NHWC, NCHW, CHWN, and so on. NHWC means the tensor is arranged in memory as [batch, height, width, channels], and NCHW means it is arranged as [batch, channels, height, width], where batch is the number of data items input at one time, channels is the number of channels, width is the width, and height is the height. In the embodiments of the present application, transposing an NHWC tensor yields an NCHW tensor, and similarly, transposing an NCHW tensor yields an NHWC tensor.
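To make the NHWC/NCHW relationship above concrete, the following is a minimal illustrative sketch in Python using NumPy; the array shape and the use of NumPy are assumptions made for this example and are not part of the embodiments:

    import numpy as np

    # An input tensor in NHWC layout: batch=1, height=4, width=4, channels=3
    # (illustrative values only).
    x_nhwc = np.arange(1 * 4 * 4 * 3, dtype=np.float32).reshape(1, 4, 4, 3)

    # Reordering the axes [N, H, W, C] -> [N, C, H, W] yields the NCHW layout.
    x_nchw = np.transpose(x_nhwc, (0, 3, 1, 2))   # shape (1, 3, 4, 4)

    # Transposing again with the inverse axis order recovers the NHWC tensor,
    # matching the statement that the two layouts are mutually convertible.
    assert np.array_equal(np.transpose(x_nchw, (0, 2, 3, 1)), x_nhwc)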
Neural network: in computer science this usually refers to an artificial neural network, a mathematical model that processes information using a structure resembling the synaptic connections of the brain; in engineering and academia it is often simply called a "neural network" or neural-like network. It is an algorithmic mathematical model that imitates the behavioral characteristics of animal neural networks and performs distributed parallel information processing. Depending on the complexity of the system, such a network processes information by adjusting the interconnections among a large number of internal nodes.
Operator: a mapping from one function space to another function space. Common operators include the differential operator, gradient operator, divergence operator, Laplace operator, and Hamiltonian operator. In the narrow sense, an operator is a mapping from one function space to another function space (or to itself). In the broad sense, the spaces above can be generalized to general spaces, which may be vector spaces, normed vector spaces, inner product spaces, or, going further, Banach spaces or Hilbert spaces. Operators can also be classified as bounded or unbounded, linear or nonlinear, and so on.
Transposition (Permute): intuitively, mirroring all elements of a matrix A about the ray running 45 degrees down and to the right from the element in row 1, column 1 yields the transpose of A. Given a matrix M, turning its first row into the first column, its second row into the second column, ..., and its last row into the last column yields a new matrix N; this process is called transposing the matrix. That is, the rows and columns of the matrix are interchanged.
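A short worked example of this row-column interchange (illustrative only):

    import numpy as np

    M = np.array([[1, 2, 3],
                  [4, 5, 6]])   # 2 rows, 3 columns
    N = M.T                     # 3 rows, 2 columns; N[j, i] == M[i, j]
    # The first row [1, 2, 3] of M has become the first column of N.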
Graphics processing unit (GPU): also known as the display core, visual processor, or display chip, a microprocessor dedicated to image- and graphics-related computation on personal computers, workstations, game consoles, and some mobile devices (such as tablet computers and smartphones).

Central processing unit (CPU): as the computing and control core of a computer system, the CPU is the final execution unit for information processing and program execution.

Embedded neural-network processing unit (NPU): adopts a "data-driven parallel computing" architecture and is particularly good at processing massive multimedia data such as video and images.
To facilitate a better understanding of the embodiments of the present application, the network architectures to which the embodiments are applicable are described below.
Referring to Fig. 1a, Fig. 1a is a system architecture diagram including a neural network model according to an embodiment of the present application. As shown in Fig. 1a, the system architecture includes an application layer, an input layer, neural network hidden layers, a processing unit, and an output layer; the neural network model consists of the input layer, the hidden layers, and the output layer. The output of the application layer can serve as the input of the input layer of the neural network model; the hidden layers run on the processing unit, and the processing unit processes the data in the hidden layers. Here, the data refers to the input data of the hidden layers together with the parameters and operators carried by the hidden layers themselves. It should be noted that the neural network hidden layers in Fig. 1a may consist of one hidden layer or of multiple hidden layers. If there are multiple hidden layers, then for any hidden layer in the interior of the stack, its input is the output of the previous hidden layer and its output is the input of the next hidden layer.
Fig. 1b is another system architecture diagram including a neural network model. This architecture includes an application layer, an input layer, a first group of hidden layers, a first processing unit, a second group of hidden layers, a second processing unit, a third group of hidden layers, a third processing unit, and an output layer; the neural network model consists of the input layer, the three groups of hidden layers, and the output layer. The first group of hidden layers runs on the first processing unit, the second group on the second processing unit, and the third group on the third processing unit. The first, second, and third processing units may each be a processing unit such as a GPU, CPU, or NPU. When the computation of the neural network model needs to be carried out by different processing units, the model can group its hidden layers, and different groups of hidden layers can be processed by different processing units. It should be noted that there are also input and output boundaries between the first, second, and third groups of hidden layers: for any hidden layer in the interior of the stack, its input is the output of the previous hidden layer and its output is the input of the next hidden layer.
To make the input data, the tensor layout type of the neural network, and the tensor layout type of the execution unit compatible with one another, the embodiments of the present application provide a data processing method and apparatus, which are described in detail below.
Referring to Fig. 2, Fig. 2 is a schematic flowchart of a data processing method according to an embodiment of the present application. The data processing method includes operations 210 and 220 below. The method shown in Fig. 2 may be executed by a terminal device, or by a chip within the terminal device. The terminal device to which the method applies may include an application layer, a neural network model, and at least one processing unit, wherein the neural network model includes at least one hidden layer and each processing unit corresponds to one or more hidden layers among the at least one hidden layer. When the terminal device executes the flow shown in Fig. 2, the following steps may be included:
Operation 210: if the layout type of the input data input to a target processing unit differs from the tensor layout type of the target processing unit, transpose the input data by a transposition operator to obtain transposed input data, wherein the input data is a tensor, the target processing unit is any one of the at least one processing unit, and the input data is the output data of the application layer or of one hidden layer among the at least one hidden layer.

Here, the output of the application layer is the input of a first processing unit among the at least one processing unit; the output of the first processing unit is the input of the first hidden layer among the one or more hidden layers corresponding to the first processing unit; and the output of the last hidden layer among the one or more hidden layers corresponding to the first processing unit is the input of the processing unit following the first processing unit.
In a possible implementation, before the input data is transposed by the transposition operator to obtain the transposed input data, the terminal device may obtain the tensor layout type of the hidden layers corresponding to the target processing unit and the tensor layout type of the target processing unit. If the terminal device determines that these two tensor layout types differ, it needs to transpose the original tensor parameters in all hidden layers corresponding to the target processing unit to obtain the above target tensor parameters. The target tensor parameters are the set of transposed tensor parameters of each of those hidden layers, and the transposed tensor parameters of the individual hidden layers may differ from one another.
Optionally, Fig. 3 is a schematic diagram in which all hidden layers are processed on a single processing unit. In Fig. 3, the neural network hidden layers are processed by only one processing unit, which may be a GPU, CPU, NPU, or another processing unit capable of processing the hidden layers. If all hidden layers are processed on a single processing unit, the terminal device only needs to determine whether the tensor layout type of all hidden layers corresponding to that processing unit is the same as the tensor layout type of the processing unit. For example, if the tensor layout type of all corresponding hidden layers is NHWC while the tensor layout type of the processing unit is NCHW, the terminal device needs to transpose the original tensor parameters in all hidden layers to obtain the target tensor parameters. The layout type of the target tensor parameters is then consistent with the tensor layout type of the processing unit, namely NCHW. The original tensor parameters refer to the parameters in the operators of the hidden layers, for example the Weights parameter of a Conv2d operator, which is converted from NHWC to NCHW to fit the layout constraint of the target computing unit.
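As a hedged illustration of the parameter transposition just described, the sketch below treats a Conv2d Weights tensor as stored with an NHWC-style axis order, per the example above; the function name, the concrete dimensions, and the use of NumPy are assumptions for this sketch only:

    import numpy as np

    def transpose_params_nhwc_to_nchw(weights):
        # Reorder the weight axes [N, H, W, C] -> [N, C, H, W] so that the
        # target tensor parameters match the processing unit's NCHW layout.
        return np.transpose(weights, (0, 3, 1, 2))

    # Hypothetical Conv2d Weights: 8 filters, 5x5 kernel, 3 input channels.
    w_nhwc = np.zeros((8, 5, 5, 3), dtype=np.float32)
    w_nchw = transpose_params_nhwc_to_nchw(w_nhwc)   # shape (8, 3, 5, 5)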
Optionally, Fig. 4 is a schematic diagram in which the hidden layers are processed on multiple processing units. In Fig. 4, all hidden layers of the neural network model are divided into a first group, a second group, and a third group of hidden layers. The first group is processed by the first processing unit, the second group by the second processing unit, and the third group by the third processing unit. The tensor layout types of the three groups of hidden layers are all the same and are determined when the model is built, whereas among the three processing units there may be processors whose tensor layout type differs from that of the hidden layers. For example, suppose the tensor layout type of the hidden layers is NHWC, that of the first processing unit is NCHW, that of the second processing unit is NHWC, and that of the third processing unit is NCHW. The terminal device can then determine that the tensor layout type of the first group of hidden layers differs from that of the first processing unit, that the tensor layout type of the second group matches that of the second processing unit, and that the tensor layout type of the third group differs from that of the third processing unit. Therefore, the original tensor parameters of all hidden layers in the first group and of all hidden layers in the third group need to be transposed. In this way, the tensor layout type of the first group of hidden layers can match that of the first processing unit, and the tensor layout type of the third group can match that of the third processing unit.
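The per-group decision described above can be sketched as a simple comparison of layout identifiers; the dictionary and all names below are hypothetical:

    # Layout of all hidden layers, fixed when the model is built.
    hidden_layer_layout = "NHWC"

    # Tensor layout type of each processing unit (illustrative values).
    unit_layouts = {"unit_1": "NCHW", "unit_2": "NHWC", "unit_3": "NCHW"}

    # Groups whose original tensor parameters must be transposed.
    needs_param_transpose = [
        unit for unit, layout in unit_layouts.items()
        if layout != hidden_layer_layout
    ]
    # -> ["unit_1", "unit_3"]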
In a possible implementation, if the tensor layout type of the hidden layers corresponding to the target processing unit is the same as the tensor layout type of the target processing unit, the original tensor parameters in all hidden layers corresponding to the target processing unit are used as the target tensor parameters.

When the tensor layout type of the hidden layers matches that of the corresponding processing unit, if the layout type of the input data input to the target processing unit differs from the tensor layout type of the target processing unit, the input data, which is a tensor, needs to be transposed. Specifically, the input data can be transposed by means of a transposition operator to obtain the transposed input data, whose layout type is then the same as the tensor layout type of the target processing unit.
Optionally, as shown in Fig. 3, the input layer may feed the input data into the hidden layers corresponding to the processing unit. If the terminal device determines that the layout type of the input data differs from the tensor layout type of the hidden layers, it inserts a transposition (Permute) operator after the input layer, and the input data is transposed by this operator. It should be noted that, before the input data is fed into the hidden layers, if the tensor layout type of the hidden layers differs from that of the processing unit, the terminal device transposes the original tensor parameters in the hidden layers so that the tensor layout type of the hidden layers matches that of the processing unit. Therefore, comparing the layout type of the input data with the tensor layout type of the hidden layers is equivalent to comparing it with the tensor layout type of the processing unit.
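The conditional insertion of a Permute operator can be sketched as follows. This is a simplification under the assumption that layouts are the strings 'NHWC' and 'NCHW'; a real implementation would insert an operator node into the graph rather than transposing eagerly:

    import numpy as np

    NHWC_TO_NCHW = (0, 3, 1, 2)
    NCHW_TO_NHWC = (0, 2, 3, 1)

    def maybe_permute(x, input_layout, target_layout):
        # Equivalent to inserting a Permute operator only when the layout of
        # the input data differs from the target tensor layout type.
        if input_layout == target_layout:
            return x
        axes = NHWC_TO_NCHW if input_layout == "NHWC" else NCHW_TO_NHWC
        return np.transpose(x, axes)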
Optionally, as shown in Fig. 4, the input layer may feed the input data into the first hidden layer of the first group. If the terminal device determines that the layout type of the input data differs from the tensor layout type of that first hidden layer, a Permute operator can likewise be inserted after the input layer to transpose the input data, and the transposed input data is then fed into the first hidden layer of the first group. After the last hidden layer of the first group has processed the input data received from the previous hidden layer (that is, the output data of the previous hidden layer), the output data of that last hidden layer can be transposed by adding a Permute operator after it; the transposed output data then has the same layout type as the input data of the first hidden layer of the first group. When this transposed output data is fed as input data into the first hidden layer of the second group, the terminal device needs to determine whether the layout type of that input data is the same as the tensor layout type of the first hidden layer of the second group. If they are the same, the data can be fed into the first hidden layer of the second group directly; if they differ, the input data must first be transposed by a Permute operator. Similarly, data can be passed between the last hidden layer of the second group and the first hidden layer of the third group in the same way.
It should be noted that the original tensor parameters of the hidden layers are transposed directly, without going through a Permute operator.
Operation 220: process, by the target processing unit, the transposed input data and the target tensor parameters in all hidden layers corresponding to the target processing unit to obtain output data, wherein the layout type of the target tensor parameters is the same as the tensor layout type of the target processing unit.
The target processing unit processes the input data in the corresponding hidden layers together with the target tensor parameters in those hidden layers. If there are multiple hidden layers, the target processing unit processes the input data and tensor parameters of each hidden layer one by one, and output data is obtained once processing completes. After the output data is obtained, it can be transposed by a Permute operator to obtain transposed output data, which is then used as the input data of the processing unit following the target processing unit. In this way, data flows between different processing units.
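Putting operations 210 and 220 together with the hand-off to the next processing unit, a minimal end-to-end sketch is given below; it reuses the hypothetical maybe_permute helper above, and the unit object and its methods are assumptions, with the real processing stubbed out:

    class StubUnit:
        def __init__(self, layout):
            self.layout = layout        # the unit's tensor layout type
        def process(self, x):
            # A real unit would apply the target tensor parameters of its
            # hidden layers here; identity stands in for the sketch.
            return x

    def run_on_unit(x, input_layout, unit, next_input_layout):
        x = maybe_permute(x, input_layout, unit.layout)        # operation 210
        y = unit.process(x)                                    # operation 220
        # Transpose the output so it can serve as input data for the next
        # processing unit, enabling data flow between different units.
        return maybe_permute(y, unit.layout, next_input_layout)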
According to the embodiments of the present application, when the tensor layout type of the hidden layers corresponding to the target processing unit differs from that of the target processing unit, the terminal device first transposes the original tensor parameters of those hidden layers to obtain target tensor parameters whose layout type matches the tensor layout type of the target processing unit. It can then compare the layout type of the input data input to the target processing unit with the tensor layout type of the target processing unit; if they differ, the input data is transposed by a transposition operator so that the transposed input data has the same layout type as the tensor layout type of the target processing unit. With this method, the input data, the tensor layout type of the hidden layers of the neural network, and the tensor layout type of the execution unit are made compatible with one another, which facilitates the processing unit's handling of the input data and of the parameters in the hidden layers.
Referring to Fig. 5, Fig. 5 is a schematic unit diagram of a data processing apparatus according to an embodiment of the present application. The data processing apparatus shown in Fig. 5 can be used to perform some or all of the functions of the method embodiment described above with reference to Fig. 2. The apparatus may be a terminal device, a component within a terminal device, or a device that can be used in combination with a terminal device.

The logical structure of the apparatus may include a processing subunit 510 and an obtaining subunit 520. When the apparatus is applied to a terminal device, the terminal device includes an application layer, a neural network model, and at least one processing unit, wherein the neural network model includes at least one hidden layer and each processing unit corresponds to one or more hidden layers among the at least one hidden layer. In this case:
The processing subunit 510 is configured to: if the layout type of the input data input to a target processing unit differs from the tensor layout type of the target processing unit, transpose the input data by a transposition operator to obtain transposed input data, wherein the input data is a tensor, the target processing unit is any one of the at least one processing unit, and the input data is the output data of the application layer or of one hidden layer among the at least one hidden layer.

The processing subunit 510 is further configured to process, by the target processing unit, the transposed input data and the target tensor parameters in all hidden layers corresponding to the target processing unit to obtain output data, wherein the layout type of the target tensor parameters is the same as the tensor layout type of the target processing unit.

In a possible implementation, before the input data is transposed by the transposition operator to obtain the transposed input data, the obtaining subunit 520 is configured to obtain the tensor layout type of the hidden layers corresponding to the target processing unit and the tensor layout type of the target processing unit; the processing subunit 510 is further configured to, if these two tensor layout types differ, transpose the original tensor parameters in all hidden layers corresponding to the target processing unit to obtain the target tensor parameters.

In a possible implementation, the processing subunit 510 is further configured to, if the tensor layout type of the hidden layers corresponding to the target processing unit is the same as that of the target processing unit, use the original tensor parameters in all hidden layers corresponding to the target processing unit as the target tensor parameters.

In a possible implementation, after the transposed input data and the target tensor parameters in all hidden layers corresponding to the target processing unit are processed by the target processing unit to obtain the output data, the processing subunit 510 is further configured to transpose the output data by a transposition operator to obtain transposed output data, and to use the transposed output data as the input data of the processing unit following the target processing unit.

In a possible implementation, the output of the application layer is the input of a first processing unit among the at least one processing unit; the output of the first processing unit is the input of the first hidden layer among the one or more hidden layers corresponding to the first processing unit; and the output of the last hidden layer among those one or more hidden layers is the input of the processing unit following the first processing unit.
Referring to Fig. 6, Fig. 6 is a simplified schematic diagram of the physical structure of a data processing apparatus according to an embodiment of the present application. The apparatus includes a processor 610, a memory 620, and a communication interface 630, which are connected by one or more communication buses. The data processing apparatus may be a chip, a chip module, or the like.

The processor 610 is configured to support the data processing apparatus in performing the functions corresponding to the method in Fig. 2 above. It should be understood that in the embodiments of the present application, the processor 610 may be a central processing unit (CPU), or may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.

The memory 620 is configured to store program code and the like. The memory 620 in the embodiments of the present application may be volatile memory or non-volatile memory, or may include both volatile and non-volatile memory. The non-volatile memory may be read-only memory (ROM), programmable ROM (PROM), erasable PROM (EPROM), electrically erasable PROM (EEPROM), or flash memory. The volatile memory may be random access memory (RAM), which is used as an external cache. By way of example and not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct rambus RAM (DR RAM).

The communication interface 630 is used to send and receive data, information, messages, and the like, and may also be described as a transceiver, a transceiver circuit, or the like.
In the embodiments of the present application, the processor 610 invokes the program code stored in the memory 620 to perform the following operations:

if the layout type of the input data input to a target processing unit differs from the tensor layout type of the target processing unit, transposing the input data by a transposition operator to obtain transposed input data, wherein the input data is a tensor, the target processing unit is any one of the at least one processing unit, and the input data is the output data of the application layer or of one hidden layer among the at least one hidden layer;

processing, by the target processing unit, the transposed input data and the target tensor parameters in all hidden layers corresponding to the target processing unit to obtain output data, wherein the layout type of the target tensor parameters is the same as the tensor layout type of the target processing unit.

In a possible implementation, before the input data is transposed by the transposition operator to obtain the transposed input data, the processor 610 invokes the program code stored in the memory 620 to obtain the tensor layout type of the hidden layers corresponding to the target processing unit and the tensor layout type of the target processing unit, and, if these two tensor layout types differ, to transpose the original tensor parameters in all hidden layers corresponding to the target processing unit to obtain the target tensor parameters.

In a possible implementation, the processor 610 invokes the program code stored in the memory 620 to, if the tensor layout type of the hidden layers corresponding to the target processing unit is the same as that of the target processing unit, use the original tensor parameters in all hidden layers corresponding to the target processing unit as the target tensor parameters.

In a possible implementation, after the transposed input data and the target tensor parameters in all hidden layers corresponding to the target processing unit are processed by the target processing unit to obtain the output data, the processor 610 invokes the program code stored in the memory 620 to transpose the output data by a transposition operator to obtain transposed output data, and to use the transposed output data as the input data of the processing unit following the target processing unit.

In a possible implementation, the output of the application layer is the input of a first processing unit among the at least one processing unit; the output of the first processing unit is the input of the first hidden layer among the one or more hidden layers corresponding to the first processing unit; and the output of the last hidden layer among those one or more hidden layers is the input of the processing unit following the first processing unit.
Regarding the modules/units included in the apparatuses and products described in the foregoing embodiments, they may be software modules/units, hardware modules/units, or partly software and partly hardware modules/units. For example, for an apparatus or product applied to or integrated in a chip, all of its modules/units may be implemented by hardware such as circuits; alternatively, at least some modules/units may be implemented by a software program running on a processor integrated inside the chip, with the remaining modules/units (if any) implemented by hardware such as circuits. For an apparatus or product applied to or integrated in a chip module, all of its modules/units may be implemented by hardware such as circuits, and different modules/units may be located in the same component (such as a chip or a circuit module) of the chip module or in different components; alternatively, at least some modules/units may be implemented by a software program running on a processor integrated inside the chip module, with the remaining modules/units (if any) implemented by hardware such as circuits. For an apparatus or product applied to or integrated in a terminal, all of its modules/units may be implemented by hardware such as circuits, and different modules/units may be located in the same component (such as a chip or a circuit module) of the terminal or in different components; alternatively, at least some modules/units may be implemented by a software program running on a processor integrated inside the terminal, with the remaining modules/units (if any) implemented by hardware such as circuits.
Referring to Fig. 7, Fig. 7 is a simplified schematic diagram of a chip according to an embodiment of the present application. The chip includes a processor 710 and a data interface 720 and can be used to carry out the functions corresponding to the method in Fig. 2. The chip may be included in the data processing apparatus shown in Fig. 6, and may also be included in a chip module.
It should be noted that the description of each of the foregoing embodiments has its own emphasis; for parts not described in detail in one embodiment, reference may be made to the relevant descriptions of other embodiments.

The steps in the methods of the embodiments of the present invention may be reordered, combined, or deleted according to actual needs.

The units in the processing devices of the embodiments of the present invention may be combined, divided, or deleted according to actual needs.

The foregoing embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented by software, they may be implemented in whole or in part in the form of a computer program product. A computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions according to the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (such as coaxial cable, optical fiber, or digital subscriber line) or wireless means (such as infrared, radio, or microwave). The computer-readable storage medium may be any available medium accessible to a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (such as a floppy disk, a storage disk, or a magnetic tape), an optical medium (such as a DVD), or a semiconductor medium (such as a solid state disk (SSD)), among others.

An embodiment of the present application further provides a computer storage medium, wherein the computer storage medium stores a computer program for electronic data exchange, and the computer program causes a computer to perform some or all of the steps of any method described in the foregoing method embodiments; the computer includes an electronic device.

An embodiment of the present application further provides a computer program product, wherein the computer program product includes a non-transitory computer-readable storage medium storing a computer program, and the computer program is operable to cause a computer to perform some or all of the steps of any method described in the foregoing method embodiments. The computer program product may be a software installation package, and the computer includes an electronic device.
It should be understood that, in the various embodiments of the present application, the sequence numbers of the foregoing processes do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present application.

In the several embodiments provided in the present application, it should be understood that the disclosed methods, apparatuses, and systems may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; the division of the units is only a division by logical function, and other divisions are possible in actual implementation; for instance, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.

The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solutions of the embodiments.

In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may be physically included separately, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.

The integrated unit implemented in the form of a software functional unit may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform some of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

Although the present invention is disclosed as above, it is not limited thereto. Any person skilled in the art may readily conceive of changes or substitutions without departing from the spirit and scope of the present invention, and may make various alterations and modifications, including combinations of the different functions and implementation steps described above and implementations in software and in hardware, all of which fall within the protection scope of the present invention.

Claims (14)

  1. A data processing method, applied to a terminal device, the terminal device comprising an application layer, a neural network model, and at least one processing unit, the neural network model comprising at least one hidden layer, each processing unit corresponding to one or more hidden layers among the at least one hidden layer, the method comprising:
    if the layout type of input data input to a target processing unit differs from the tensor layout type of the target processing unit, transposing the input data by a transposition operator to obtain transposed input data, wherein the input data is a tensor, the target processing unit is any one of the at least one processing unit, and the input data is output data of the application layer or of one hidden layer among the at least one hidden layer;
    processing, by the target processing unit, the transposed input data and target tensor parameters in all hidden layers corresponding to the target processing unit to obtain output data, wherein the layout type of the target tensor parameters is the same as the tensor layout type of the target processing unit.
  2. The method according to claim 1, wherein before the transposing of the input data by the transposition operator to obtain the transposed input data, the method further comprises:
    obtaining the tensor layout type of the hidden layers corresponding to the target processing unit and the tensor layout type of the target processing unit;
    if the tensor layout type of the hidden layers corresponding to the target processing unit differs from the tensor layout type of the target processing unit, transposing original tensor parameters in all hidden layers corresponding to the target processing unit to obtain the target tensor parameters.
  3. The method according to claim 2, further comprising:
    if the tensor layout type of the hidden layers corresponding to the target processing unit is the same as the tensor layout type of the target processing unit, using the original tensor parameters in all hidden layers corresponding to the target processing unit as the target tensor parameters.
  4. The method according to claim 1, wherein after the processing, by the target processing unit, of the transposed input data and the target tensor parameters in all hidden layers corresponding to the target processing unit to obtain the output data, the method further comprises:
    transposing the output data by a transposition operator to obtain transposed output data;
    using the transposed output data as input data of the processing unit following the target processing unit.
  5. The method according to claim 1, wherein the output of the application layer is the input of a first processing unit among the at least one processing unit, the output of the first processing unit is the input of the first hidden layer among the one or more hidden layers corresponding to the first processing unit, and the output of the last hidden layer among the one or more hidden layers corresponding to the first processing unit is the input of the processing unit following the first processing unit.
  6. 一种数据处理装置,其特征在于,应用于终端设备,所述终端设备包括应用层、神经网络模型以及至少一个处理单元,所述神经网络模型包括至少一个隐藏层,各个所述处理单元对应所述至少一个隐藏层中的一个或多个隐藏层,所述装置包括:A data processing device, characterized in that it is applied to a terminal device, the terminal device includes an application layer, a neural network model and at least one processing unit, the neural network model includes at least one hidden layer, and each of the processing units corresponds to the one or more hidden layers of the at least one hidden layer, the apparatus comprising:
    处理子单元,用于若输入至目标处理单元的输入数据的布局类型与所述目标处理单元的张量布局类型不相同,则通过转置算子对所述输入数据进行转置,得到转置后的输入数据,所述输入数据为张量,所述目标处理单元为所述至少一个处理单元中的任一处理单元,所述输入数据为所述应用层或者所述至少一个隐藏层中的一个隐藏层的输出数据;A processing subunit, configured to transpose the input data through a transposition operator if the layout type of the input data input to the target processing unit is different from the tensor layout type of the target processing unit to obtain a transposition After input data, the input data is a tensor, the target processing unit is any processing unit in the at least one processing unit, and the input data is the application layer or the at least one hidden layer. The output data of a hidden layer;
    所述处理子单元还用于通过所述目标处理单元对所述转置后的输入数据和所述目标处理单元对应的所有隐藏层中的目标张量参数进行处理,得到输出数据,所述目标张量参数的布局类型与所述目标处理单元的张量布局类型相同。The processing subunit is further configured to process the transposed input data and target tensor parameters in all hidden layers corresponding to the target processing unit through the target processing unit to obtain output data, the target The layout type of the tensor parameter is the same as the tensor layout type of the target processing unit.
  7. 根据权利要求6所述的数据处理装置,其特征在于,通过转置算子对输入数据进行转置,得到转置后的输入数据之前,所述获取子单元,还用于获取所述目标处理单元对应的隐藏层的张量布局类型和所述目标处理单元的张量布局类型;以及,若所述目标处理单元对应的隐藏层的张量布局类型和所述目标处理单元的张量布局类型不相同,则对所述目标处理单元对应的所有隐藏层中的原始张量参数进行转置,得到所述目标张量参数。The data processing device according to claim 6, wherein the acquisition subunit is further configured to acquire the target processing before transposing the input data through a transposition operator to obtain the transposed input data The tensor layout type of the hidden layer corresponding to the unit and the tensor layout type of the target processing unit; and, if the tensor layout type of the hidden layer corresponding to the target processing unit and the tensor layout type of the target processing unit If not, the original tensor parameters in all hidden layers corresponding to the target processing unit are transposed to obtain the target tensor parameters.
  8. The data processing apparatus according to claim 7, wherein the processing subunit is further configured to: if the tensor layout type of the hidden layers corresponding to the target processing unit is the same as the tensor layout type of the target processing unit, use the original tensor parameters in all hidden layers corresponding to the target processing unit as the target tensor parameters.
  9. The data processing apparatus according to claim 6, wherein after the transposed input data and the target tensor parameters in all hidden layers corresponding to the target processing unit are processed by the target processing unit to obtain the output data, the processing subunit is further configured to: transpose the output data by means of a transposition operator to obtain transposed output data; and use the transposed output data as input data of a next processing unit following the target processing unit.
  10. The data processing apparatus according to claim 6, wherein an output of the application layer is an input of a first processing unit in the at least one processing unit, an output of the first processing unit is an input of a first hidden layer among the one or more hidden layers corresponding to the first processing unit, and an output of a last hidden layer among the one or more hidden layers corresponding to the first processing unit is an input of a next processing unit following the first processing unit.
  11. A data processing apparatus, comprising a processor, a memory, and a communication interface, the processor, the memory, and the communication interface being connected to one another, wherein the memory is configured to store a computer program, the computer program comprises program instructions, and the processor is configured to invoke the program instructions to perform the data processing method according to any one of claims 1 to 5.
  12. A computer-readable storage medium, wherein the computer-readable storage medium stores one or more instructions, and the one or more instructions are adapted to be loaded by a processor to perform the data processing method according to any one of claims 1 to 5.
  13. A chip, comprising a processor and a data interface, wherein the processor reads, through the data interface, instructions stored in a memory to perform the data processing method according to any one of claims 1 to 5.
  14. A chip module, comprising the chip according to claim 13.
PCT/CN2021/141402 2021-01-28 2021-12-25 Data processing method and apparatus WO2022161060A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110116895.5A CN112862071B (en) 2021-01-28 2021-01-28 Data processing method and device
CN202110116895.5 2021-01-28

Publications (1)

Publication Number Publication Date
WO2022161060A1 (en)

Family

ID=75987440

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/141402 WO2022161060A1 (en) 2021-01-28 2021-12-25 Data processing method and apparatus

Country Status (2)

Country Link
CN (1) CN112862071B (en)
WO (1) WO2022161060A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112862071B (en) * 2021-01-28 2023-04-28 展讯通信(上海)有限公司 Data processing method and device

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111210017A (en) * 2019-12-24 2020-05-29 北京迈格威科技有限公司 Method, device, equipment and storage medium for determining layout sequence and processing data
CN111242286A (en) * 2020-01-14 2020-06-05 Oppo广东移动通信有限公司 Data format conversion method and device and computer readable storage medium
CN111860801A (en) * 2019-04-30 2020-10-30 百度(美国)有限责任公司 Neural network method, neural network system, and computer-readable medium
CN111882038A (en) * 2020-07-24 2020-11-03 深圳力维智联技术有限公司 Model conversion method and device
US20200364047A1 (en) * 2019-05-16 2020-11-19 Facebook, Inc. High throughput neural network operations using inter-layer memory layout transformation
CN112862071A (en) * 2021-01-28 2021-05-28 展讯通信(上海)有限公司 Data processing method and device

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109597965B (en) * 2018-11-19 2023-04-18 深圳力维智联技术有限公司 Data processing method, system, terminal and medium based on deep neural network
US11494613B2 (en) * 2019-01-02 2022-11-08 The Aerospace Corporation Fusing output of artificial intelligence networks
CN111401537A (en) * 2019-09-24 2020-07-10 上海寒武纪信息科技有限公司 Data processing method and device, computer equipment and storage medium
CN111401538A (en) * 2019-09-24 2020-07-10 上海寒武纪信息科技有限公司 Data processing method and device, computer equipment and storage medium

Also Published As

Publication number Publication date
CN112862071A (en) 2021-05-28
CN112862071B (en) 2023-04-28

Legal Events

Date Code Title Description

121 Ep: the EPO has been informed by WIPO that EP was designated in this application
    Ref document number: 21922650; Country of ref document: EP; Kind code of ref document: A1

NENP Non-entry into the national phase
    Ref country code: DE

122 Ep: PCT application non-entry in European phase
    Ref document number: 21922650; Country of ref document: EP; Kind code of ref document: A1