CN112862071B - Data processing method and device - Google Patents

Data processing method and device

Info

Publication number
CN112862071B
CN112862071B (application CN202110116895.5A)
Authority
CN
China
Prior art keywords
processing unit
tensor
target
data
layout type
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110116895.5A
Other languages
Chinese (zh)
Other versions
CN112862071A (en)
Inventor
罗佳
许铭仁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Spreadtrum Communications Shanghai Co Ltd
Original Assignee
Spreadtrum Communications Shanghai Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Spreadtrum Communications Shanghai Co Ltd filed Critical Spreadtrum Communications Shanghai Co Ltd
Priority to CN202110116895.5A
Publication of CN112862071A
Priority to PCT/CN2021/141402 (WO2022161060A1)
Application granted
Publication of CN112862071B
Active legal status (current)
Anticipated expiration legal status

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The application discloses a data processing method and device, wherein the method comprises the following steps: if the layout type of the input data input to a target processing unit is different from the tensor layout type of the target processing unit, transposing the input data through a transposition operator to obtain transposed input data, wherein the input data is a tensor, the target processing unit is any one of at least one processing unit, and the input data is output data of an application layer or of one hidden layer in at least one hidden layer; and processing, by the target processing unit, the transposed input data and the target tensor parameters in all hidden layers corresponding to the target processing unit to obtain output data, wherein the layout type of the target tensor parameters is the same as the tensor layout type of the target processing unit. By this method, the layout type of the input data, the tensor layout type of the neural network hidden layer, and the tensor layout type of the execution unit are made compatible with one another.

Description

Data processing method and device
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a data processing method and apparatus.
Background
A neural network is a mathematical or computational model, studied in machine learning and cognitive science, that mimics the structure and function of biological neural networks. Data in a neural network may be stored in tensors (Tensor), and the arrangement (Layout) of data in a tensor may be, for example, NHWC or NCHW. The data in the neural network can be computed by an execution unit to obtain a corresponding computation result. However, when the arrangement of tensor data supported by the execution unit is not compatible with the arrangement of tensor data supported by the neural network, a correct computation result cannot be obtained.
Disclosure of Invention
The application discloses a data processing method and device, which can make the layout type of input data, the tensor layout type of a neural network hidden layer, and the tensor layout type of an execution unit compatible with one another.
In a first aspect, an embodiment of the present application provides a data processing method, where the method is applied to a terminal device, the terminal device includes an application layer, a neural network layer, and at least one processing unit, the neural network layer includes at least one hidden layer, and each processing unit corresponds to one or more hidden layers in the at least one hidden layer, and the method includes:
if the layout type of the input data input to the target processing unit is different from the tensor layout type of the target processing unit, transposing the input data through a transposition operator to obtain transposed input data, wherein the input data is a tensor, the target processing unit is any one of the at least one processing unit, and the input data is output data of the application layer or of one hidden layer in the at least one hidden layer;
and processing the transposed input data and the target tensor parameters in all hidden layers corresponding to the target processing unit by the target processing unit to obtain output data, wherein the layout type of the target tensor parameters is the same as that of the target processing unit.
In one embodiment, before the input data is transposed by the transposition operator to obtain the transposed input data, the tensor layout type of the hidden layer corresponding to the target processing unit and the tensor layout type of the target processing unit are obtained; if the tensor layout type of the hidden layer corresponding to the target processing unit is different from the tensor layout type of the target processing unit, the original tensor parameters in all the hidden layers corresponding to the target processing unit are transposed to obtain the target tensor parameters.
In an embodiment, if the tensor layout type of the hidden layer corresponding to the target processing unit is the same as the tensor layout type of the target processing unit, the original tensor parameters in all the hidden layers corresponding to the target processing unit are used as the target tensor parameters.
In one embodiment, after the target processing unit processes the transposed input data and the target tensor parameters in all hidden layers corresponding to the target processing unit to obtain output data, the output data is transposed by a transposition operator to obtain transposed output data; the transposed output data is used as input data for the next processing unit after the target processing unit.
In an embodiment, the output of the application layer is an input of a first processing unit of the at least one processing unit, the output of the first processing unit is an input of a first hidden layer of the one or more hidden layers corresponding to the first processing unit, and the output of a last hidden layer of the one or more hidden layers corresponding to the first processing unit is an input of a next processing unit of the first processing unit.
In a second aspect, an embodiment of the present application provides a data processing apparatus, applied to a terminal device, where the terminal device includes an application layer, a neural network layer, and at least one processing unit, the neural network layer includes at least one hidden layer, and each processing unit corresponds to one or more hidden layers in the at least one hidden layer, where the apparatus includes:
the processing unit is used for transposing the input data through a transposition operator if the layout type of the input data input to the target processing unit is different from the tensor layout type of the target processing unit, so that transposed input data is obtained, wherein the input data is a tensor, the target processing unit is any one of the at least one processing unit, and the input data is output data of the application layer or of one hidden layer in the at least one hidden layer;
the processing unit is further configured to process, by using the target processing unit, the transposed input data and target tensor parameters in all hidden layers corresponding to the target processing unit, to obtain output data, where a layout type of the target tensor parameters is the same as a tensor layout type of the target processing unit.
In a third aspect, an embodiment of the present application provides a data processing apparatus, including a processor, a memory and a communication interface, where the processor, the memory and the communication interface are connected to each other, and where the memory is configured to store a computer program, the computer program including program instructions, and the processor is configured to invoke the program instructions to perform a data processing method as described in the first aspect.
In a fourth aspect, embodiments of the present application provide a computer readable storage medium, wherein the computer readable storage medium stores one or more instructions adapted to be loaded by a processor and to perform a data processing method as described in the first aspect.
In a fifth aspect, embodiments of the present application provide a chip comprising a processor and a data interface, the processor reading instructions stored on a memory via the data interface to perform a data processing method as described in the first aspect.
In a sixth aspect, embodiments of the present application provide a chip module, which includes a chip as in the fifth aspect.
In this embodiment of the present application, if the layout type of input data input to a target processing unit is different from the tensor layout type of the target processing unit, the input data is transposed by a transposition operator to obtain transposed input data, where the input data is a tensor, the target processing unit is any one of at least one processing unit, and the input data is output data of an application layer or of one hidden layer in at least one hidden layer; the target processing unit then processes the transposed input data and the target tensor parameters in all hidden layers corresponding to the target processing unit to obtain output data, where the layout type of the target tensor parameters is the same as the tensor layout type of the target processing unit. By this method, the layout type of the input data, the tensor layout type of the neural network hidden layer, and the tensor layout type of the execution unit are made compatible with one another.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required in the description of the embodiments are briefly introduced below. It is apparent that the drawings in the following description show only some embodiments of the present application, and other drawings may be obtained from these drawings by a person of ordinary skill in the art without inventive effort.
FIG. 1a is a schematic diagram of a system architecture including a neural network hidden layer according to an embodiment of the present application;
FIG. 1b is a schematic diagram of another system architecture including a neural network hidden layer according to an embodiment of the present application;
FIG. 2 is a schematic flowchart of a data processing method according to an embodiment of the present application;
FIG. 3 is a schematic diagram, provided in an embodiment of the present application, in which all hidden layers are processed on only one processing unit;
FIG. 4 is a schematic diagram of a hidden layer processed on multiple processing units according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a unit of a data processing apparatus according to an embodiment of the present application;
FIG. 6 is a simplified schematic diagram of a physical structure of a data processing apparatus according to an embodiment of the present application;
FIG. 7 is a simplified schematic diagram of a chip of a data processing apparatus according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the drawings in the embodiments of the present application.
In order to better understand the embodiments of the present application, the following description refers to the technical terms related to the embodiments of the present application:
Artificial intelligence (Artificial Intelligence, AI): a theory, method, technology, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive branch of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence is the study of the design principles and implementation methods of various intelligent machines, so that the machines have the capabilities of perception, reasoning, and decision-making. Artificial intelligence technology is a comprehensive discipline that covers a wide range of fields and involves both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, mechatronics, and the like. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.
Tensor (Tensor): a multilinear function that can be used to represent linear relationships among vectors, scalars, and other tensors. Tensor theory is a branch of mathematics with important applications in mechanics; the term "tensor" originates from mechanics, where it was initially used to represent the stress state at points in an elastic medium, and tensor theory has since developed into a powerful mathematical tool for mechanics and physics. The tensor concept is a generalization of the vector concept: tensors generalize vectors and matrices, so that, for example, a scalar may be regarded as a zero-order tensor, a vector as a first-order tensor, and a matrix as a second-order tensor. In addition, the arrangement of a tensor in memory (Layout, or memory layout) may be NHWC, NCHW, CHWN, and so on. NHWC means the tensor is arranged in memory as [batch, height, width, channels], and NCHW means the tensor is arranged in memory as [batch, channels, height, width], where batch is the number of data samples input at one time, channels is the number of channels, width is the width, and height is the height. In the embodiments of the present application, an NHWC tensor can be transposed to obtain an NCHW tensor, and likewise an NCHW tensor can be transposed to obtain an NHWC tensor.
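As a purely illustrative sketch (not part of the patent text; it assumes the NumPy library, which the embodiments do not prescribe), the following shows how a tensor stored in NHWC order can be rearranged into NCHW order, and back, by permuting its axes — the kind of layout transposition referred to above.

```python
# Illustrative sketch only (assumes NumPy): converting a 4-D tensor between the
# NHWC and NCHW memory layouts by permuting its axes.
import numpy as np

# A hypothetical activation tensor with batch=2, height=4, width=4, channels=3 (NHWC).
x_nhwc = np.arange(2 * 4 * 4 * 3, dtype=np.float32).reshape(2, 4, 4, 3)

x_nchw = np.transpose(x_nhwc, (0, 3, 1, 2))  # NHWC -> NCHW, shape (2, 3, 4, 4)
x_back = np.transpose(x_nchw, (0, 2, 3, 1))  # NCHW -> NHWC, the inverse permutation

assert x_nchw.shape == (2, 3, 4, 4)
assert np.array_equal(x_nhwc, x_back)
```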
Neural network: in computer science, an artificial neural network is a mathematical model that processes information using a structure similar to the synaptic connections of the brain; in engineering and academia it is often simply called a "neural network" or neural-like network. It is an algorithmic mathematical model that imitates the behavioral characteristics of animal neural networks and performs distributed, parallel information processing. Such a network relies on the complexity of the system and processes information by adjusting the interconnection relationships among a large number of internal nodes.
Operator: a mapping from one function space to another function space. Common operators include the differential operator, gradient operator, divergence operator, Laplace operator, Hamiltonian operator, and so on. In the narrow sense, an operator is a mapping from one function space to another function space (or to itself). In the broad sense, as long as the spaces involved are generalized to general spaces, an operator may be defined on a vector space, a normed vector space, an inner product space, or, further, a Banach space or a Hilbert space. Operators can also be classified as bounded or unbounded, linear or nonlinear, and so on.
Transpose (Permute): intuitively, the transpose of a matrix A is obtained by mirroring all elements of A about the main diagonal, i.e., the ray running at 45 degrees down and to the right from the element in row 1, column 1. For a matrix M, its first row becomes the first column, its second row becomes the second column, and so on; this process is called transposing the matrix, i.e., the rows and columns of matrix A are interchanged correspondingly.
Graphics processing unit (Graphics Processing Unit, GPU): also called display core, vision processor, or display chip; a microprocessor dedicated to image- and graphics-related computation on personal computers, workstations, game consoles, and some mobile devices (such as tablet computers and smartphones).
Central processing unit (Central Processing Unit, CPU): the computing and control core of a computer system, and the final execution unit for information processing and program running.
Embedded neural-network processing unit (Neural-network Processing Units, NPU): adopts a data-driven parallel computing architecture and is particularly good at processing massive multimedia data such as video and images.
In order to better understand the embodiments of the present application, a network architecture to which the embodiments of the present application are applicable is described below.
Referring to fig. 1a, fig. 1a is a schematic diagram of a system architecture including a neural network hidden layer according to an embodiment of the present application. As shown in fig. 1a, the system architecture includes an application layer, an Input Layer, a neural network Hidden Layer, a processing unit, and an Output Layer. The output of the application layer can serve as the input of the hidden layer; the hidden layer runs on the processing unit, and the processing unit can process the data in the hidden layer. The data here refers to the input data of the hidden layer, the parameters carried by the hidden layer, operators, and the like. It should be noted that the neural network hidden layer in fig. 1a may include one hidden layer or multiple hidden layers. If multiple hidden layers are included, then for a hidden layer located in the interior of the multiple hidden layers, its input is the output of the previous hidden layer and its output is the input of the next hidden layer.
Another system architecture including a neural network hidden layer is shown in fig. 1b. The system architecture includes an application layer, an Input Layer, a first partial neural network Hidden Layer, a first processing unit, a second partial hidden layer, a second processing unit, a third partial hidden layer, a third processing unit, and an Output Layer. The first partial hidden layer runs on the first processing unit, the second partial hidden layer runs on the second processing unit, and the third partial hidden layer runs on the third processing unit. The first processing unit, the second processing unit, and the third processing unit may each be a processing unit such as a GPU, a CPU, or an NPU. When the computational model of the neural network needs to be processed using different processing units, the hidden layers in the model may be grouped, and different groups of hidden layers may be processed by different processing units. The first partial hidden layer, the second partial hidden layer, and the third partial hidden layer may each contain multiple hidden layers, i.e., for a hidden layer located in the interior of a group of hidden layers, its input is the output of the previous hidden layer and its output is the input of the next hidden layer.
In order to enable the input data, the tensor layout type of the neural network and the tensor layout type of the execution unit to be compatible with each other, the embodiment of the application provides a data processing method and device, and the data processing method and device provided by the embodiment of the application are further described in detail below.
Referring to fig. 2, fig. 2 is a schematic flowchart of a data processing method according to an embodiment of the present application. The data processing method includes operations 210 and 220 as follows. The execution body of the method shown in fig. 2 may be a terminal device, or it may be a chip in the terminal device. The terminal device to which the method is applied may include an application layer, a neural network layer, and at least one processing unit, where the neural network layer includes at least one hidden layer, and each processing unit corresponds to one or more hidden layers in the at least one hidden layer. When the terminal device performs the flow shown in fig. 2, the following steps may be included:
210. If the layout type of the input data input to the target processing unit is different from the tensor layout type of the target processing unit, the input data is transposed through a transposition operator to obtain transposed input data, where the input data is a tensor, the target processing unit is any processing unit in the at least one processing unit, and the input data is output data of the application layer or of one hidden layer in the at least one hidden layer.
The output of the application layer is the input of a first processing unit in at least one processing unit, the output of the first processing unit is the input of a first hidden layer in one or more hidden layers corresponding to the first processing unit, and the output of a last hidden layer in one or more hidden layers corresponding to the first processing unit is the input of a next processing unit of the first processing unit.
In one possible implementation manner, before the input data is transposed by the transpose operator to obtain the transposed input data, the terminal device may obtain the tensor layout type of the hidden layer corresponding to the target processing unit and the tensor layout type of the target processing unit. If the terminal equipment determines that the tensor layout type of the hidden layer corresponding to the target processing unit is different from the tensor layout type of the target processing unit, the original tensor parameters in all the hidden layers corresponding to the target processing unit need to be transposed to obtain the target tensor parameters. The target tensor parameter is a set of tensor parameters transposed for each hidden layer in all hidden layers, and the tensor parameters transposed for each hidden layer may be different from each other.
Alternatively, fig. 3 is a schematic diagram in which all hidden layers are processed on only one processing unit. In fig. 3, the neural network hidden layer is processed by only one processing unit, which may be a GPU, a CPU, an NPU, or another processing unit capable of processing the hidden layer. If all hidden layers are processed on only one processing unit, the terminal device only needs to determine whether the tensor layout type of all the hidden layers corresponding to the processing unit is the same as the tensor layout type of the processing unit. For example, if the tensor layout type of all the hidden layers corresponding to the processing unit is NHWC and the tensor layout type of the processing unit is NCHW, the terminal device needs to transpose the original tensor parameters in all the hidden layers to obtain the target tensor parameters. The layout type of the target tensor parameters is consistent with the tensor layout type of the processing unit, i.e., NCHW.
Alternatively, fig. 4 is a schematic diagram in which the hidden layers are processed on multiple processing units. In fig. 4, all hidden layers in the neural network model are divided into a first partial hidden layer, a second partial hidden layer, and a third partial hidden layer. The first partial hidden layer is processed by the first processing unit, the second partial hidden layer is processed by the second processing unit, and the third partial hidden layer is processed by the third processing unit. The tensor layout types of the first, second, and third partial hidden layers are all the same and are determined when the model is constructed, while among the three processing units there may be processing units whose tensor layout type is different from that of the hidden layers. For example, the tensor layout type of the hidden layers is NHWC, the tensor layout type of the first processing unit is NCHW, the tensor layout type of the second processing unit is NHWC, and the tensor layout type of the third processing unit is NCHW. The terminal device may determine that the tensor layout type of the first partial hidden layer is different from the tensor layout type of the first processing unit, that the tensor layout type of the second partial hidden layer is the same as the tensor layout type of the second processing unit, and that the tensor layout type of the third partial hidden layer is different from the tensor layout type of the third processing unit. It is therefore necessary to transpose the original tensor parameters of all the hidden layers in the first partial hidden layer and the original tensor parameters of all the hidden layers in the third partial hidden layer. In this way, the tensor layout type of the first partial hidden layer becomes the same as the tensor layout type of the first processing unit, and the tensor layout type of the third partial hidden layer becomes the same as the tensor layout type of the third processing unit.
In one possible implementation, if the tensor layout type of the hidden layer corresponding to the target processing unit is the same as the tensor layout type of the target processing unit, the original tensor parameters in all the hidden layers corresponding to the target processing unit are used as the target tensor parameters.
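As a hedged illustration of this parameter preparation (the helper names and data structures below are hypothetical and not taken from the patent), both branches can be sketched as follows: the original tensor parameters of a group of hidden layers are transposed when the group's tensor layout type differs from that of its processing unit, and are used unchanged as the target tensor parameters when the layouts already match.

```python
# Hedged sketch (hypothetical names, not the patent's implementation): prepare the
# target tensor parameters of one group of hidden layers for its processing unit.
import numpy as np

_AXES = {("NHWC", "NCHW"): (0, 3, 1, 2), ("NCHW", "NHWC"): (0, 2, 3, 1)}

def transpose_layout(param: np.ndarray, src: str, dst: str) -> np.ndarray:
    """Transpose one 4-D tensor parameter from layout src to layout dst."""
    return param if src == dst else np.transpose(param, _AXES[(src, dst)])

def prepare_target_params(hidden_layer_params, layer_layout: str, unit_layout: str):
    """hidden_layer_params: list of 4-D parameter tensors, one entry per hidden layer."""
    if layer_layout == unit_layout:
        # Layouts already match: the original tensor parameters are the target parameters.
        return hidden_layer_params
    # Layouts differ: transpose the original tensor parameters of every hidden layer.
    return [transpose_layout(p, layer_layout, unit_layout) for p in hidden_layer_params]
```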
Once the tensor layout type of the hidden layer is the same as the tensor layout type of its corresponding processing unit, if the layout type of the input data input to the target processing unit is different from the tensor layout type of the target processing unit, the input data needs to be transposed. The input data can be transposed by a transposition operator to obtain transposed input data, and the layout type of the transposed input data is the same as the tensor layout type of the target processing unit.
Alternatively, as shown in fig. 3, the input layer may input the input data into the hidden layer corresponding to the processing unit. If the terminal device determines that the layout type of the input data is different from the tensor layout type of the hidden layer, it inserts a transpose (Permute) operator after the input layer and transposes the input data with this operator. Before the input data is input to the hidden layer, if the tensor layout type of the hidden layer is different from the tensor layout type of the processing unit, the terminal device transposes the original tensor parameters in the hidden layer so that the tensor layout type of the hidden layer becomes the same as the tensor layout type of the processing unit. Therefore, comparing the layout type of the input data with the tensor layout type of the hidden layer is equivalent to comparing the layout type of the input data with the tensor layout type of the processing unit.
Alternatively, as shown in fig. 4, the input layer may input the input data to the first hidden layer of the first partial hidden layer. If the terminal device determines that the layout type of the input data differs from the tensor layout type of the first hidden layer of the first partial hidden layer, a Permute operator may be inserted after the input layer, the input data is transposed, and the transposed input data is input to the first hidden layer of the first partial hidden layer. After the last hidden layer of the first partial hidden layer has processed the data input to it and produced its output data, that output data can be transposed. The transposition is performed by adding a Permute operator after the last hidden layer of the first partial hidden layer, so as to obtain transposed output data whose layout type is the same as the layout type of the input data of the first hidden layer of the first partial hidden layer. When the transposed output data is fed as input data to the first hidden layer of the second partial hidden layer, the terminal device needs to determine whether the layout type of this input data is the same as the tensor layout type of the first hidden layer of the second partial hidden layer. If they are the same, the data can be input directly to the first hidden layer of the second partial hidden layer; if they are different, the input data of the first hidden layer of the second partial hidden layer needs to be transposed by a Permute operator. Similarly, the last hidden layer of the second partial hidden layer and the first hidden layer of the third partial hidden layer may perform data input and output in the same manner.
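This boundary check can be sketched as follows (again a hypothetical illustration, not the patented implementation): a Permute step is inserted only when the layout of the data arriving at a group of hidden layers differs from the layout that group, and hence its processing unit, expects.

```python
# Hedged sketch (hypothetical helper, not the patent's code): insert a transposition
# (Permute) at a group boundary only when the incoming data layout differs from the
# layout expected by the next group of hidden layers / its processing unit.
import numpy as np

_AXES = {("NHWC", "NCHW"): (0, 3, 1, 2), ("NCHW", "NHWC"): (0, 2, 3, 1)}

def permute_if_needed(data: np.ndarray, data_layout: str, expected_layout: str):
    """Return (data, layout), applying a Permute only when the layouts differ."""
    if data_layout == expected_layout:
        return data, data_layout                      # feed the data in directly
    permuted = np.transpose(data, _AXES[(data_layout, expected_layout)])
    return permuted, expected_layout                  # data now matches the expected layout
```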
It should be noted that the original tensor parameters of the hidden layers are transposed directly; they need not be transposed by a Permute operator.
220. The target processing unit processes the transposed input data and the target tensor parameters in all hidden layers corresponding to the target processing unit to obtain output data, where the layout type of the target tensor parameters is the same as the tensor layout type of the target processing unit.
The target processing unit processes the input data of its corresponding hidden layers together with the target tensor parameters in those hidden layers. If there are multiple hidden layers, the target processing unit processes the input data and the tensor parameters of each hidden layer one by one. After processing is completed, the target processing unit obtains output data. After the output data is obtained, it can be transposed by a Permute operator to obtain transposed output data, and the transposed output data is used as input data for the next processing unit after the target processing unit. In this way, data flow between different processing units is realized.
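Tying the steps together, a final hedged sketch is given below (the structure of the units list is an assumption made for illustration, not the patented implementation): each processing unit first permutes its input if needed, then runs all of its hidden layers with the prepared target tensor parameters, and finally transposes its output so it can flow on to the next processing unit.

```python
# Hedged end-to-end sketch (hypothetical structure, not the patented implementation):
# run the hidden-layer groups on their processing units, permuting data at boundaries.
import numpy as np

_AXES = {("NHWC", "NCHW"): (0, 3, 1, 2), ("NCHW", "NHWC"): (0, 2, 3, 1)}

def _permute(data, src, dst):
    return data if src == dst else np.transpose(data, _AXES[(src, dst)])

def run_pipeline(data, data_layout, units):
    """units: list of dicts, each with
         'layout' - tensor layout type of the processing unit, and
         'run'    - callable executing all hidden layers assigned to that unit."""
    for i, unit in enumerate(units):
        # Operation 210: transpose the input if its layout differs from the unit's layout.
        data = _permute(data, data_layout, unit["layout"])
        data_layout = unit["layout"]
        # Operation 220: process with target tensor parameters already matching the unit.
        data = unit["run"](data)
        # Transpose the output so it can serve as input to the next processing unit.
        if i + 1 < len(units):
            data = _permute(data, data_layout, units[i + 1]["layout"])
            data_layout = units[i + 1]["layout"]
    return data, data_layout
```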
According to the embodiments of the present application, when the tensor layout type of the hidden layer corresponding to the target processing unit is different from the tensor layout type of the target processing unit, the terminal device transposes the original tensor parameters of the hidden layer to obtain the target tensor parameters, whose layout type is the same as the tensor layout type of the target processing unit. Furthermore, the layout type of the input data input to the target processing unit and the tensor layout type of the target processing unit may be compared, and if they are determined to be different, the input data may be transposed by a transposition operator to obtain transposed input data, whose layout type is the same as the tensor layout type of the target processing unit. By this method, the layout type of the input data, the tensor layout type of the neural network hidden layer, and the tensor layout type of the execution unit are made compatible, which facilitates the processing of the input data and of the parameters in the neural network hidden layer by the processing unit.
Referring to fig. 5, fig. 5 is a schematic diagram of the units of a data processing apparatus according to an embodiment of the present application. The data processing apparatus shown in fig. 5 may be used to perform some or all of the functions described in the method embodiment depicted in fig. 2. The apparatus may be a terminal device, a component within the terminal device, or an apparatus that can be used in cooperation with the terminal device.
The logical structure of the apparatus may include a processing unit 510 and an acquisition unit 520. When the apparatus is applied to a terminal device, the terminal device includes an application layer, a neural network layer, and at least one processing unit, the neural network layer includes at least one hidden layer, and each processing unit corresponds to one or more hidden layers in the at least one hidden layer, wherein:
the processing unit 510 is configured to transpose, if the layout type of input data input to the target processing unit is different from the tensor layout type of the target processing unit, the input data by using a transposition operator to obtain transposed input data, where the input data is a tensor, the target processing unit is any one of the at least one processing unit, and the input data is output data of the application layer or of one hidden layer in the at least one hidden layer;
the processing unit 510 is further configured to process, by using the target processing unit, the transposed input data and target tensor parameters in all hidden layers corresponding to the target processing unit, to obtain output data, where a layout type of the target tensor parameters is the same as a tensor layout type of the target processing unit.
In one possible implementation manner, before the input data is transposed by the transpose operator to obtain transposed input data, the obtaining unit 520 is configured to obtain a tensor layout type of the hidden layer corresponding to the target processing unit and a tensor layout type of the target processing unit; the processing unit 510 is further configured to transpose the original tensor parameters in all the hidden layers corresponding to the target processing unit to obtain the target tensor parameters if the tensor layout type of the hidden layer corresponding to the target processing unit is different from the tensor layout type of the target processing unit.
In one possible implementation manner, the processing unit 510 is further configured to take, as the target tensor parameter, the original tensor parameters in all the hidden layers corresponding to the target processing unit if the tensor layout type of the hidden layer corresponding to the target processing unit is the same as the tensor layout type of the target processing unit.
In one possible implementation manner, after the target processing unit processes the transposed input data and the target tensor parameters in all hidden layers corresponding to the target processing unit to obtain output data, the processing unit 510 is further configured to transpose the output data by using a transposition operator to obtain transposed output data; the transposed output data is used as input data to the next processing unit of the target processing unit.
In one possible implementation, the output of the application layer is an input of a first processing unit of the at least one processing unit, the output of the first processing unit is an input of a first hidden layer of the one or more hidden layers corresponding to the first processing unit, and the output of a last hidden layer of the one or more hidden layers corresponding to the first processing unit is an input of a next processing unit of the first processing unit.
Referring to fig. 6, fig. 6 is a simplified schematic diagram of the physical structure of a data processing apparatus according to an embodiment of the present application. The apparatus includes a processor 610, a memory 620, and a communication interface 630, which are connected to one another by one or more communication buses. The data processing apparatus may be a chip, a chip module, or the like.
The processor 610 is configured to support the data processing apparatus in performing the functions corresponding to the method of fig. 2 described above. It should be appreciated that in the embodiment of the present application, the processor 610 may be a central processing unit (central processing unit, CPU for short), and the processor may also be another general purpose processor, a digital signal processor (digital signal processor, DSP for short), an application specific integrated circuit (application specific integrated circuit, ASIC for short), a field programmable gate array (field programmable gate array, FPGA for short) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The memory 620 is used for storing program codes and the like. The memory 620 in embodiments of the present application may be either volatile memory or nonvolatile memory, or may include both volatile and nonvolatile memory. The nonvolatile memory may be a read-only memory (ROM), a Programmable ROM (PROM), an Erasable PROM (EPROM), an electrically erasable ROM (electrically EPROM, EEPROM), or a flash memory. The volatile memory may be a random access memory (random access memory, RAM for short) which acts as an external cache. By way of example but not limitation, many forms of random access memory (random access memory, abbreviated as RAM) are available, such as static random access memory (static RAM), dynamic Random Access Memory (DRAM), synchronous Dynamic Random Access Memory (SDRAM), double data rate synchronous dynamic random access memory (double data rate SDRAM, abbreviated as DDR SDRAM), enhanced Synchronous Dynamic Random Access Memory (ESDRAM), synchronous Link DRAM (SLDRAM), and direct memory bus random access memory (direct rambus RAM, abbreviated as DR RAM).
The communication interface 630 is used to transmit and receive data, information, messages, etc., and may also be described as a transceiver, transceiver circuitry, etc.
In the embodiment of the present application, when the data processing apparatus is applied to a terminal device, the processor 610 invokes the program code stored in the memory 620 to perform the following operations:
the processor 610 calls the program code stored in the memory 620 to transpose the input data by the transpose operator if the layout type of the input data input to the target processing unit is different from the tensor layout type of the target processing unit, so as to obtain transposed input data, where the input data is tensor, the target processing unit is any processing unit of the at least one processing unit, and the input data is output data of an application layer or one of the at least one hidden layers;
the processor 610 calls the program code stored in the memory 620 to process, by the target processing unit, the transposed input data and the target tensor parameters in all hidden layers corresponding to the target processing unit, to obtain output data, where the layout type of the target tensor parameters is the same as the tensor layout type of the target processing unit.
In one possible implementation, before the input data is transposed by the transpose operator to obtain transposed input data, the processor 610 invokes the program code stored in the memory 620 to obtain the tensor layout type of the hidden layer corresponding to the target processing unit and the tensor layout type of the target processing unit; if the tensor layout type of the hidden layer corresponding to the target processing unit is different from the tensor layout type of the target processing unit, the original tensor parameters in all the hidden layers corresponding to the target processing unit are transposed to obtain the target tensor parameters.
In one possible implementation, the processor 610 invokes the program code stored in the memory 620 to use the original tensor parameters in all the hidden layers corresponding to the target processing unit as the target tensor parameters if the tensor layout type of the hidden layer corresponding to the target processing unit is the same as the tensor layout type of the target processing unit.
In one possible implementation manner, after the target processing unit processes the transposed input data and the target tensor parameters in all hidden layers corresponding to the target processing unit to obtain output data, the processor 610 invokes the program code stored in the memory 620 to transpose the output data through the transpose operator to obtain transposed output data; the transposed output data is used as input data to the next processing unit of the target processing unit.
In one possible implementation, the output of the application layer is an input of a first processing unit of the at least one processing unit, the output of the first processing unit is an input of a first hidden layer of the one or more hidden layers corresponding to the first processing unit, and the output of a last hidden layer of the one or more hidden layers corresponding to the first processing unit is an input of a next processing unit of the first processing unit.
With respect to the apparatus and the respective modules/units contained in the product described in the above embodiments, they may be software modules/units, may be hardware modules/units, or may be partly software modules/units, and partly hardware modules/units. For example, for each device or product applied to or integrated on a chip, each module/unit included in the device or product may be implemented in hardware such as a circuit, or at least part of the modules/units may be implemented in software program, where the software program runs on a processor integrated inside the chip, and the rest (if any) of the modules/units may be implemented in hardware such as a circuit; for each device and product applied to or integrated in the chip module, each module/unit contained in the device and product can be realized in a hardware manner such as a circuit, different modules/units can be located in the same component (such as a chip, a circuit module and the like) or different components of the chip module, or at least part of the modules/units can be realized in a software program, the software program runs on a processor integrated in the chip module, and the rest (if any) of the modules/units can be realized in a hardware manner such as a circuit; for each device, product, or application to or integrated with the terminal, each module/unit included in the device, product, or application may be implemented by using hardware such as a circuit, different modules/units may be located in the same component (for example, a chip, a circuit module, or the like) or different components in the terminal, or at least part of the modules/units may be implemented by using a software program, where the software program runs on a processor integrated inside the terminal, and the remaining (if any) part of the modules/units may be implemented by using hardware such as a circuit.
Referring to fig. 7, fig. 7 is a simplified schematic diagram of a chip of a data processing apparatus according to an embodiment of the present application, where the chip includes a processor 710 and a data interface 720. The chip may be used to handle functions corresponding to the method of fig. 2. The chip may be incorporated in a data processing device as shown in fig. 6. The chip may also be included in a chip module.
In the foregoing embodiments, the descriptions of the embodiments are focused on, and for those portions of one embodiment that are not described in detail, reference may be made to the related descriptions of other embodiments.
The steps in the methods of the embodiments of the present application may be adjusted in order, combined, or deleted according to actual needs.
The units in the apparatuses of the embodiments of the present application may be combined, divided, or deleted according to actual needs.
In the above embodiments, it may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions in accordance with embodiments of the present application are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium, for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by a wired (e.g., coaxial cable, fiber optic, digital subscriber line), or wireless (e.g., infrared, wireless, microwave, etc.). Computer readable storage media can be any available media that can be accessed by a computer or data storage devices, such as servers, data centers, etc., that contain an integration of one or more available media. Usable media may be magnetic media (e.g., floppy disks, storage disks, magnetic tape), optical media (e.g., DVD), or semiconductor media (e.g., solid State Disk (SSD)), among others.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be replaced by equivalents, and such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present application.

Claims (9)

1. A data processing method, wherein the method is applied to a terminal device, the terminal device comprises at least one processing unit, the terminal device runs an application layer and a neural network layer, the neural network layer comprises at least one hidden layer, and each processing unit corresponds to one or more hidden layers in the at least one hidden layer, and the method comprises the following steps:
if the layout type of the input data input to the target processing unit is different from the tensor layout type of the target processing unit, transposing the input data through a transposition operator to obtain transposed input data, wherein the input data is a tensor, the target processing unit is any processing unit in the at least one processing unit, and the input data is output data of the application layer or of one hidden layer in the at least one hidden layer;
processing the transposed input data and target tensor parameters in all hidden layers corresponding to the target processing unit by the target processing unit to obtain output data, wherein the layout type of the target tensor parameters is the same as the tensor layout type of the target processing unit;
transposing the output data through a transposition operator to obtain transposed output data;
and taking the transposed output data as input data to a next processing unit of the target processing unit.
2. The method of claim 1, wherein before the input data is transposed by the transposition operator to obtain the transposed input data, the method further comprises:
acquiring a tensor layout type of a hidden layer corresponding to the target processing unit and a tensor layout type of the target processing unit;
if the tensor layout type of the hidden layer corresponding to the target processing unit is different from the tensor layout type of the target processing unit, the original tensor parameters in all the hidden layers corresponding to the target processing unit are transposed to obtain the target tensor parameters.
3. The method according to claim 2, wherein the method further comprises:
and if the tensor layout type of the hidden layer corresponding to the target processing unit is the same as the tensor layout type of the target processing unit, taking the original tensor parameters in all the hidden layers corresponding to the target processing unit as the target tensor parameters.
4. The method of claim 1, wherein the output of the application layer is an input of a first processing unit of the at least one processing unit, the output of the first processing unit is an input of a first hidden layer of the one or more hidden layers corresponding to the first processing unit, and the output of a last hidden layer of the one or more hidden layers corresponding to the first processing unit is an input of a next processing unit of the first processing unit.
5. A data processing apparatus, applied to a terminal device, where the terminal device includes at least one processing unit, and the terminal device runs an application layer and a neural network layer, where the neural network layer includes at least one hidden layer, and each processing unit corresponds to one or more hidden layers in the at least one hidden layer, the apparatus includes:
the processing unit is used for transposing the input data through a transposition operator if the layout type of the input data input to the target processing unit is different from the tensor layout type of the target processing unit, so as to obtain transposed input data, wherein the input data is a tensor, the target processing unit is any one of the at least one processing unit, and the input data is output data of the application layer or of one hidden layer in the at least one hidden layer;
the processing unit is further configured to process, by using the target processing unit, the transposed input data and target tensor parameters in all hidden layers corresponding to the target processing unit, to obtain output data, where a layout type of the target tensor parameters is the same as a tensor layout type of the target processing unit; and transposing the output data through a transposition operator to obtain transposed output data; and taking the transposed output data as input data to a next processing unit of the target processing unit.
6. A data processing apparatus comprising a processor, a memory and a communication interface, the processor, the memory and the communication interface being interconnected, wherein the memory is adapted to store a computer program comprising program instructions, the processor being configured to invoke the program instructions to perform the data processing method of any of claims 1 to 4.
7. A computer readable storage medium storing one or more instructions adapted to be loaded by a processor and to perform a data processing method according to any one of claims 1 to 4.
8. A chip comprising a processor and a data interface, the processor reading instructions stored on a memory via the data interface to perform the data processing method of any of claims 1 to 4.
9. A chip module comprising the chip of claim 8.
CN202110116895.5A 2021-01-28 2021-01-28 Data processing method and device Active CN112862071B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202110116895.5A CN112862071B (en) 2021-01-28 2021-01-28 Data processing method and device
PCT/CN2021/141402 WO2022161060A1 (en) 2021-01-28 2021-12-25 Data processing method and apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110116895.5A CN112862071B (en) 2021-01-28 2021-01-28 Data processing method and device

Publications (2)

Publication Number Publication Date
CN112862071A CN112862071A (en) 2021-05-28
CN112862071B true CN112862071B (en) 2023-04-28

Family

ID=75987440

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110116895.5A Active CN112862071B (en) 2021-01-28 2021-01-28 Data processing method and device

Country Status (2)

Country Link
CN (1) CN112862071B (en)
WO (1) WO2022161060A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112862071B (en) * 2021-01-28 2023-04-28 展讯通信(上海)有限公司 Data processing method and device

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109597965B (en) * 2018-11-19 2023-04-18 深圳力维智联技术有限公司 Data processing method, system, terminal and medium based on deep neural network
US11494613B2 (en) * 2019-01-02 2022-11-08 The Aerospace Corporation Fusing output of artificial intelligence networks
US11645512B2 (en) * 2019-04-30 2023-05-09 Baidu Usa Llc Memory layouts and conversion to improve neural network inference performance
US20200364047A1 (en) * 2019-05-16 2020-11-19 Facebook, Inc. High throughput neural network operations using inter-layer memory layout transformation
CN111401537A (en) * 2019-09-24 2020-07-10 上海寒武纪信息科技有限公司 Data processing method and device, computer equipment and storage medium
CN111401538A (en) * 2019-09-24 2020-07-10 上海寒武纪信息科技有限公司 Data processing method and device, computer equipment and storage medium
CN111210017B (en) * 2019-12-24 2023-09-26 北京迈格威科技有限公司 Method, device, equipment and storage medium for determining layout sequence and data processing
CN111242286A (en) * 2020-01-14 2020-06-05 Oppo广东移动通信有限公司 Data format conversion method and device and computer readable storage medium
CN111882038A (en) * 2020-07-24 2020-11-03 深圳力维智联技术有限公司 Model conversion method and device
CN112862071B (en) * 2021-01-28 2023-04-28 展讯通信(上海)有限公司 Data processing method and device

Also Published As

Publication number Publication date
WO2022161060A1 (en) 2022-08-04
CN112862071A (en) 2021-05-28


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant