CN112948126A - Data processing method, device and chip - Google Patents

Data processing method, device and chip

Info

Publication number
CN112948126A
Authority
CN
China
Prior art keywords
data
neural network
processing
module
processing module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110336963.9A
Other languages
Chinese (zh)
Inventor
黄海涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Vivo Mobile Communication Co Ltd
Original Assignee
Vivo Mobile Communication Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Vivo Mobile Communication Co Ltd
Priority to CN202110336963.9A
Publication of CN112948126A
Priority to PCT/CN2022/082694
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F 9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N 3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent

Abstract

The application discloses a data processing method, a data processing device and a chip, and belongs to the technical field of computers. The method is applied to a neural network computing device and comprises the following steps: when a first neural network forward processing is executed, a data cache module of the neural network computing device is controlled to simultaneously input target data to at least two data processing modules of the neural network computing device, so that the number of memory accesses is reduced, the memory access amount is reduced in turn, and the performance of the hardware accelerator is improved. The embodiment of the application thus solves the prior-art problem that the memory access amount is large when a neural network model is run.

Description

Data processing method, device and chip
Technical Field
The application belongs to the technical field of computers, and particularly relates to a data processing method, a data processing device and a chip.
Background
With the rapid development of computer technology and big data technology, neural network models are applied more and more widely; for example, various applications automatically adjust social multimedia content according to users' interests. In the field of intelligent monitoring, neural network models can be applied to safety assurance, enhanced face recognition, group behavior analysis and the like. In the field of electronic payment, neural network models make the detection of fraudulent behavior considerably more powerful.
Neural network models typically require a large amount of computation and therefore demand strong hardware performance. In the prior art, a neural network model generally needs to perform forward inference calculation to meet performance and power-consumption requirements. However, when the neural network model is run, the memory access amount is large, which produces high memory access power consumption and high calculation latency and degrades overall operating performance.
Disclosure of Invention
An object of the embodiments of the present application is to provide a data processing method, a data processing apparatus and a chip, which can solve the prior-art problem that the memory access amount is large when a neural network model is run.
In order to solve the technical problem, the present application is implemented as follows:
in a first aspect, an embodiment of the present application provides a data processing method, where the method is applied to a neural network computing device, and the method includes:
when a first neural network forward process is executed, controlling a data cache module of the neural network computing device to simultaneously input target data to at least two data processing modules of the neural network computing device;
the data processing module comprises a first processing module, a second processing module and a third processing module.
In a second aspect, an embodiment of the present application further provides a data processing apparatus, where the data processing apparatus includes:
the first control module is used for controlling the data caching module of the neural network computing device to simultaneously input target data to at least two data processing modules of the neural network computing device when the first neural network forward processing is executed;
the data processing module comprises a first processing module, a second processing module and a third processing module.
In a third aspect, an embodiment of the present application further provides an electronic device, which includes a memory, a processor, and a program or an instruction stored on the memory and executable on the processor, where the processor implements the steps in the data processing method described above when executing the program or the instruction.
In a fourth aspect, the present application also provides a readable storage medium, on which a program or instructions are stored, and when the program or instructions are executed by a processor, the program or instructions implement the steps in the data processing method as described above.
In a fifth aspect, an embodiment of the present application provides a chip, where the chip includes a processor and a communication interface, where the communication interface is coupled to the processor, and the processor is configured to execute a program or instructions to implement the method described above.
In the embodiment of the application, when the first neural network forward processing is executed, the data cache module of the neural network computing device is controlled to simultaneously input target data to at least two data processing modules of the neural network computing device, so that the number of memory accesses is reduced, the memory access amount is reduced in turn, and the performance of the hardware accelerator is improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required for describing the embodiments are briefly introduced below. The drawings described below are obviously only some embodiments of the present application, and those skilled in the art can derive other drawings from them without inventive effort.
FIG. 1 is a flow chart of a data processing method provided by an embodiment of the present application;
FIG. 2 is a schematic diagram illustrating a first example provided by an embodiment of the present application;
FIG. 3 is a schematic diagram illustrating a second example provided by an embodiment of the present application;
FIG. 4 is a schematic diagram illustrating a third example provided by an embodiment of the present application;
FIG. 5 shows a block diagram of a data processing apparatus provided by an embodiment of the present application;
FIG. 6 shows a block diagram of an electronic device provided by an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all, embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terms first, second and the like in the description and in the claims of the present application are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used are interchangeable under appropriate circumstances, so that the embodiments of the application can operate in sequences other than those illustrated or described herein. In addition, "and/or" in the specification and claims means at least one of the connected objects, and the character "/" generally indicates an "or" relationship between the preceding and following objects.
The data processing method provided by the embodiment of the present application is described in detail below with reference to the accompanying drawings through specific embodiments and application scenarios thereof.
Referring to fig. 1, an embodiment of the present application provides a data processing method, which may be applied to a neural network computing device, such as a hardware accelerator, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a Digital Signal Processor (DSP), or the like. The neural network computing device may also be an electronic device, such as a mobile phone or a computer, that includes any one of a hardware accelerator, a CPU, a GPU and a DSP.
The method comprises the following steps:
step 101, when a first neural network forward process is executed, controlling a data cache module of the neural network computing device to simultaneously input target data to at least two data processing modules of the neural network computing device; the data processing module comprises a first processing module, a second processing module and a third processing module.
The data cache module is used for caching and managing input and output data. The data cache module may be a Static Random-Access Memory (SRAM) structure organized as a plurality of banks, and may be configured with at least two read/write ports so that it can transmit data to multiple data processing modules at the same time.
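As an illustration only (not part of the patent text), the following Python sketch models such a cache: data is interleaved across several SRAM banks, and two read ports allow two data processing modules to be served in the same cycle as long as their addresses fall into different banks. All class and method names here are invented for the sketch.

```python
# Hypothetical sketch of a multi-bank cache with two read ports (illustrative only).
class MultiBankCache:
    def __init__(self, num_banks=4, bank_words=1024):
        self.banks = [[0] * bank_words for _ in range(num_banks)]
        self.bank_words = bank_words

    def _locate(self, addr):
        # Interleave consecutive addresses across banks so that concurrent
        # reads usually land in different banks.
        return addr % len(self.banks), (addr // len(self.banks)) % self.bank_words

    def write(self, addr, value):
        bank, offset = self._locate(addr)
        self.banks[bank][offset] = value

    def dual_read(self, addr_a, addr_b):
        # Two read ports: both values come back in the same "cycle"
        # unless the two addresses map to the same bank.
        bank_a, off_a = self._locate(addr_a)
        bank_b, off_b = self._locate(addr_b)
        conflict = bank_a == bank_b
        return self.banks[bank_a][off_a], self.banks[bank_b][off_b], conflict


cache = MultiBankCache()
cache.write(0, 7)        # data destined for one processing module
cache.write(1, 9)        # data destined for another processing module
for_module_a, for_module_b, bank_conflict = cache.dual_read(0, 1)
print(for_module_a, for_module_b, bank_conflict)   # 7 9 False
```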
For example, when the electronic device executes the first neural network forward processing, the first processing module is configured to execute the calculation of the convolution operator and transmit the result to the second processing module or write the result back to the data cache module. The second processing module is, for example, an activation function module, which performs nonlinear processing on the neurons of the convolutional neural network and transmits the processed result to the third processing module. The third processing module, such as a pooling module, a scaling module or a summing module, performs pooling, scaling, element-wise addition and similar operations on the data and then writes the result back to the data cache module.
When the first neural network forward processing is executed, the data cache module of the neural network computing device is controlled to simultaneously input the target data, i.e. the data to be fed to the data processing modules, to at least two data processing modules of the neural network computing device. By inputting the target data to the data processing modules simultaneously, the data cache module needs to access the memory only once during the whole data processing process, so the number of memory accesses is reduced and the memory access amount is reduced in turn.
As a first example, with reference to fig. 2, the first neural network forward processing is taken to be a convolutional neural network computing operation, the first processing module is a convolution computing module, the second processing module is an activation function module, and the third processing module includes a pooling module, a scaling module, a summing module and the like. When the convolutional neural network computing operation is executed, the data cache module simultaneously inputs the target data to at least two of the convolution computing module, the activation function module, the pooling module, the scaling module, the summing module and the like; after the data is read from the memory once (i.e. through the memory access module), it is distributed to multiple modules, so the number of times the data cache module accesses the memory is reduced. This avoids one memory access for every input of data to a module; for example, if the data cache module inputs data to the activation function module only after the convolution processing module completes its convolution, the memory has to be accessed once more at that point.
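Purely as an illustration of the access-count argument above (a sketch under assumptions, not the patent's implementation: numpy stands in for the hardware modules and the helper names are invented), compare a flow in which every module fetches its input from memory with a flow in which the data cache module reads the input once and streams it through the modules:

```python
import numpy as np

memory_reads = 0

def read_from_memory(data):
    # Each call stands for one off-chip memory access.
    global memory_reads
    memory_reads += 1
    return np.array(data, dtype=np.float32)

def convolve(x, kernel):
    # Minimal 1-D "same"-length convolution standing in for the convolution module.
    return np.convolve(x, kernel, mode="same")

def activate(x):
    # ReLU standing in for the activation function module.
    return np.maximum(x, 0.0)

def pool(x):
    # 2-wide max pooling standing in for the pooling module.
    return x.reshape(-1, 2).max(axis=1)

raw = [1.0, -2.0, 3.0, -4.0, 5.0, -6.0, 7.0, -8.0]
kernel = np.array([0.25, 0.5, 0.25], dtype=np.float32)

# Naive flow: every module pulls its input from memory again.
memory_reads = 0
conv_out = convolve(read_from_memory(raw), kernel)
act_out = activate(read_from_memory(conv_out))
pool_out = pool(read_from_memory(act_out))
print("memory reads without a shared cache:", memory_reads)   # 3

# Cached flow: one off-chip read, then data flows module to module via the cache.
memory_reads = 0
cached = read_from_memory(raw)
pool_out = pool(activate(convolve(cached, kernel)))
print("memory reads with a shared cache:   ", memory_reads)   # 1
```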
In the embodiment of the application, when the first neural network forward processing is executed, the data cache module of the neural network computing device is controlled to simultaneously input target data to the at least two data processing modules of the neural network computing device, so that the number of memory accesses is reduced, the memory access amount is reduced in turn, and the performance of the hardware accelerator is improved. The embodiment of the application thus solves the prior-art problem that the memory access amount is large when a neural network model is run.
In an alternative embodiment, if the first neural network forward processing comprises a convolution operation, the convolution operation comprises at least two convolution layers; each convolution layer in the convolutional neural network consists of a plurality of convolution units, and the parameters of each convolution unit are optimized through a back-propagation algorithm. The purpose of the convolution operation is to extract different features of the input. As a second example, as shown in fig. 3, the convolution operation includes a first convolution layer and a second convolution layer. The convolution layers may have a data dependency; for example, the output data of the first convolution layer may be used as the input data of the second convolution layer. For instance, the first convolution layer may only extract low-level features such as edges, lines and corners, while the second convolution layer iteratively extracts more complex features from those low-level features, improving the network's ability to extract features.
In an optional embodiment, the method further comprises:
and simultaneously inputting the target data into each convolution layer, and controlling the first processing module to combine and process the operation of each convolution layer.
As a third example, as shown in fig. 4, there may be no data dependency between the convolution layers. In that case the target data is used as the input data and is simultaneously input into each convolution layer (the first convolution layer and the second convolution layer), so the data cache module can finish feeding the convolution layers with only one memory access, avoiding the excessive memory access amount that would result from accessing the memory once for every convolution layer fed.
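A minimal sketch of this case (illustrative only; the 2-D convolution helper and the kernels are made up for the example, and numpy stands in for the convolution hardware): the input is read into the cache once, and both independent convolution layers consume the same cached copy.

```python
import numpy as np

def conv2d(image, kernel):
    # Plain "valid" 2-D correlation standing in for one convolution layer.
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1), dtype=np.float32)
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

image = np.arange(36, dtype=np.float32).reshape(6, 6)
edge_kernel = np.array([[1, 0, -1], [1, 0, -1], [1, 0, -1]], dtype=np.float32)
blur_kernel = np.full((3, 3), 1.0 / 9.0, dtype=np.float32)

# One off-chip read fills the cache; both independent layers reuse the cached copy
# instead of triggering one memory access per layer.
cached_input = image.copy()
branch_a = conv2d(cached_input, edge_kernel)
branch_b = conv2d(cached_input, blur_kernel)
print(branch_a.shape, branch_b.shape)   # (4, 4) (4, 4)
```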
In an alternative embodiment, the convolutional layers comprise a first convolutional layer and a second convolutional layer;
the method further comprises the following steps:
and inputting the target data into the first convolution layer to obtain first processing data, inputting the first processing data into the second convolution layer, and controlling the first processing module to merge and process the operation of each convolution layer.
For convolution layers with a data dependency, with reference to fig. 3, the first convolution layer is the preceding convolution layer, the second convolution layer is the following convolution layer, and the output data of the preceding convolution layer serves as the input data of the following convolution layer. After part or all of the preceding convolution layer's calculation is completed, its output data (i.e. the result of the convolution calculation) need not be written back to the memory; instead, the data cache module directly uses that output data as the input data of the following convolution layer's convolution calculation and starts the following layer's calculation immediately, thereby reducing memory accesses.
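The sketch below (again only an illustration under assumptions; the write_back/read_back helpers simply count simulated off-chip transfers and are not from the patent) contrasts the unfused flow, where the first layer's result is written back and read again, with the fused flow, where the cached result feeds the second layer directly:

```python
import numpy as np

memory_writes = 0
memory_reads = 0

def write_back(data):
    # Stands for writing an intermediate result to off-chip memory.
    global memory_writes
    memory_writes += 1
    return data

def read_back(data):
    # Stands for reading that intermediate result from off-chip memory again.
    global memory_reads
    memory_reads += 1
    return data

def conv1d(x, kernel):
    # 1-D convolution standing in for one convolution layer.
    return np.convolve(x, kernel, mode="same")

x = np.linspace(-1.0, 1.0, 16).astype(np.float32)
k1 = np.array([0.25, 0.5, 0.25], dtype=np.float32)
k2 = np.array([-1.0, 0.0, 1.0], dtype=np.float32)

# Unfused: layer 1 writes its result back, layer 2 reads it in again.
memory_writes = memory_reads = 0
layer1_out = write_back(conv1d(x, k1))
layer2_out = conv1d(read_back(layer1_out), k2)
print("unfused traffic:", memory_writes, "write(s),", memory_reads, "read(s)")   # 1, 1

# Fused: the cached layer-1 output feeds layer 2 directly, no write-back.
memory_writes = memory_reads = 0
fused_out = conv1d(conv1d(x, k1), k2)
print("fused traffic:  ", memory_writes, "write(s),", memory_reads, "read(s)")   # 0, 0
```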
In an optional embodiment, if the input data of the third processing module includes output data from at least two levels, then when the third processing module is controlled to execute the second neural network forward processing, the output data of the first level is input to the second level; a level here is a level in the topology of the neural network corresponding to the second neural network forward processing.
Taking the first neural network forward processing as an example, if the input data of the third processing module includes output data from at least two levels, the levels include a first level and a second level; a level is a level in the topology of the neural network corresponding to the second neural network forward processing. For example, in a branch layer preceding the third processing module, when the third processing module processes the summation operation, its input data includes data transmitted from at least two activation function modules.
When the operation of one level requires the output data of two layers as its input data, the third processing module is controlled to execute the second neural network forward processing and the output data of the first level is input into the second level; that is, the output data of the first level can be used directly in the second level's calculation without being written back to the memory, so the memory writes and reads of the branch layer are reduced.
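As a rough illustration of the branch case (a sketch only; ReLU is used as a stand-in activation and the write_back helper just counts simulated write-backs), the outputs of the two activation branches can be handed straight to the summation step instead of taking a round trip through memory:

```python
import numpy as np

write_backs = 0

def write_back(data):
    # Stands for writing one branch result to off-chip memory before the add.
    global write_backs
    write_backs += 1
    return data

def relu(x):
    # Activation function module for one branch.
    return np.maximum(x, 0.0)

branch_a_in = np.array([1.0, -2.0, 3.0, -4.0], dtype=np.float32)
branch_b_in = np.array([-1.0, 2.0, -3.0, 4.0], dtype=np.float32)

# Unfused: each branch result goes to memory, then the add reads both back.
write_backs = 0
stored_a = write_back(relu(branch_a_in))
stored_b = write_back(relu(branch_b_in))
summed = stored_a + stored_b
print("write-backs before the add (unfused):", write_backs)   # 2

# Fused: the first level's outputs are handed to the summation level directly.
write_backs = 0
summed = relu(branch_a_in) + relu(branch_b_in)
print("write-backs before the add (fused):  ", write_backs)   # 0
```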
In the embodiment of the application, when the first neural network forward processing is executed, the data cache module of the neural network computing device is controlled to simultaneously input target data to the at least two data processing modules of the neural network computing device, so that the number of memory accesses is reduced, the memory access amount is reduced in turn, and the performance of the hardware accelerator is improved. The embodiment of the application thus solves the prior-art problem that the memory access amount is large when a neural network model is run.
Having described the data processing method provided by the embodiments of the present application, the data processing apparatus provided by the embodiments of the present application will be described below with reference to the accompanying drawings.
It should be noted that, in the data processing method provided in the embodiment of the present application, the execution subject may be a data processing apparatus, or a control module in the data processing apparatus for executing the data processing method. In the embodiment of the present application, a data processing apparatus executing the data processing method is taken as an example to describe the data processing method provided in the embodiment of the present application.
Referring to fig. 5, an embodiment of the present application further provides a data processing apparatus 500, including:
a first control module 501, configured to control a data caching module of the neural network computing device to simultaneously input target data to at least two data processing modules of the neural network computing device when performing a first neural network forward process;
the data processing module comprises a first processing module, a second processing module and a third processing module.
The data cache module is used for caching and managing input and output data. The data cache module may be a Static Random-Access Memory (SRAM) structure organized as a plurality of banks, and may be configured with at least two read/write ports so that it can transmit data to multiple data processing modules at the same time.
For example, when the electronic device executes the first neural network forward processing, the first processing module is configured to execute the calculation of the convolution operator and transmit the result to the second processing module or write the result back to the data cache module. The second processing module is, for example, an activation function module, which performs nonlinear processing on the neurons of the convolutional neural network and transmits the processed result to the third processing module. The third processing module, such as a pooling module, a scaling module or a summing module, performs pooling, scaling, element-wise addition and similar operations on the data and then writes the result back to the data cache module.
When the first neural network forward processing is executed, the data cache module of the neural network computing device is controlled to simultaneously input the target data, i.e. the data to be fed to the data processing modules, to at least two data processing modules of the neural network computing device. By inputting the target data to the data processing modules simultaneously, the data cache module needs to access the memory only once during the whole data processing process, so the number of memory accesses is reduced and the memory access amount is reduced in turn.
As a first example, with reference to fig. 2, the first neural network forward processing is taken to be a convolutional neural network computing operation, the first processing module is a convolution computing module, the second processing module is an activation function module, and the third processing module includes a pooling module, a scaling module, a summing module and the like. When the convolutional neural network computing operation is executed, the data cache module simultaneously inputs the target data to at least two of the convolution computing module, the activation function module, the pooling module, the scaling module, the summing module and the like; after the data is read from the memory once (i.e. through the memory access module), it is distributed to multiple modules, so the number of times the data cache module accesses the memory is reduced. This avoids one memory access for every input of data to a module; for example, if the data cache module inputs data to the activation function module only after the convolution processing module completes its convolution, the memory has to be accessed once more at that point.
In an alternative embodiment, if the first neural network forward processing comprises a convolution operation, the convolution operation comprises at least two convolution layers; each convolution layer in the convolutional neural network consists of a plurality of convolution units, and the parameters of each convolution unit are optimized through a back-propagation algorithm. The purpose of the convolution operation is to extract different features of the input. As a second example, as shown in fig. 3, the convolution operation includes a first convolution layer and a second convolution layer. The convolution layers may have a data dependency; for example, the output data of the first convolution layer may be used as the input data of the second convolution layer. For instance, the first convolution layer may only extract low-level features such as edges, lines and corners, while the second convolution layer iteratively extracts more complex features from those low-level features, improving the network's ability to extract features.
In an optional embodiment, the apparatus 500 further comprises:
and the second control module is used for simultaneously inputting the target data to each convolution layer and controlling the first processing module to merge and process the operation of each convolution layer.
As a third example, as shown in fig. 4, there may be no data dependency between the convolution layers. In that case the target data is used as the input data and is simultaneously input into each convolution layer (the first convolution layer and the second convolution layer), so the data cache module can finish feeding the convolution layers with only one memory access, avoiding the excessive memory access amount that would result from accessing the memory once for every convolution layer fed.
In an alternative embodiment, the convolutional layers comprise a first convolutional layer and a second convolutional layer;
the apparatus 500 further comprises:
and the third control module is used for inputting the target data into the first convolution layer to obtain first processing data, inputting the first processing data into the second convolution layer, and controlling the first processing module to carry out merging processing on the operation of each convolution layer.
For convolution layers with a data dependency, with reference to fig. 3, the first convolution layer is the preceding convolution layer, the second convolution layer is the following convolution layer, and the output data of the preceding convolution layer serves as the input data of the following convolution layer. After part or all of the preceding convolution layer's calculation is completed, its output data (i.e. the result of the convolution calculation) need not be written back to the memory; instead, the data cache module directly uses that output data as the input data of the following convolution layer's convolution calculation and starts the following layer's calculation immediately, thereby reducing memory accesses.
In an optional embodiment, the apparatus 500 further comprises:
the fourth control module is used for controlling the third processing module to input the output data of the first level into the second level when the third processing module executes the second neural network forward processing if the input data of the third processing module comprises at least two levels of output data;
wherein a level is a level in the topology of the neural network corresponding to the second neural network forward processing.
In this embodiment of the present application, when the first control module 501 executes the first neural network forward processing, the data cache module of the neural network computing device is controlled to simultaneously input target data to at least two data processing modules of the neural network computing device, so as to reduce the number of times of accessing the memory, thereby reducing the memory access amount and improving the performance of the hardware accelerator.
The data processing apparatus 500 in the embodiment of the present application may be an apparatus, or may be a component, an integrated circuit, or a chip in a terminal. The apparatus can be a mobile electronic device or a non-mobile electronic device. By way of example, the mobile electronic device may be a mobile phone, a tablet computer, a notebook computer, a palmtop computer, an in-vehicle electronic device, a wearable device, an ultra-mobile personal computer (UMPC), a netbook or a personal digital assistant (PDA), and the non-mobile electronic device may be a server, a network attached storage (NAS), a personal computer (PC), a television (TV), a teller machine, a self-service machine, and the like; the embodiments of the present application are not specifically limited in this respect.
The data processing apparatus in the embodiment of the present application may be an apparatus having an operating system. The operating system may be an Android operating system (Android), an iOS operating system, or other possible operating systems, which is not specifically limited in the embodiments of the present application.
The data processing apparatus provided in the embodiment of the present application can implement each process implemented by the data processing apparatus in the method embodiments of fig. 1 to fig. 4, and is not described here again to avoid repetition.
Optionally, an electronic device is further provided in this embodiment of the present application, and includes a processor 610, a memory 609, and a program or an instruction stored in the memory 609 and capable of being executed on the processor 610, where the program or the instruction is executed by the processor 610 to implement each process of the data processing method embodiment, and can achieve the same technical effect, and details are not described here to avoid repetition.
It should be noted that the electronic device in the embodiment of the present application includes the mobile electronic device and the non-mobile electronic device described above; the electronic device may be a device that includes modules such as a hardware accelerator, CPU, GPU, DSP, etc.
Fig. 6 is a schematic hardware structure diagram of an electronic device 600 implementing various embodiments of the present application.
the electronic device 600 includes, but is not limited to: a radio frequency unit 601, a network module 602, an audio output unit 603, an input unit 604, a sensor 605, a display unit 606, a user input unit 607, an interface unit 608, a memory 609, a processor 610, and a power supply 611.
Those skilled in the art will appreciate that the electronic device 600 may further comprise a power source (e.g., a battery) for supplying power to the various components, and the power source may be logically connected to the processor 610 through a power management system, so as to implement functions such as managing charging, discharging, and power consumption through the power management system. The electronic device structure shown in fig. 6 does not constitute a limitation of the electronic device, and the electronic device may include more or fewer components than those shown, combine some components, or arrange the components differently; details are omitted here.
The processor 610 is configured to control the data caching module of the neural network computing device to simultaneously input target data to at least two data processing modules of the neural network computing device when a first neural network forward process is executed;
the data processing module comprises a first processing module, a second processing module and a third processing module.
Optionally, if the first neural network forward processing includes a convolution operation, the convolution operation includes at least two convolution layers.
Optionally, the processor 610 is configured to input the target data to each of the convolutional layers at the same time, and control the first processing module to merge and process the operation of each of the convolutional layers.
Optionally, the convolutional layers comprise a first convolutional layer and a second convolutional layer;
the processor 610 is configured to input the target data to the first convolution layer to obtain first processing data, input the first processing data to the second convolution layer, and control the first processing module to merge and process an operation of each convolution layer.
Optionally, the processor 610 is configured to, if the input data of the third processing module includes at least two levels of output data, control the third processing module to perform the second neural network forward processing, and input the first level of output data to the second level;
wherein a level is a level in the topology of the neural network corresponding to the second neural network forward processing.
In this embodiment, when the processor 610 executes the first neural network forward processing, the data cache module of the neural network computing device is controlled to simultaneously input target data to the at least two data processing modules of the neural network computing device, so as to reduce the number of memory accesses, reduce the memory access amount in turn, and improve the performance of the hardware accelerator.
It is to be understood that, in the embodiment of the present application, the input unit 604 may include a Graphics Processing Unit (GPU) 6041 and a microphone 6042, and the graphics processing unit 6041 processes image data of a still picture or a video obtained by an image capturing apparatus (such as a camera) in a video capturing mode or an image capturing mode. The display unit 606 may include a display panel 6061, and the display panel 6061 may be configured in the form of a liquid crystal display, an organic light emitting diode, or the like. The user input unit 607 includes a touch panel 6071 and other input devices 6072. The touch panel 6071 is also referred to as a touch screen and may include two parts, a touch detection device and a touch controller. The other input devices 6072 may include, but are not limited to, a physical keyboard, function keys (e.g., volume control keys, switch keys, etc.), a trackball, a mouse, and a joystick, which are not described in detail here. The memory 609 may be used to store software programs as well as various data, including but not limited to application programs and an operating system. The processor 610 may integrate an application processor, which mainly handles the operating system, user interfaces, applications, etc., and a modem processor, which mainly handles wireless communications. It will be appreciated that the modem processor may also not be integrated into the processor 610.
The embodiment of the present application further provides a readable storage medium, where a program or an instruction is stored on the readable storage medium, and when the program or the instruction is executed by a processor, the program or the instruction implements each process of the data processing method embodiment, and can achieve the same technical effect, and in order to avoid repetition, details are not repeated here.
The processor is the processor in the electronic device described in the above embodiment. The readable storage medium includes a computer readable storage medium, such as a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and so on.
The embodiment of the present application further provides a chip, where the chip includes a processor and a communication interface, the communication interface is coupled to the processor, and the processor is configured to execute a program or an instruction to implement each process of the data processing method embodiment, and can achieve the same technical effect, and the details are not repeated here to avoid repetition.
It should be understood that the chips mentioned in the embodiments of the present application may also be referred to as system-on-chip, system-on-chip or system-on-chip, etc.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element. Further, it should be noted that the scope of the methods and apparatus of the embodiments of the present application is not limited to performing the functions in the order illustrated or discussed, but may include performing the functions in a substantially simultaneous manner or in a reverse order based on the functions involved; for example, the methods described may be performed in an order different from that described, and various steps may be added, omitted, or combined. In addition, features described with reference to certain examples may be combined in other examples.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present application.
While the present embodiments have been described with reference to the accompanying drawings, it is to be understood that the invention is not limited to the precise embodiments described above, which are meant to be illustrative and not restrictive, and that various changes may be made therein by those skilled in the art without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (13)

1. A data processing method applied to a neural network computing device is characterized by comprising the following steps:
when a first neural network forward process is executed, controlling a data cache module of the neural network computing device to simultaneously input target data to at least two data processing modules of the neural network computing device;
the data processing module comprises a first processing module, a second processing module and a third processing module.
2. The data processing method of claim 1, wherein if the first neural network forward processing comprises a convolution operation, the convolution operation comprises at least two convolution layers.
3. The data processing method of claim 2, wherein the method further comprises:
and simultaneously inputting the target data into each convolution layer, and controlling the first processing module to combine and process the operation of each convolution layer.
4. The data processing method of claim 2, wherein the convolutional layer comprises a first convolutional layer and a second convolutional layer;
the method further comprises the following steps:
and inputting the target data into the first convolution layer to obtain first processing data, inputting the first processing data into the second convolution layer, and controlling the first processing module to merge and process the operation of each convolution layer.
5. The data processing method of claim 1, wherein the method further comprises:
if the input data of the third processing module comprises at least two levels of output data, controlling the third processing module to input the output data of the first level to the second level when executing the second neural network forward processing;
wherein a level is a level in the topology of the neural network corresponding to the second neural network forward processing.
6. A data processing apparatus, characterized in that the apparatus comprises:
the first control module is used for controlling the data caching module of the neural network computing device to simultaneously input target data to at least two data processing modules of the neural network computing device when the first neural network forward processing is executed;
the data processing module comprises a first processing module, a second processing module and a third processing module.
7. The data processing apparatus of claim 6, wherein if the first neural network forward processing comprises a convolution operation, the convolution operation comprises at least two convolution layers.
8. The data processing apparatus of claim 7, wherein the apparatus further comprises:
and the second control module is used for simultaneously inputting the target data to each convolution layer and controlling the first processing module to merge and process the operation of each convolution layer.
9. The data processing apparatus of claim 7, wherein the convolutional layer comprises a first convolutional layer and a second convolutional layer;
the device further comprises:
and the third control module is used for inputting the target data into the first convolution layer to obtain first processing data, inputting the first processing data into the second convolution layer, and controlling the first processing module to carry out merging processing on the operation of each convolution layer.
10. The data processing apparatus of claim 6, wherein the apparatus further comprises:
the fourth control module is used for controlling the third processing module to input the output data of the first level into the second level when the third processing module executes the second neural network forward processing if the input data of the third processing module comprises at least two levels of output data;
wherein a level is a level in the topology of the neural network corresponding to the second neural network forward processing.
11. An electronic device comprising a processor, a memory, and a program or instructions stored on the memory and executable on the processor, which when executed by the processor, implement the steps of the data processing method of any one of claims 1 to 5.
12. A readable storage medium, on which a program or instructions are stored, which, when executed by a processor, carry out the steps of the data processing method according to any one of claims 1 to 5.
13. A chip, characterized in that it comprises a processor and a communication interface, said communication interface being coupled to said processor, said processor being adapted to execute programs or instructions implementing the steps of the data processing method according to any one of claims 1 to 5.
CN202110336963.9A 2021-03-29 2021-03-29 Data processing method, device and chip Pending CN112948126A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202110336963.9A CN112948126A (en) 2021-03-29 2021-03-29 Data processing method, device and chip
PCT/CN2022/082694 WO2022206536A1 (en) 2021-03-29 2022-03-24 Data processing method and apparatus, and chip

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110336963.9A CN112948126A (en) 2021-03-29 2021-03-29 Data processing method, device and chip

Publications (1)

Publication Number Publication Date
CN112948126A (en) 2021-06-11

Family

ID=76227386

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110336963.9A Pending CN112948126A (en) 2021-03-29 2021-03-29 Data processing method, device and chip

Country Status (2)

Country Link
CN (1) CN112948126A (en)
WO (1) WO2022206536A1 (en)

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113537481B (en) * 2016-12-30 2024-04-02 上海寒武纪信息科技有限公司 Apparatus and method for performing LSTM neural network operation
CN110163337B (en) * 2018-11-12 2023-01-20 腾讯科技(深圳)有限公司 Data processing method, device and equipment based on neural network and storage medium
US20210397936A1 (en) * 2018-11-13 2021-12-23 The Board Of Trustees Of The University Of Illinois Integrated memory system for high performance bayesian and classical inference of neural networks
CN112948126A (en) * 2021-03-29 2021-06-11 维沃移动通信有限公司 Data processing method, device and chip

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180046913A1 (en) * 2016-08-12 2018-02-15 DeePhi Technology Co., Ltd. Combining cpu and special accelerator for implementing an artificial neural network
CN111382831A (en) * 2018-12-28 2020-07-07 Tcl集团股份有限公司 Method and device for accelerating forward reasoning of convolutional neural network model
US20210019593A1 (en) * 2019-07-19 2021-01-21 Qualcomm Incorporated Efficient inferencing with piecewise pointwise convolution

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2600791A (en) * 2018-12-08 2022-05-11 Apical Ltd Neural network processing
US11379713B2 (en) 2018-12-08 2022-07-05 Apical Limited Neural network processing
GB2604298A (en) * 2018-12-08 2022-08-31 Advanced Risc Mach Ltd Neural network processing
GB2600791B (en) * 2018-12-08 2022-11-02 Advanced Risc Mach Ltd Neural network processing
GB2604298B (en) * 2018-12-08 2023-03-29 Advanced Risc Mach Ltd Neural network processing
WO2022206536A1 (en) * 2021-03-29 2022-10-06 维沃移动通信有限公司 Data processing method and apparatus, and chip
WO2023124428A1 (en) * 2021-12-30 2023-07-06 上海商汤智能科技有限公司 Chip, accelerator card, electronic device and data processing method

Also Published As

Publication number Publication date
WO2022206536A1 (en) 2022-10-06

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination