CN115860079A - Neural network acceleration device, method, chip, electronic device, and storage medium - Google Patents

Neural network acceleration device, method, chip, electronic device, and storage medium

Info

Publication number
CN115860079A
CN115860079A
Authority
CN
China
Prior art keywords
data
neural network
module
memory
processing module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310044518.4A
Other languages
Chinese (zh)
Other versions
CN115860079B (en)
Inventor
姜宇奇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Jiutian Ruixin Technology Co ltd
Original Assignee
Shenzhen Jiutian Ruixin Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Jiutian Ruixin Technology Co ltd filed Critical Shenzhen Jiutian Ruixin Technology Co ltd
Priority to CN202310044518.4A priority Critical patent/CN115860079B/en
Publication of CN115860079A publication Critical patent/CN115860079A/en
Application granted granted Critical
Publication of CN115860079B publication Critical patent/CN115860079B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Image Analysis (AREA)
  • Complex Calculations (AREA)

Abstract

The application provides a neural network acceleration device, method, chip, electronic device, and storage medium. The device comprises: an in-memory computing module, used for distributing original data to a plurality of in-memory computing units for accelerated computation and sending the first data obtained by computation to the vector processing module or the storage module; a vector processing module, used for performing vector processing on the received first data to obtain second data and sending the second data to the tensor processing module or the storage module; a tensor processing module, used for performing tensor processing on the received second data to obtain third data and sending the third data to the storage module; and a storage module, used for storing the original data, the first data, the second data, and the third data. The device and method can solve the problem of excessive power consumption in existing neural network accelerators and reduce power consumption at the same computing-power requirement, thereby extending the application of neural network accelerators to edge processors with lower power budgets and stricter power-consumption requirements.

Description

Neural network acceleration device, method, chip, electronic device, and storage medium
Technical Field
The present application relates to the field of computer and data processing technologies, and in particular, to a neural network acceleration apparatus, a neural network acceleration method, a neural network acceleration chip, an electronic device, and a storage medium.
Background
Currently, existing neural network accelerators (NPUs) are built from digital circuits and are mainly used to accelerate the processing of convolutional neural networks, achieving higher processing speed and lower power consumption than a CPU or GPU. In addition, to advance the development of artificial intelligence, the prior art also provides a compute-in-memory (CIM) architecture, which performs computation inside the memory; it is currently used mainly for data-intensive computation and has many advantages in big-data applications.
However, in the course of research and practice on the prior art, the inventors of the present application found that the power consumption of an existing neural network accelerator is mainly generated by its digital multiply-accumulate units, and that, for a given computing-power requirement, there is little room left to optimize this power consumption other than through improvements in the manufacturing process, which limits the computing power and applications achievable on edge processors.
The foregoing description is provided for general background information and is not admitted to be prior art.
Disclosure of Invention
In view of the above technical problems, the present application provides a neural network acceleration apparatus, a neural network acceleration method, a chip, an electronic device, and a storage medium.
In order to solve the above technical problem, the present application provides a neural network acceleration apparatus, including: the device comprises a memory computing module, a vector processing module, a tensor processing module and a storage module, wherein the memory computing module, the vector processing module and the tensor processing module are electrically connected in sequence;
the memory computing module comprises a data reading unit and at least one memory computing unit; the in-memory computing unit comprises a neural network; the data reading unit is used for initiating a data reading request and reading original data in the storage module; the memory computing module is used for distributing the original data in the storage module to the at least one memory computing unit; the memory computing unit is used for receiving the distributed original data, performing accelerated computing on the original data distributed to the neural network, obtaining first data through computing, and sending the first data to the vector processing module and/or the storage module;
the vector processing module is configured to receive first data sent by the memory computing module, perform vector processing on the first data to obtain corresponding second data, and send the second data to the tensor processing module and/or the storage module;
the tensor processing module is used for receiving the second data sent by the vector processing module, carrying out tensor processing on the second data to obtain corresponding third data and sending the third data to the storage module for storage;
the storage module is used for storing original data written through a bus, first data sent by the memory calculation module, second data sent by the vector processing module and third data sent by the tensor processing module.
Optionally, the allocating the raw data in the storage module to the at least one in-memory computing unit includes:
acquiring a preset execution sequence of the neural network corresponding to each in-memory computing unit, wherein the preset execution sequence comprises an execution instruction corresponding to each layer of operator of the neural network, arrangement of weights corresponding to each layer of operator in a memory, and data path configuration in an image processing device;
and distributing the original data to the neural networks corresponding to the memory computing units based on the preset execution sequence.
Optionally, the allocating the raw data to the neural networks corresponding to the memory computing units based on the preset execution order includes:
classifying the received original data according to data types;
and respectively distributing the original data after the data types are classified to the neural networks in the plurality of memory computing units based on the preset execution sequence.
Optionally, the performing accelerated computation on the raw data distributed to the neural network, obtaining first data through computation, and sending the first data to the vector processing module and/or the storage module includes:
performing accelerated calculation on the original data in the neural network according to a preset execution sequence through the memory calculation unit;
and sending the first data obtained by accelerated calculation to a vector processing module and/or a storage module based on the preset execution sequence.
Optionally, the determining manner of the preset execution sequence includes:
acquiring a neural network model file;
compiling the neural network model file through a neural network compiler to obtain a preset execution sequence corresponding to the neural network, wherein the preset execution sequence comprises execution instructions corresponding to operators of each layer of the neural network, arrangement of weights of the neural network in a memory, and data path configuration of a neural network accelerating device.
Optionally, the neural network acceleration device further includes a controller, and the controller is electrically connected to the memory calculation module, the vector processing module, the tensor processing module, and the storage module, respectively;
the controller is used for controlling data paths between the memory computing module, the vector processing module, the tensor processing module and the storage module, and determining configuration information of the memory computing module, the vector processing module, the tensor processing module and the storage module.
Optionally, the neural network acceleration device further includes a memory management module, and the memory management module is respectively connected to the memory calculation module, the vector processing module, the tensor processing module, the storage module, and the controller;
and the memory management module is used for respectively managing the physical memory areas and the read-write configuration corresponding to the memory calculation module, the vector processing module, the tensor processing module, the storage module and the controller.
Optionally, the vector processing module comprises an activation unit, a pooling unit, a scaling unit and an element processing unit;
the activation unit is used for performing data activation processing on the received first data;
the pooling unit is used for performing maximum pooling and/or average pooling on the received first data;
the scaling unit is used for performing image up-sampling and down-sampling processing on the received first data;
and the element processing unit is used for performing element-wise addition, subtraction, multiplication, and division operations on the received first data.
Optionally, the tensor processing module includes a data rearranging unit and a data combining unit;
the data rearrangement unit is used for carrying out data rearrangement on the received second data based on a preset rearrangement rule to obtain rearranged second data;
and the data combination unit is used for carrying out data combination on the rearranged second data based on a preset combination rule to obtain corresponding third data.
Correspondingly, the application also provides a neural network acceleration method, which comprises the following steps:
initiating a data reading request, and reading original data in a storage module; distributing the original data in the storage module to at least one in-memory computing unit; receiving the distributed original data through a memory computing unit, carrying out accelerated computing on the original data distributed to the neural network, obtaining first data through computing, and sending the first data to a vector processing module and/or a storage module;
receiving first data sent by an in-memory computing module, carrying out vector processing on the first data to obtain corresponding second data, and sending the second data to a tensor processing module and/or a storage module;
receiving second data sent by the vector processing module, carrying out tensor processing on the second data to obtain corresponding third data, and sending the third data to the storage module for storage;
and storing original data written through a bus, first data sent by the memory computing module, second data sent by the vector processing module and third data sent by the tensor processing module.
Optionally, the allocating the raw data in the storage module to at least one in-memory computing unit includes:
acquiring a preset execution sequence of the neural network corresponding to each in-memory computing unit, wherein the preset execution sequence comprises an execution instruction corresponding to each layer of operator of the neural network, arrangement of weights corresponding to each layer of operator in a memory, and data path configuration in an image processing device;
and distributing the original data to the neural networks corresponding to the memory computing units based on the preset execution sequence.
Optionally, the allocating the raw data to the neural network corresponding to each memory computing unit based on the preset execution sequence includes:
classifying the received original data according to data types;
and respectively distributing the original data after the data types are classified to the neural networks in the plurality of memory computing units based on the preset execution sequence.
Optionally, the performing accelerated computation on the raw data distributed to the neural network, obtaining first data through computation, and sending the first data to the vector processing module and/or the storage module includes:
performing accelerated calculation on the original data in the neural network according to a preset execution sequence through the memory calculation unit;
and sending the first data obtained by accelerated calculation to a vector processing module and/or a storage module based on the preset execution sequence.
Optionally, the determining manner of the preset execution sequence includes:
acquiring a neural network model file;
compiling the neural network model file through a neural network compiler to obtain a preset execution sequence corresponding to the neural network, wherein the preset execution sequence comprises execution instructions corresponding to operators of each layer of the neural network, arrangement of weights of the neural network in a memory, and data path configuration of a neural network accelerating device.
Optionally, the neural network acceleration method further includes:
and controlling data paths between the memory computing module, the vector processing module, the tensor processing module and the storage module, and determining configuration information of the memory computing module, the vector processing module, the tensor processing module and the storage module.
Optionally, the neural network acceleration method further includes:
and respectively managing the physical memory areas and the read-write configuration corresponding to the memory calculation module, the vector processing module, the tensor processing module, the storage module and the controller.
Optionally, the vector processing on the first data includes:
performing data activation processing on the received first data;
performing maximum pooling and/or average pooling on the received first data;
performing image up-sampling and down-sampling processing on the received first data;
and carrying out element-wise addition, subtraction, multiplication, and division operations on the received first data.
Optionally, the tensor processing the second data includes:
performing data rearrangement on the received second data based on a preset rearrangement rule to obtain rearranged second data;
and based on a preset combination rule, performing data combination on the rearranged second data to obtain corresponding third data.
An embodiment of the present application further provides a chip including the neural network acceleration device as described above.
An embodiment of the present application further provides an electronic device, including the neural network acceleration apparatus as described above.
An embodiment of the present application also provides a computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, implements the steps of the neural network acceleration method as described above.
The embodiment of the invention has the following beneficial effects:
As described above, according to the neural network acceleration apparatus, the neural network acceleration method, the chip, the electronic device, and the storage medium provided in the present application, the apparatus includes an in-memory computing module, a vector processing module, and a tensor processing module, which are electrically connected in sequence, and a storage module electrically connected to the in-memory computing module, the vector processing module, and the tensor processing module; the in-memory computing module includes a data reading unit and at least one in-memory computing unit; the in-memory computing unit includes a neural network; the data reading unit is used for initiating a data read request and reading original data in the storage module; the in-memory computing module is used for distributing the original data in the storage module to the at least one in-memory computing unit; the in-memory computing unit is used for receiving the distributed original data, performing accelerated computation on the original data distributed to the neural network, obtaining first data through computation, and sending the first data to the vector processing module and/or the storage module; the vector processing module is used for receiving the first data sent by the in-memory computing module, performing vector processing on the first data to obtain corresponding second data, and sending the second data to the tensor processing module and/or the storage module; the tensor processing module is used for receiving the second data sent by the vector processing module, performing tensor processing on the second data to obtain corresponding third data, and sending the third data to the storage module for storage; and the storage module is used for storing the original data written through the bus, the first data sent by the in-memory computing module, the second data sent by the vector processing module, and the third data sent by the tensor processing module. By performing accelerated computation on the original data distributed to the neural network through the plurality of in-memory computing units of the in-memory computing module, the present application greatly reduces the power consumption of the neural network acceleration device while still meeting the computing-power requirement, improves its energy efficiency, solves the problem of excessive power consumption in existing neural network accelerators, reduces power consumption at the same computing-power requirement, and extends the application of neural network accelerators to edge processors with lower power budgets and stricter power-consumption requirements.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and, together with the description, serve to explain the principles of the application. In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below; other drawings can be obtained by those skilled in the art based on these drawings without inventive effort.
Fig. 1 is a schematic structural diagram of a neural network acceleration device according to an embodiment of the present disclosure;
fig. 2 is a schematic flowchart of a neural network acceleration method provided in an embodiment of the present application;
fig. 3 is a schematic flowchart of step S1 provided in an embodiment of the present application;
fig. 4 is a schematic flowchart of step S12 provided in the embodiment of the present application;
fig. 5 is a schematic flowchart of step S2 provided in the embodiment of the present application;
fig. 6 is a schematic flowchart of step S3 provided in the embodiment of the present application.
The implementation, functional features and advantages of the objectives of the present application will be further explained with reference to the accompanying drawings. With the above figures, there are shown specific embodiments of the present application, which will be described in more detail below. The drawings and written description are not intended to limit the scope of the inventive concepts in any manner, but rather to illustrate the concepts of the application by those skilled in the art with reference to specific embodiments.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the following exemplary examples do not represent all implementations consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present application, as detailed in the appended claims.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus in which the element is included. Further, similarly named components, features, or elements in different embodiments of the application may have the same meaning or may have different meanings, the specific meaning of which should be determined by its interpretation in the specific embodiment or by further combination with the context of the specific embodiment.
It will be further understood that the terms "comprises," "comprising," "includes," and/or "including," when used in this specification, specify the presence of stated features, steps, operations, elements, components, items, species, and/or groups, but do not preclude the presence or addition of one or more other features, steps, operations, elements, components, items, species, and/or groups thereof. As used herein, the terms "or," "and/or," and "including at least one of the following" are to be construed as inclusive, meaning any one or any combination. For example, "includes at least one of A, B, and C" means "any of the following: A; B; C; A and B; A and C; B and C; A and B and C"; likewise, "A, B or C" or "A, B and/or C" means "any one of the following: A; B; C; A and B; A and C; B and C; A and B and C". An exception to this definition will occur only when a combination of elements, functions, steps, or operations is inherently mutually exclusive in some way.
The words "if", as used herein, may be interpreted as "at \8230; \8230when" or "when 8230; \823030, when" or "in response to a determination" or "in response to a detection", depending on the context. Similarly, the phrases "if determined" or "if detected (a stated condition or event)" may be interpreted as "when determined" or "in response to a determination" or "when detected (a stated condition or event)" or "in response to a detection (a stated condition or event)", depending on the context.
It should be noted that step numbers such as S1 and S2 are used herein for the purpose of more clearly and briefly describing the corresponding contents, and do not constitute a substantial limitation on the sequence, and those skilled in the art may perform S2 first and then S1 in the specific implementation, but these should be within the scope of the present application.
It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
In the following description, suffixes such as "module", "component", or "unit" used to denote elements are used only for convenience of description of the present application and have no specific meaning in themselves. Thus, "module", "component", and "unit" may be used interchangeably.
First, an application scenario of the present application is introduced. For example, a neural network acceleration device is provided for fields such as autonomous driving, AR, and VR; the neural network acceleration device can be applied to edge processors in these fields that have low power budgets and strict power-consumption requirements, solving the problem that existing neural network accelerators consume too much power.
Referring to fig. 1, fig. 1 is a schematic structural diagram of a neural network acceleration device according to an embodiment of the present disclosure. The neural network accelerating device specifically comprises a memory calculating module 10, a vector processing module 20 and a tensor processing module 30 which are electrically connected in sequence, and a storage module 40 which is electrically connected with the memory calculating module 10, the vector processing module 20 and the tensor processing module 30; the output end of the memory computing module 10 is connected to the input end of the vector processing module 20, the output end of the vector processing module 20 is connected to the input end of the tensor processing module 30, the output end of the tensor processing module 30 is connected to the input end of the storage module 40, and the input end of the storage module 40 is further connected to the output end of the memory computing module 10 and the output end of the vector processing module 20.
The memory computing module 10 comprises a data reading unit and at least one memory computing unit; the in-memory computing unit comprises a neural network; the data reading unit is used for initiating a data reading request and reading original data in the storage module 40; the memory computing module 10 is used for distributing the original data in the storage module 40 to at least one memory computing unit; and the memory computing unit is used for receiving the distributed original data, performing accelerated computation on the original data distributed to the neural network, obtaining first data through computation, and sending the first data to the vector processing module 20 and/or the storage module 40.
Specifically, for the memory computing module 10, a data read request is initiated to the storage module 40 through the memory computing module 10; after responding to the read request, the storage module 40 outputs the corresponding original data to the memory computing module 10; upon receiving the original data sent by the storage module 40, the memory computing module 10 distributes it to different memory computing units for accelerated computation; and after the computation, the memory computing units send the first data obtained by computation to the vector processing module 20 and/or the storage module 40 for storage.
The memory computing module 10 includes a data reading unit 101 and at least one memory computing unit, where the data reading unit 101 is mainly configured to initiate a read data request, so as to read original data in the storage module 40 according to the read data request, and distribute the original data to neural networks corresponding to multiple memory computing units; the memory computing unit 102 is mainly configured to receive raw data, perform accelerated computation on the raw data distributed to the neural network, and send the first data obtained through computation to the vector processing module 20 and/or the storage module 40.
In a specific embodiment, the plurality of in-memory computing units 102 of the in-memory computing module 10 are specifically configured to perform accelerated processing on the raw data allocated to the neural network. After the in-memory computing module 10 allocates the received raw data, the plurality of in-memory computing units 102 perform accelerated data processing at the same time, with the neural network in each in-memory computing unit accelerating the processing of the image data, so as to improve the acceleration efficiency of the neural network. The image data processed by each in-memory computing unit may be of the same type or of different types, and the image data processed by each in-memory computing unit may be independent or have a data-flow relationship. It should be noted that the in-memory computing module 10 in this embodiment may specifically be an in-memory-computing matrix multiplication accelerator or a vector-matrix multiplication accelerator; both accelerators perform accelerated computation in an array manner, so the computing efficiency is higher and the power consumption is lower.
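By way of illustration only, the following Python sketch models how the raw data might be tiled across several in-memory computing units, each of which accelerates an array-style matrix multiplication on the weights it holds. The class and function names are hypothetical and do not come from the patent; real CIM hardware performs the multiply-accumulate inside the memory array rather than in software.

```python
import numpy as np

class InMemoryComputeUnit:
    """Models one CIM unit holding the weights of one layer (or layer slice)
    and performing the matrix multiplication where the weights are stored."""
    def __init__(self, weights):
        self.weights = weights  # weights are assumed to be pre-loaded into the CIM array

    def accelerate(self, raw_tile):
        # The CIM array computes the multiply-accumulate in place,
        # so only the activations have to move.
        return raw_tile @ self.weights


def distribute_and_compute(raw_data, units):
    """Split the raw data into as many tiles as there are CIM units and let
    the units work on their tiles (sequentially here, in parallel in hardware)."""
    tiles = np.array_split(raw_data, len(units), axis=0)
    first_data = [unit.accelerate(tile) for unit, tile in zip(units, tiles)]
    return np.concatenate(first_data, axis=0)  # "first data" sent onwards


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    units = [InMemoryComputeUnit(rng.standard_normal((64, 32))) for _ in range(4)]
    raw = rng.standard_normal((16, 64))          # raw data read from the storage module
    first = distribute_and_compute(raw, units)   # forwarded to the vector processing module
    print(first.shape)                           # (16, 32)
```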
The vector processing module 20 is configured to perform vector processing on the first data sent by the memory computing module 10 to obtain corresponding second data, and send the second data to the tensor processing module 30 and/or the storage module 40;
specifically, for the vector processing module 20, after receiving the first data sent by the memory computing module 10, a series of vector processing operations, such as activation, pooling, scaling, and element-wise processing, is performed on the vectors through the convolutional neural network of the vector processing module 20, and finally the obtained second data is sent to the tensor processing module 30 and/or the storage module 40. The second data may be vector data. It should be noted that, when the second data is sent to the memory in the tensor processing module 30 and/or the storage module 40, the destination and the storage order follow the preset execution order, and the second data is stored according to the data-path configuration, which improves the ordering and reliability of the data.
The tensor processing module 30 is configured to receive the second data sent by the vector processing module 20, perform tensor processing on the second data to obtain corresponding third data, and send the third data to the storage module 40 for storage;
specifically, for the tensor processing module 30, after receiving the second data sent by the vector processing module 20, the tensor processing module performs tensor processing on the second data, such as data rearrangement and data combination, so as to complete the operations of a window in the neural network, including but not limited to direct connection, combination, rearrangement, and the like; finally, the corresponding third data is obtained and sent to the storage module 40 for storage. The third data may be tensor data. It should be noted that, when the third data is sent to the memory in the storage module 40, the destination and the storage order follow the preset execution order, and the third data is stored according to the data-path configuration, which improves the ordering and reliability of the data.
The storage module 40 is configured to store the original data written through the bus, the first data sent by the memory calculation module 10, the second data sent by the vector processing module 20, and the third data sent by the tensor processing module 30.
Specifically, the storage module 40 is mainly used for storing the relevant data of the neural network acceleration device. First, the original data to be processed (for example, image data) is written into the storage module 40 through a bus, where the bus may be an AXI/AHB bus or a custom bus; the storage module 40 is further configured to store the first data sent by the memory computing module 10, the second data sent by the vector processing module 20, and the third data sent by the tensor processing module 30. After the tensor processing module 30 sends the third data to the storage module 40, it indicates that the data processing of a part of the neural network is completed; after all layers of the neural network have been processed, the storage module 40 stores all the data and sends it to other devices or modules through the bus for subsequent data processing. The storage module 40 may be an on-chip storage module, which, compared with an off-chip storage module, has the advantages of faster read speed and lower power consumption, further improving the data-processing efficiency of the neural network acceleration device.
In conventional computer architectures, data is moved from memory to the processing unit, and intermediate results are then stored back in memory. Such unnecessary data movement not only increases computation latency but also increases the related power consumption. In this embodiment, the neural network acceleration device, based on SRAM digital-analog in-memory computing technology, performs accelerated computation on the original data distributed to the neural network through the plurality of in-memory computing units of the in-memory computing module, so that the power consumption of the neural network acceleration device is greatly reduced while the computing-power requirement is still met, the energy efficiency of the device is further improved, the problem of excessive power consumption in existing neural network accelerators is solved, and the application of neural network accelerators is extended to edge processors with lower power budgets and stricter power-consumption requirements.
Optionally, in some embodiments, the allocating the raw data in the storage module 40 to at least one in-memory computing unit includes:
acquiring a preset execution sequence of the neural network corresponding to each in-memory computing unit, wherein the preset execution sequence comprises an execution instruction corresponding to each layer of operator of the neural network, arrangement of weights corresponding to each layer of operator in the in-memory and data path configuration in the image processing device;
and distributing the original data to the neural networks corresponding to the memory computing units based on a preset execution sequence.
Specifically, the memory computing module 10 is further configured to obtain the preset execution sequence of the neural network corresponding to each memory computing unit it contains, where the preset execution sequence includes the execution instruction corresponding to each layer of operators of each neural network, the arrangement of the overall weights of the neural network in the memory, and the data path configuration of each component in the image processing apparatus; the memory computing module 10 then distributes the acquired raw data to the neural network corresponding to each memory computing unit according to the preset execution sequence.
Optionally, in some embodiments, the performing accelerated computation on the raw data distributed to the neural network, obtaining first data by computation, and sending the first data to the vector processing module and/or the storage module includes:
performing accelerated calculation on the original data in the neural network according to a preset execution sequence through an in-memory calculation unit;
and sending the first data obtained by the accelerated computation to a vector processing module and/or a storage module based on a preset execution sequence.
Specifically, the in-memory computing unit distributes the image data to operators of each layer in the corresponding neural network according to a preset execution sequence, so that the image data needing accelerated computation is quickly distributed to each layer in the plurality of neural networks for accelerated computation, and the computation speed of the neural networks is improved; and after finishing the accelerated calculation, sending the first data obtained by the accelerated calculation to a vector processing module and/or a storage module according to a preset execution sequence.
Optionally, in some embodiments, the allocating, based on the preset execution order, the raw data to a neural network corresponding to each of the in-memory computing units includes:
classifying the data types of the received original data;
and respectively distributing the original data after the data types are classified to the neural networks in the plurality of memory computing units based on a preset execution sequence.
Specifically, the memory computing module 10 may be further configured to perform data type classification on the original data read from the storage module 40, allocate the original data after the data type classification to each layer of operator of the neural network in different memory computing units according to a preset execution sequence, and perform accelerated computation on the original data allocated to each layer of operator in the neural network through the memory computing unit, so as to improve the efficiency of data processing.
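As a rough illustration of this classification-and-distribution step, the sketch below groups incoming records by data type and assigns each group to the in-memory computing unit designated by a preset execution sequence. The field names and the structure of the execution sequence are assumptions made for the example, not the actual data format of the device.

```python
from collections import defaultdict

def classify_by_type(raw_items):
    """Group the raw data records by their declared data type."""
    grouped = defaultdict(list)
    for item in raw_items:
        grouped[item["dtype"]].append(item)
    return grouped

def distribute(grouped, execution_order):
    """Assign each data-type group to the in-memory computing unit that the
    preset execution order designates for that type."""
    assignment = {}
    for step in execution_order:                 # e.g. {"dtype": "image", "unit": 0}
        assignment[step["unit"]] = grouped.get(step["dtype"], [])
    return assignment

if __name__ == "__main__":
    raw = [{"dtype": "image", "payload": b"img0"}, {"dtype": "feature", "payload": b"f0"}]
    order = [{"dtype": "image", "unit": 0}, {"dtype": "feature", "unit": 1}]
    print(distribute(classify_by_type(raw), order))
```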
Optionally, in some embodiments, the performing accelerated computation on the raw data distributed to the neural network, obtaining first data by computation, and sending the first data to the vector processing module and/or the storage module includes:
performing accelerated calculation on the original data in the neural network according to a preset execution sequence through an in-memory calculation unit;
and sending the first data obtained by the accelerated calculation to a vector processing module and/or a storage module based on a preset execution sequence.
Specifically, the original data in the neural network is subjected to accelerated computation through the in-memory computing unit according to the preset execution sequence: the original data after data-type classification is distributed to each layer of operators of the neural network in different in-memory computing units, and the in-memory computing unit then performs accelerated computation on the original data distributed to each layer of operators in the neural network; after the accelerated computation, the first data obtained is sent to the vector processing module 20 and/or the storage module 40 based on the preset execution sequence. It should be noted that, when the first data is sent to the memories in the vector processing module 20 and/or the storage module 40, the destination and the storage order follow the preset execution order, and the first data is stored according to the arrangement of the overall weights of the neural network in memory, which improves the ordering and reliability of the data.
Optionally, in some embodiments, the determining of the preset execution order includes:
acquiring a neural network model file;
compiling the neural network model file through a neural network compiler to obtain a preset execution sequence corresponding to the neural network, wherein the preset execution sequence comprises execution instructions corresponding to operators of each layer of the neural network, arrangement of weights of the neural network in a memory, and data path configuration of the neural network accelerating device.
Specifically, the preset execution sequence is determined by a neural network compiler. A neural network model of the in-memory computing unit, such as a CNN convolutional neural network model, is obtained; the neural network model may be a pb, onnx, or tflite file and includes the complete weights of the neural network and the operators of each layer. The neural network model is then compiled by the neural network compiler to obtain the preset execution sequence of the neural network on the accelerator, where the preset execution sequence includes the execution instruction corresponding to each layer of operators, the arrangement of the weights of the neural network in memory, and the configuration of the data paths between the modules; whether the first data is sent to the vector processing module 20 or the storage module 40 is thus determined according to the preset execution sequence. The neural network compiler can be set to memory-priority or computation-priority mode; the two compilation modes generate different execution instructions, and the corresponding data paths also differ. It should be noted that the memory here refers to the part of the storage used for temporarily storing data and for exchanging data with external memory.
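A minimal sketch of what such a compiler output could look like is given below; it is not the actual compiler interface, and all names, fields, and the memory-priority/computation-priority behaviour shown are illustrative assumptions consistent with the description above.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class LayerInstruction:
    op: str                      # operator of this layer, e.g. "conv2d", "relu", "maxpool"
    weight_offset: int           # where this layer's weights are laid out in memory
    weight_size: int
    output_to: str               # data path: "vector_module" or "storage_module"

@dataclass
class ExecutionOrder:
    instructions: List[LayerInstruction] = field(default_factory=list)
    datapath_config: Dict[str, str] = field(default_factory=dict)

def compile_model(layers, mode="compute_priority"):
    """Toy stand-in for the neural network compiler: lays weights out
    back-to-back and chooses data paths according to the compile mode."""
    order = ExecutionOrder()
    offset = 0
    for layer in layers:
        # Memory-priority keeps intermediates in the storage module;
        # compute-priority streams them straight to the next module.
        target = "storage_module" if mode == "memory_priority" else "vector_module"
        order.instructions.append(
            LayerInstruction(layer["op"], offset, layer["weight_bytes"], target))
        offset += layer["weight_bytes"]
    order.datapath_config = {"cim": "vector", "vector": "tensor", "tensor": "storage"}
    return order

if __name__ == "__main__":
    model = [{"op": "conv2d", "weight_bytes": 4096}, {"op": "relu", "weight_bytes": 0}]
    print(compile_model(model, mode="memory_priority"))
```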
Optionally, in some embodiments, the neural network accelerating device further includes a controller 50, and the controller 50 is electrically connected to the memory calculating module 10, the vector processing module 20, the tensor processing module 30 and the storage module 40 respectively;
and the controller 50 is configured to control data paths between the memory calculation module 10, the vector processing module 20, the tensor processing module 30, and the storage module 40, and determine configuration information of the memory calculation module 10, the vector processing module 20, the tensor processing module 30, and the storage module 40.
Specifically, the controller 50 is connected to each module of the neural network acceleration device, and is responsible for controlling data paths between each module of the neural network acceleration device and configuration information of each module, so as to facilitate subsequent management and configuration of the neural network acceleration device.
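As an illustration of the controller's role, the sketch below shows a hypothetical controller writing data-path and configuration settings into the other modules; the register names and configuration fields are assumptions made for the example, not the device's actual interface.

```python
class Controller:
    """Writes the data-path and configuration settings of the other modules,
    so each intermediate result is routed as the execution order requires."""
    def __init__(self, modules):
        self.modules = modules          # e.g. {"cim": {...}, "vector": {...}, ...}

    def configure(self, datapaths, configs):
        for src, dst in datapaths.items():
            self.modules[src]["output_to"] = dst     # data path between modules
        for name, cfg in configs.items():
            self.modules[name].update(cfg)           # per-module configuration info

if __name__ == "__main__":
    modules = {"cim": {}, "vector": {}, "tensor": {}, "storage": {}}
    ctrl = Controller(modules)
    ctrl.configure({"cim": "vector", "vector": "tensor", "tensor": "storage"},
                   {"cim": {"units": 4}, "storage": {"bus": "AXI"}})
    print(modules)
```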
Optionally, in some embodiments, the neural network acceleration apparatus further includes a memory management module 60, and the memory management module 60 is connected to the memory calculation module 10, the vector processing module 20, the tensor processing module 30, the storage module 40, and the controller 50, respectively;
the memory management module 60 is configured to manage the physical memory areas and the read-write configurations corresponding to the memory calculation module 10, the vector processing module 20, the tensor processing module 30, the storage module 40, and the controller 50, respectively.
Specifically, the neural network acceleration device in this embodiment further includes a memory management module 60, which is connected to the other modules of the neural network acceleration device and is mainly responsible for partitioning, reading, and writing the physical memory area of each module. The memory management module 60 may further include an arbiter and a direct memory access (DMA) unit; the arbiter is used to determine the priority of data transmission of each module and grants peripheral/memory access according to the priority of the channel requests; the DMA unit is used to generate memory addresses and start memory read or write cycles.
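The following sketch illustrates, under assumed interfaces, how an arbiter and a DMA unit of the kind described above might behave: the arbiter grants the highest-priority pending channel request, and the DMA unit generates addresses and performs read/write cycles on a flat memory. The class names and priority convention are illustrative assumptions.

```python
import heapq

class Arbiter:
    """Grants peripheral/memory access to the pending channel request with the
    highest priority (lower number means higher priority in this sketch)."""
    def __init__(self):
        self._requests = []

    def request(self, priority, channel):
        heapq.heappush(self._requests, (priority, channel))

    def grant(self):
        return heapq.heappop(self._requests)[1] if self._requests else None

class DirectMemoryAccess:
    """Generates memory addresses and starts read/write cycles on a flat byte array."""
    def __init__(self, memory):
        self.memory = memory

    def read(self, base, length):
        return bytes(self.memory[base:base + length])

    def write(self, base, data):
        self.memory[base:base + len(data)] = data

if __name__ == "__main__":
    arbiter = Arbiter()
    arbiter.request(priority=2, channel="vector_module")
    arbiter.request(priority=0, channel="cim_module")       # served first
    dma = DirectMemoryAccess(bytearray(1024))
    dma.write(0x40, b"\x01\x02\x03\x04")
    print(arbiter.grant(), dma.read(0x40, 4))
```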
Optionally, in some embodiments, the vector processing module 20 comprises an activation unit 201, a pooling unit 202, a scaling unit 203, and an element processing unit 204;
an activation unit 201, configured to perform data activation processing on received first data;
a pooling unit 202 for performing maximum pooling and/or average pooling on the received first data;
a scaling unit 203 for performing image up-sampling and down-sampling processing on the received first data;
an element processing unit 204, configured to perform element-wise addition, subtraction, multiplication, and division operations on the received first data.
Specifically, the vector processing module 20 in this embodiment includes an activation unit 201, a pooling unit 202, a scaling unit 203, and an element processing unit 204. The activation unit 201 may be an activation layer of a convolutional neural network, including ReLU, LeakyReLU, and the like, and is mainly used for performing activation processing on the first data; the pooling unit 202 is configured to perform maximum pooling and average pooling on the first data; the scaling unit 203 is used for performing image up-sampling and down-sampling processing on the first data; and the element processing unit 204 is configured to perform element-wise addition, subtraction, multiplication, and division operations on the first data. The processing order of the activation unit 201, the pooling unit 202, the scaling unit 203, and the element processing unit 204 may be combined arbitrarily and set according to the requirements of the actual scenario.
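A simplified software model of the four vector-processing units is sketched below; the functions are illustrative stand-ins (the real units are hardware), and the particular chaining order shown is only one of the arbitrary combinations mentioned above.

```python
import numpy as np

def relu(x):                       # activation unit (ReLU; LeakyReLU would be similar)
    return np.maximum(x, 0.0)

def max_pool2x2(x):                # pooling unit: 2x2 maximum pooling
    h, w = x.shape[0] // 2 * 2, x.shape[1] // 2 * 2
    x = x[:h, :w].reshape(h // 2, 2, w // 2, 2)
    return x.max(axis=(1, 3))

def downsample2x(x):               # scaling unit: simple 2x down-sampling
    return x[::2, ::2]

def elementwise_add(x, y):         # element processing unit: element-wise addition
    return x + y

if __name__ == "__main__":
    first_data = np.random.default_rng(0).standard_normal((8, 8))
    # The units can be chained in any order required by the scenario; one possible order:
    second_data = elementwise_add(downsample2x(max_pool2x2(relu(first_data))),
                                  np.ones((2, 2)))
    print(second_data.shape)       # (2, 2): "second data" sent to the tensor module
```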
Alternatively, as shown in fig. 1, in some embodiments, the tensor processing module 30 includes a data rearranging unit 301 and a data combining unit 302;
a data rearranging unit 301, configured to rearrange the received second data based on a preset rearranging rule, to obtain rearranged second data;
a data combining unit 302, configured to perform data combination on the rearranged second data based on a preset combining rule, so as to obtain corresponding third data.
Specifically, the tensor processing module 30 in this embodiment includes a data rearranging unit 301 and a data combining unit 302, where the data rearranging unit 301 is configured to rearrange the received second data, and the data combining unit 302 is configured to combine the rearranged second data based on a preset combination rule so as to arrange the second data into the third data, for example, arranging vector data into tensor data, thereby completing the operations of one window in the neural network, including but not limited to operations such as direct connection, combination, and rearrangement; the target data is obtained after the operations of all windows of all neural networks are completed.
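The sketch below gives one possible software reading of the rearrangement and combination steps, using a row-reordering rule and a concatenation as stand-ins for the preset rearrangement and combination rules; the actual rules and data layout are not specified by the description above.

```python
import numpy as np

def rearrange(second_data, order):
    """Data rearranging unit: reorder rows according to a preset rearrangement rule."""
    return second_data[order, :]

def combine(tiles, axis=1):
    """Data combining unit: concatenate the rearranged pieces into the third data,
    e.g. stitching the results of one sliding-window pass back together."""
    return np.concatenate(tiles, axis=axis)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    second_a = rng.standard_normal((4, 3))
    second_b = rng.standard_normal((4, 3))
    rule = [2, 0, 1, 3]                                    # preset rearrangement rule
    third_data = combine([rearrange(second_a, rule), rearrange(second_b, rule)])
    print(third_data.shape)                                # (4, 6): stored in the storage module
```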
An embodiment of the present application further provides a chip including the neural network acceleration device as described above.
An embodiment of the present application further provides an electronic device, including the neural network acceleration apparatus as described above.
Accordingly, as shown in fig. 2, an embodiment of the present application further provides a neural network acceleration method, which can be executed in the neural network acceleration apparatus described above, and specifically includes the following steps:
s1, initiating a data reading request, and reading original data in a storage module; distributing the original data in the storage module to at least one in-memory computing unit; and receiving the distributed original data through the memory computing unit, carrying out accelerated computation on the original data distributed to the neural network, obtaining first data through computation, and sending the first data to the vector processing module and/or the storage module.
Specifically, for step S1, a data read request is initiated to the storage module 40 through the memory computing module 10; after responding to the read request, the storage module 40 outputs the corresponding original data to the memory computing module 10; upon receiving the original data sent by the storage module 40, the memory computing module 10 distributes it to different memory computing units for accelerated computation; and after the computation, the memory computing units send the first data obtained by computation to the vector processing module 20, or send it to the storage module 40 for storage.
S2, receiving first data sent by the memory computing module, carrying out vector processing on the first data to obtain corresponding second data, and sending the second data to the tensor processing module and/or the storage module;
specifically, for step S2, after receiving the first data sent by the in-memory computation module 10, a series of vector processing, such as activation, pooling, scaling and element processing, is performed on the vectors through the convolutional neural network of the vector processing module 20, and finally, the obtained second data is sent to the tensor processing module 30. Wherein the second data may be vector data.
S3, receiving second data sent by the vector processing module, carrying out tensor processing on the second data to obtain corresponding third data, and sending the third data to the storage module for storage;
specifically, for step S3, after receiving the second data sent by the vector processing module 20, the tensor processing module performs tensor processing on the second data, for example, data rearrangement and data combination, so as to complete the operation of one window in the neural network, including but not limited to direct connection, combination, rearrangement, and the like; and finally, obtaining corresponding third data, and sending the third data to the storage module 40 for storage. Wherein the third data may be tensor data.
And S4, storing the original data written through the bus, the first data sent by the memory computing module, the second data sent by the vector processing module and the third data sent by the tensor processing module.
Specifically, step S4 mainly stores the relevant data of the neural network acceleration device. First, the raw data to be processed (for example, image data) is written into the storage module 40 through a bus, where the bus may be an AXI/AHB bus or a custom bus; the storage module 40 is further configured to store the first data sent by the memory computing module 10, the second data sent by the vector processing module 20, and the third data sent by the tensor processing module 30.
As can be seen, in the neural network acceleration method provided in this embodiment, the original data is received according to the initiated data read request and distributed to the plurality of memory computing units for accelerated computation to obtain the first data; vector processing is performed on the first data to obtain the corresponding second data; tensor processing is performed on the second data to obtain the corresponding third data; and the original data written through the bus, the first data, the second data, and the third data are stored. According to the embodiment of the present application, the original data distributed to the neural network is computed in an accelerated manner through the plurality of in-memory computing units, which greatly reduces the power consumption of the neural network acceleration device while meeting the computing-power requirement, improves its energy efficiency, solves the problem of excessive power consumption in existing neural network accelerators, and extends the application of neural network accelerators to edge processors with lower power budgets and stricter power-consumption requirements.
Optionally, as shown in fig. 3, in some embodiments, step S1 may specifically include:
s11, acquiring a preset execution sequence of the neural network corresponding to each memory computing unit, wherein the preset execution sequence comprises an execution instruction corresponding to each layer of operator of the neural network, arrangement of the weight corresponding to each layer of operator in a memory and data path configuration in the image processing device;
and S12, distributing the original data to the neural networks corresponding to the memory computing units based on a preset execution sequence.
Specifically, for step S1, the preset execution sequence of the neural network corresponding to each in-memory computing unit is obtained, where the preset execution sequence includes the execution instruction corresponding to each layer of operators of each neural network, the arrangement of the overall weights of the neural network in the memory of the storage module, and the data path configuration of each component in the image processing apparatus; the memory computing module 10 then distributes the acquired raw data to the neural network corresponding to each memory computing unit according to the preset execution sequence.
Optionally, as shown in fig. 4, in some embodiments, step S12 may specifically include:
s121, carrying out data type classification on the received original data;
and S122, distributing the original data after data type classification to a neural network in a plurality of memory computing units respectively based on a preset execution sequence.
Specifically, the memory computing module 10 may be further configured to perform data type classification on the original data read from the storage module 40, allocate the original data after the data type classification to each layer of operators of the neural network in different memory computing units according to a preset execution sequence, and perform accelerated computing on the original data allocated to each layer of operators in the neural network through the memory computing units, so as to improve the efficiency of data processing.
Optionally, as shown in fig. 3, in some embodiments, the step S1 may further include:
s13, performing accelerated calculation on the original data in the neural network through a memory calculation unit according to a preset execution sequence;
and S14, sending the first data obtained by accelerated calculation to a vector processing module and/or a storage module based on a preset execution sequence.
Specifically, the memory computing unit distributes the image data to operators of each layer in the corresponding neural network according to a preset execution sequence, so that the image data needing accelerated computation is quickly distributed to each layer in the plurality of neural networks for accelerated computation, and the computation speed of the neural networks is improved; and after finishing the accelerated calculation, sending the first data obtained by the accelerated calculation to a vector processing module and/or a storage module according to a preset execution sequence.
Optionally, in some embodiments, the determining manner of the preset execution sequence may specifically include:
acquiring a neural network model file;
compiling the neural network model file through a neural network compiler to obtain a preset execution sequence corresponding to the neural network, wherein the preset execution sequence comprises an execution instruction corresponding to each layer of operator of the neural network, the arrangement of the weight of the neural network in a memory and the configuration of a data path.
Specifically, the preset execution sequence is determined by a neural network compiler. A neural network model of the in-memory computing unit, such as a CNN convolutional neural network model, is obtained; the neural network model may be a pb, onnx, or tflite file and includes the complete weights of the neural network and the operators of each layer. The neural network model is then compiled by the neural network compiler to obtain the preset execution sequence of the neural network on the accelerator, where the preset execution sequence includes the execution instruction corresponding to each layer of operators, the arrangement of the weights of the neural network in memory, and the configuration of the data paths between the modules; whether the first data is sent to the vector processing module 20 or the storage module 40 is thus determined according to the preset execution sequence. The neural network compiler can be set to memory-priority or computation-priority mode; the two compilation modes generate different execution instructions, and the corresponding data paths also differ.
Optionally, in some embodiments, the neural network acceleration method further includes:
and controlling data paths between the memory computing module, the vector processing module, the tensor processing module and the storage module, and determining configuration information of the memory computing module, the vector processing module, the tensor processing module and the storage module.
Specifically, the controller 50 is connected to each module of the neural network acceleration device and is responsible for controlling the data paths between the modules and the configuration information of each module, which facilitates subsequent management and configuration of the neural network acceleration device.
Optionally, in some embodiments, the neural network acceleration method further includes:
and respectively managing the physical memory areas and the read-write configuration corresponding to the memory calculation module, the vector processing module, the tensor processing module, the storage module and the controller.
Specifically, the memory management module is connected to the other modules of the neural network acceleration device and is mainly responsible for partitioning, reading and writing the physical memory area of each module. The memory management module may further include an arbiter and a direct memory access (DMA) unit; the arbiter determines the priority of data transmission of each module and starts peripheral/memory accesses according to the priority of the channel requests; the DMA unit generates memory addresses and starts memory read or write cycles.
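A toy model of the arbiter and DMA behaviour described above is sketched below in Python; the channel names, the priority convention (a lower value is served first) and the class interfaces are assumptions made for illustration only.

```python
import heapq

class Arbiter:
    """Grants peripheral/memory access to pending channel requests in priority order."""
    def __init__(self):
        self._queue = []   # (priority, arrival_order, channel); lower priority value served first

    def request(self, channel, priority, order):
        heapq.heappush(self._queue, (priority, order, channel))

    def grant(self):
        if self._queue:
            _, _, channel = heapq.heappop(self._queue)
            return channel
        return None

class DMA:
    """Generates memory addresses and performs read/write cycles against a flat byte array."""
    def __init__(self, memory: bytearray):
        self.memory = memory

    def write(self, address, data: bytes):
        self.memory[address:address + len(data)] = data

    def read(self, address, length) -> bytes:
        return bytes(self.memory[address:address + length])

# Example: the vector module (priority 1) is granted access before the tensor module (priority 2).
arbiter, dma = Arbiter(), DMA(bytearray(64))
arbiter.request("tensor_module", priority=2, order=0)
arbiter.request("vector_module", priority=1, order=1)
print(arbiter.grant())                 # -> "vector_module"
dma.write(0, b"\x01\x02\x03\x04")
print(dma.read(0, 4))
```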
Optionally, as shown in fig. 5, in some embodiments, step S2 may specifically include:
S21, performing data activation processing on the received first data;
S22, performing maximum pooling and average pooling on the received first data;
S23, performing image up-sampling and down-sampling processing on the received first data;
and S24, performing element-wise addition, subtraction, multiplication and division on the received first data.
Specifically, for step S2, the activation unit 201 mainly performs data activation processing on the received first data; performing maximum pooling and average pooling on the first data by the pooling unit 202; performing image up-sampling and image down-sampling processing on the first data by the scaling unit 203; addition, subtraction, multiplication, and division operations between the first data elements are performed by the element processing unit 204.
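The four kinds of vector processing could be sketched with plain NumPy as below; the concrete choices (ReLU for activation, 2x2 windows for pooling, nearest-neighbour resampling) are assumptions for illustration and are not the only operations that units 201 to 204 may implement.

```python
import numpy as np

def activate(x):
    """Data activation processing (here: ReLU, as one possible activation)."""
    return np.maximum(x, 0.0)

def pool2x2(x, mode="max"):
    """Max or average pooling over non-overlapping 2x2 windows of a (H, W) feature map."""
    h, w = x.shape[0] // 2 * 2, x.shape[1] // 2 * 2
    blocks = x[:h, :w].reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3)) if mode == "max" else blocks.mean(axis=(1, 3))

def resample(x, factor):
    """Nearest-neighbour image up-sampling (factor > 1) or down-sampling (factor < 1)."""
    if factor >= 1:
        return np.repeat(np.repeat(x, int(factor), axis=0), int(factor), axis=1)
    step = int(round(1 / factor))
    return x[::step, ::step]

def elementwise(a, b, op):
    """Element-wise addition, subtraction, multiplication or division of two tensors."""
    return {"add": a + b, "sub": a - b, "mul": a * b, "div": a / b}[op]

# Example usage on a small feature map.
fmap = np.array([[1.0, -2.0, 3.0, 0.5],
                 [4.0, 5.0, -6.0, 7.0],
                 [0.1, 0.2, 0.3, 0.4],
                 [8.0, -9.0, 10.0, 11.0]])
print(activate(fmap))
print(pool2x2(fmap, "max"), pool2x2(fmap, "avg"))
print(resample(fmap, 2).shape, resample(fmap, 0.5).shape)
print(elementwise(fmap, fmap, "add"))
```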
Optionally, in some embodiments, after step S2, further comprising:
and sending the activation-processed first data, the first data after maximum pooling and average pooling, the first data after image up-sampling and down-sampling, and the first data after element-wise addition, subtraction, multiplication and division to a target location for storage, respectively.
After the activation processing, the maximum pooling and average pooling, the image up-sampling and down-sampling, and the element-wise addition, subtraction, multiplication and division are completed, the processed first data is sent to the target location for storage, where it awaits reading and data processing in the subsequent steps.
Optionally, as shown in fig. 6, in some embodiments, step S3 may specifically include:
S31, carrying out data rearrangement on the received second data based on a preset rearrangement rule to obtain rearranged second data;
and S32, based on a preset combination rule, performing data combination on the rearranged second data to obtain corresponding third data.
Specifically, for step S3, the data rearrangement unit 301 performs data rearrangement on the received second data, and the data combination unit 302 performs data combination on the rearranged second data based on a preset combination rule, thereby arranging the second data into the third data, for example arranging vector data into tensor data; this completes the operation of one window in the neural network, including but not limited to direct connection, combination, rearrangement and the like, and the target data is obtained after the operations of all windows of all the neural networks are completed.
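A minimal sketch of the rearrangement and combination steps is given below; treating each second-data item as the flattened output of one window and stacking the windows back into a tensor is only one possible pair of rearrangement/combination rules, chosen here for illustration.

```python
import numpy as np

def rearrange(second_data, order):
    """Data rearrangement (step S31): reorder the per-window vectors according to a preset rule."""
    return [second_data[i] for i in order]

def combine(rearranged, grid_shape):
    """Data combination (step S32): stack the per-window vectors into a single output tensor."""
    rows, cols = grid_shape
    return np.stack(rearranged).reshape(rows, cols, -1)   # (rows, cols, channels)

# Example: four window outputs (vectors) rearranged and combined into a 2x2xC tensor.
windows = [np.full(3, v, dtype=np.float32) for v in (0, 1, 2, 3)]
third_data = combine(rearrange(windows, order=[0, 2, 1, 3]), grid_shape=(2, 2))
print(third_data.shape)   # (2, 2, 3)
```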
An embodiment of the present application further provides a computer-readable storage medium, on which a computer program is stored, the computer program, when executed by a processor, implementing a neural network acceleration method, including the steps of: initiating a data reading request, and reading original data in a storage module; distributing the original data in the storage module to at least one in-memory computing unit; receiving the distributed original data through a memory computing unit, carrying out accelerated computing on the original data distributed to the neural network, obtaining first data through computing, and sending the first data to a vector processing module and/or a storage module; receiving first data sent by the memory computing module, carrying out vector processing on the first data to obtain corresponding second data, and sending the second data to the tensor processing module and/or the storage module; receiving second data sent by the vector processing module, carrying out tensor processing on the second data to obtain corresponding third data, and sending the third data to the storage module for storage; and storing the original data written through the bus, the first data sent by the memory computing module, the second data sent by the vector processing module and the third data sent by the tensor processing module.
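As a compact illustration of the control flow of the stored method, the Python sketch below chains the four stages with trivial stand-in functions; every class, function and name in it is a hypothetical placeholder introduced for this example and is not the stored computer program itself.

```python
import numpy as np

def run_pipeline(raw, imc_units, vector_fn, tensor_fn, storage):
    """End-to-end flow: read raw data, fan it out to the in-memory computing units,
    then pass the results through vector and tensor processing back to storage."""
    storage["raw"] = raw                                    # original data written over the bus
    first = [unit(raw) for unit in imc_units]               # accelerated computation per unit
    storage["first"] = first
    second = [vector_fn(x) for x in first]                  # vector processing
    storage["second"] = second
    third = tensor_fn(second)                               # tensor processing
    storage["third"] = third
    return third

# Example with trivial stand-ins for each module.
storage = {}
result = run_pipeline(
    raw=np.arange(8, dtype=np.float32),
    imc_units=[lambda x: x * 2, lambda x: x + 1],           # two in-memory computing units
    vector_fn=lambda x: np.maximum(x, 0.0),                 # e.g. activation
    tensor_fn=lambda xs: np.stack(xs),                      # e.g. combination into one tensor
    storage=storage)
print(result.shape, sorted(storage.keys()))
```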
According to the neural network acceleration method described above, the original data distributed to the neural networks is accelerated by the plurality of in-memory computing units, which greatly reduces the power consumption of the neural network acceleration device while meeting the computing-power requirement and improves its energy efficiency, thereby solving the problem that the power consumption of existing neural network accelerators is too high and extending the application of neural network accelerators to edge processors with lower power and stricter power-consumption requirements.
It is to be understood that the foregoing scenarios are only examples, and do not constitute a limitation on application scenarios of the technical solutions provided in the embodiments of the present application, and the technical solutions of the present application may also be applied to other scenarios. For example, as can be known by those skilled in the art, with the evolution of system architecture and the emergence of new service scenarios, the technical solution provided in the embodiments of the present application is also applicable to similar technical problems.
The above-mentioned serial numbers of the embodiments of the present application are merely for description, and do not represent the advantages and disadvantages of the embodiments.
The steps in the method of the embodiment of the application can be sequentially adjusted, combined and deleted according to actual needs.
The units in the device of the embodiment of the application can be combined, divided and deleted according to actual needs.
In the present application, the same or similar term concepts, technical solutions and/or application scenario descriptions will be generally described only in detail at the first occurrence, and when the description is repeated later, the detailed description will not be repeated in general for brevity, and when understanding the technical solutions and the like of the present application, reference may be made to the related detailed description before the description for the same or similar term concepts, technical solutions and/or application scenario descriptions and the like which are not described in detail later.
In the present application, each embodiment is described with an emphasis on the description, and reference may be made to the description of other embodiments for parts that are not described or recited in any embodiment.
The technical features of the technical solution of the present application may be arbitrarily combined, and for brevity of description, all possible combinations of the technical features in the embodiments are not described, however, as long as there is no contradiction between the combinations of the technical features, the scope of the present application should be considered as being described in the present application.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, a controlled terminal, or a network device) to execute the method of each embodiment of the present application.
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, it may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. The procedures or functions according to the embodiments of the present application are all or partially generated when the computer program instructions are loaded and executed on a computer. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored on a computer readable storage medium or transmitted from one computer readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, fiber optic, digital subscriber line) or wirelessly (e.g., infrared, wireless, microwave, etc.). The computer-readable storage medium can be any available medium that can be accessed by a computer, or a data storage device, such as a server or a data center, that incorporates one or more of the available media. The usable medium may be a magnetic medium (e.g., floppy disk, memory disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
The above description is only a preferred embodiment of the present application, and not intended to limit the scope of the present application, and all modifications of equivalent structures and equivalent processes, which are made by the contents of the specification and the drawings of the present application, or which are directly or indirectly applied to other related technical fields, are included in the scope of the present application.

Claims (21)

1. A neural network acceleration device, comprising: the device comprises a memory calculation module, a vector processing module and a tensor processing module which are electrically connected in sequence, and a storage module which is electrically connected with the memory calculation module, the vector processing module and the tensor processing module;
the memory computing module comprises a data reading unit and at least one memory computing unit; the in-memory computing unit includes a neural network; the data reading unit is used for initiating a data reading request and reading original data in the storage module; the memory computing module is used for distributing the original data in the storage module to the at least one memory computing unit; the memory computing unit is used for receiving the distributed original data, performing accelerated computing on the original data distributed to the neural network, obtaining first data through computing, and sending the first data to the vector processing module and/or the storage module;
the vector processing module is configured to receive first data sent by the memory computing module, perform vector processing on the first data to obtain corresponding second data, and send the second data to the tensor processing module and/or the storage module;
the tensor processing module is used for receiving the second data sent by the vector processing module, carrying out tensor processing on the second data to obtain corresponding third data and sending the third data to the storage module for storage;
the storage module is used for storing original data written through a bus, first data sent by the memory calculation module, second data sent by the vector processing module and third data sent by the tensor processing module.
2. The neural network acceleration device of claim 1, wherein the distributing the raw data in the storage module to the at least one in-memory computing unit comprises:
acquiring a preset execution sequence of the neural network corresponding to each in-memory computing unit, wherein the preset execution sequence comprises an execution instruction corresponding to each layer of operator of the neural network, arrangement of weights corresponding to each layer of operator in a memory, and data path configuration in an image processing device;
and distributing the original data to the neural networks corresponding to the memory computing units based on the preset execution sequence.
3. The neural network accelerator according to claim 2, wherein the allocating the raw data to the neural network corresponding to each of the in-memory computing units based on the preset execution order comprises:
classifying the received original data according to data types;
and respectively distributing the original data after the data types are classified to the neural networks in the plurality of memory computing units based on the preset execution sequence.
4. The neural network acceleration device according to claim 1, wherein the accelerating calculation of the raw data distributed into the neural network, the calculation of the first data and the sending of the first data to the vector processing module and/or the storage module comprises:
performing accelerated calculation on the original data in the neural network according to a preset execution sequence through the memory calculation unit;
and sending the first data obtained by accelerated calculation to a vector processing module and/or a storage module based on the preset execution sequence.
5. The neural network acceleration device according to claim 4, wherein the predetermined execution order determination manner includes:
acquiring a neural network model file;
compiling the neural network model file through a neural network compiler to obtain a preset execution sequence corresponding to the neural network, wherein the preset execution sequence comprises execution instructions corresponding to operators of each layer of the neural network, arrangement of weights of the neural network in a memory, and data path configuration of a neural network accelerating device.
6. The neural network acceleration device according to claim 1, further comprising a controller electrically connected to the memory calculation module, the vector processing module, the tensor processing module, and the storage module, respectively;
the controller is used for controlling data paths between the memory computing module, the vector processing module, the tensor processing module and the storage module, and determining configuration information of the memory computing module, the vector processing module, the tensor processing module and the storage module.
7. The neural network acceleration device according to claim 6, further comprising a memory management module, the memory management module being respectively connected to the memory calculation module, the vector processing module, the tensor processing module, the storage module and the controller;
and the memory management module is used for respectively managing the physical memory areas and the read-write configuration corresponding to the memory calculation module, the vector processing module, the tensor processing module, the storage module and the controller.
8. The neural network acceleration device of claim 1, wherein the vector processing module comprises an activation unit, a pooling unit, a scaling unit, and an element processing unit;
the activation unit is used for performing data activation processing on the received first data;
the pooling unit is used for performing maximum pooling and/or average pooling on the received first data;
the scaling unit is used for performing image up-sampling and down-sampling processing on the received first data;
the element processing unit is used for performing element-wise addition, subtraction, multiplication and division on the received first data.
9. The neural network acceleration device according to claim 1, wherein the tensor processing module includes a data rearranging unit and a data combining unit;
the data rearrangement unit is used for carrying out data rearrangement on the received second data based on a preset rearrangement rule to obtain rearranged second data;
and the data combination unit is used for carrying out data combination on the rearranged second data based on a preset combination rule to obtain corresponding third data.
10. A neural network acceleration method is characterized by comprising the following steps:
initiating a data reading request, and reading original data in a storage module; distributing the original data in the storage module to at least one in-memory computing unit; receiving the distributed original data through an in-memory computing unit, carrying out accelerated computing on the original data distributed to the neural network, obtaining first data through computing, and sending the first data to a vector processing module and/or a storage module;
receiving first data sent by an in-memory computing module, carrying out vector processing on the first data to obtain corresponding second data, and sending the second data to a tensor processing module and/or a storage module;
receiving second data sent by the vector processing module, carrying out tensor processing on the second data to obtain corresponding third data, and sending the third data to the storage module for storage;
and storing original data written through a bus, first data sent by the memory computing module, second data sent by the vector processing module and third data sent by the tensor processing module.
11. The neural network acceleration method of claim 10, wherein the distributing the raw data in the memory module to at least one in-memory computing unit comprises:
acquiring a preset execution sequence of the neural network corresponding to each in-memory computing unit, wherein the preset execution sequence comprises an execution instruction corresponding to each layer of operator of the neural network, arrangement of weights corresponding to each layer of operator in a memory, and data path configuration in an image processing device;
and distributing the original data to the neural networks corresponding to the memory computing units based on the preset execution sequence.
12. The neural network acceleration method of claim 11, wherein the allocating the raw data to the neural network corresponding to each of the in-memory computing units based on the preset execution order comprises:
classifying the received original data according to data types;
and respectively distributing the original data after the data types are classified to the neural networks in the plurality of memory computing units based on the preset execution sequence.
13. The neural network acceleration method according to claim 10, wherein the performing accelerated computation on the raw data distributed into the neural network, obtaining first data by computation, and sending the first data to the vector processing module and/or the storage module comprises:
performing accelerated calculation on the original data in the neural network according to a preset execution sequence through the memory calculation unit;
and sending the first data obtained by accelerated calculation to a vector processing module and/or a storage module based on the preset execution sequence.
14. The neural network acceleration method according to claim 13, wherein the determining manner of the preset execution sequence comprises:
acquiring a neural network model file;
compiling the neural network model file through a neural network compiler to obtain a preset execution sequence corresponding to the neural network, wherein the preset execution sequence comprises execution instructions corresponding to operators of each layer of the neural network, arrangement of weights of the neural network in a memory, and data path configuration of a neural network accelerating device.
15. The neural network acceleration method according to claim 10, further comprising:
and controlling data paths between the memory computing module, the vector processing module, the tensor processing module and the storage module, and determining configuration information of the memory computing module, the vector processing module, the tensor processing module and the storage module.
16. The neural network acceleration method of claim 15, further comprising:
and respectively managing the physical memory areas and the read-write configuration corresponding to the memory calculation module, the vector processing module, the tensor processing module, the storage module and the controller.
17. The neural network acceleration method of claim 10, wherein the vector processing of the first data comprises:
performing data activation processing on the received first data;
performing maximum pooling and/or average pooling on the received first data;
performing image up-sampling and down-sampling processing on the received first data;
and performing element-wise addition, subtraction, multiplication and division on the received first data.
18. The neural network acceleration method of claim 10, wherein the tensor processing the second data comprises:
performing data rearrangement on the received second data based on a preset rearrangement rule to obtain rearranged second data;
and based on a preset combination rule, performing data combination on the rearranged second data to obtain corresponding third data.
19. A chip comprising a neural network acceleration device according to any one of claims 1 to 9.
20. An electronic device comprising the neural network accelerating device according to any one of claims 1 to 9.
21. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the neural network acceleration method according to any one of claims 10 to 18.
CN202310044518.4A 2023-01-30 2023-01-30 Neural network acceleration device, method, chip, electronic equipment and storage medium Active CN115860079B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310044518.4A CN115860079B (en) 2023-01-30 2023-01-30 Neural network acceleration device, method, chip, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310044518.4A CN115860079B (en) 2023-01-30 2023-01-30 Neural network acceleration device, method, chip, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN115860079A true CN115860079A (en) 2023-03-28
CN115860079B CN115860079B (en) 2023-05-12

Family

ID=85657348

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310044518.4A Active CN115860079B (en) 2023-01-30 2023-01-30 Neural network acceleration device, method, chip, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115860079B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116152520A (en) * 2023-04-23 2023-05-23 深圳市九天睿芯科技有限公司 Data processing method for neural network accelerator, chip and electronic equipment
CN117217274A (en) * 2023-11-08 2023-12-12 深圳市九天睿芯科技有限公司 Vector processor, neural network accelerator, chip and electronic equipment

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10346093B1 (en) * 2018-03-16 2019-07-09 Xilinx, Inc. Memory arrangement for tensor data
JP2020091861A (en) * 2018-12-07 2020-06-11 三星電子株式会社Samsung Electronics Co.,Ltd. Tensor calculation data flow accelerator semiconductor circuit
CN113139648A (en) * 2020-01-17 2021-07-20 阿里巴巴集团控股有限公司 Data layout optimization for PIM architecture executing neural network model
US20210303976A1 (en) * 2020-03-25 2021-09-30 Western Digital Technologies, Inc. Flexible accelerator for sparse tensors in convolutional neural networks
CN112465108A (en) * 2020-11-11 2021-03-09 上海交通大学 Neural network compiling method for storage and calculation integrated platform
CN113222107A (en) * 2021-03-09 2021-08-06 北京大学 Data processing method, device, equipment and storage medium
CN114758699A (en) * 2022-03-21 2022-07-15 中山大学 Data processing method, system, device and medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
CYRIL OSWALD等: "Tensor-based Polynomial Features Generation for High-order Neural Networks", 《2021 23RD INTERNATIONAL CONFERENCE ON PROCESS CONTROL (PC)》 *
NORMAN P. JOUPPI等: "In-Datacenter Performance Analysis of a Tensor Processing Unit", 《ISCA’17》 *
王俊杰: "基于忆阻器的神经网络硬件的研究", 《中国博士学位论文全文数据库 (信息科技辑)》 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116152520A (en) * 2023-04-23 2023-05-23 深圳市九天睿芯科技有限公司 Data processing method for neural network accelerator, chip and electronic equipment
CN116152520B (en) * 2023-04-23 2023-07-07 深圳市九天睿芯科技有限公司 Data processing method for neural network accelerator, chip and electronic equipment
CN117217274A (en) * 2023-11-08 2023-12-12 深圳市九天睿芯科技有限公司 Vector processor, neural network accelerator, chip and electronic equipment
CN117217274B (en) * 2023-11-08 2024-06-04 深圳市九天睿芯科技有限公司 Vector processor, neural network accelerator, chip and electronic equipment

Also Published As

Publication number Publication date
CN115860079B (en) 2023-05-12

Similar Documents

Publication Publication Date Title
US11449576B2 (en) Convolution operation processing method and related product
CN110689115B (en) Neural network model processing method and device, computer equipment and storage medium
CN113950066B (en) Single server part calculation unloading method, system and equipment under mobile edge environment
CN108073981B (en) Method and apparatus for processing convolutional neural network
CN115860079A (en) Neural network acceleration device, method, chip, electronic device, and storage medium
WO2020073211A1 (en) Operation accelerator, processing method, and related device
CN110262901B (en) Data processing method and data processing system
TWI798618B (en) Memory allocation method, device, and electronic equipment
CN106503791A (en) System and method for the deployment of effective neutral net
US20210295168A1 (en) Gradient compression for distributed training
WO2019118363A1 (en) On-chip computational network
CN113469354B (en) Memory-constrained neural network training
TWI775210B (en) Data dividing method and processor for convolution operation
CN113065643A (en) Apparatus and method for performing multi-task convolutional neural network prediction
US11941528B2 (en) Neural network training in a distributed system
US11023825B2 (en) Platform as a service cloud server and machine learning data processing method thereof
CN116382880A (en) Task execution method, device, processor, electronic equipment and storage medium
KR20210103367A (en) Accelerator, method for operating the same and electronic device including the same
US12014202B2 (en) Method and apparatus with accelerator
US20220044101A1 (en) Collaborative sensor data processing by deep learning accelerators with integrated random access memory
CN117940934A (en) Data processing apparatus and method
CN114764372A (en) Data processing method and device, electronic equipment and storage medium
CN111860772A (en) Device and method for executing artificial neural network posing operation
US11720417B2 (en) Distributed inferencing using deep learning accelerators with integrated random access memory
US20230020929A1 (en) Write combine buffer (wcb) for deep neural network (dnn) accelerator

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant