CN116842991A - Deep learning accelerator, deep learning method, chip, electronic device and storage medium - Google Patents


Info

Publication number
CN116842991A
Authority
CN
China
Prior art keywords
data
module
deep learning
consumed
communication
Prior art date
Legal status
Pending
Application number
CN202210279702.2A
Other languages
Chinese (zh)
Inventor
祝叶华
孙炜
Current Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd filed Critical Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority to CN202210279702.2A priority Critical patent/CN116842991A/en
Publication of CN116842991A publication Critical patent/CN116842991A/en


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/06 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 - Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 - Addressing or allocation; Relocation
    • G06F12/08 - Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 - Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 - Addressing or allocation; Relocation
    • G06F12/08 - Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/10 - Address translation
    • G06F12/1009 - Address translation using page tables, e.g. page table structures
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 - Arrangements for program control, e.g. control units
    • G06F9/06 - Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 - Multiprogramming arrangements
    • G06F9/54 - Interprogram communication
    • G06F9/546 - Message passing systems or structures, e.g. queues
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The embodiments of the present application disclose a deep learning accelerator and a method, a chip, an electronic device and a storage medium. The deep learning accelerator comprises a data generation module, a data consumption module, a communication module and a data buffer, where the communication module and the data buffer are each connected between the data generation module and the data consumption module. The data generation module is configured to write the generated data to be consumed into the data buffer and to provide the communication module with indication information indicating the storage location of the data to be consumed in the data buffer; the communication module is configured to provide the indication information to the data consumption module; and the data consumption module is configured to use the indication information to read the data to be consumed from the data buffer and to consume it.

Description

Deep learning accelerator, deep learning method, chip, electronic device and storage medium
Technical Field
The embodiments of the present application relate to the technical field of data caching, and in particular to a deep learning accelerator, a deep learning method, a chip, an electronic device and a storage medium.
Background
As artificial intelligence networks are explored in greater depth, algorithm network structures have become more diversified and the data interaction between algorithm layers more complex, so a mechanism for transferring data between layers is required.
Currently, in a deep learning accelerator, a data buffer is typically provided for a data generation module and a data consumption module: both modules are connected to the data buffer, the data generation module is responsible for generating data and writing it into the buffer, and the data consumption module is responsible for reading data from the buffer and consuming it. A mailbox module is connected between the two modules; through it, the data consumption module can learn only that the data generation module has written data into the buffer, i.e. that valid data is available to consume. This access mechanism supports only the scenario in which the physical addresses in the data buffer are contiguous, and it makes selectively reading and consuming data difficult; in other words, data-access flexibility is poor.
Disclosure of Invention
The embodiments of the present application provide a deep learning accelerator, a deep learning method, a chip, an electronic device and a storage medium.
The technical scheme of the embodiment of the application is realized as follows:
the embodiment of the application provides a deep learning accelerator, comprising: a data generation module, a data consumption module, a communication module and a data buffer;
the communication module and the data buffer area are respectively connected between the data generation module and the data consumption module;
the data generating module is used for writing the generated data to be consumed into the data buffer area and providing indication information for indicating the storage position of the data to be consumed in the data buffer area for the communication module;
the communication module is used for providing the indication information to the data consumption module;
and the data consumption module is used for reading the data to be consumed from the data buffer area by utilizing the indication information and consuming the data to be consumed.
In the deep learning accelerator, the physical addresses in the data buffer are contiguous, and the indication information includes: a starting physical address of the data to be consumed in the data buffer;
and the data consumption module is configured to locate the data to be consumed in the data buffer using the starting physical address, and to read the data to be consumed.
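As a concrete illustration of the contiguous-address case: the read reduces to an offset into one linear buffer, so the starting physical address alone locates the data. The following Python sketch is illustrative only; the function and variable names are not taken from the patent, and the data length would in practice come from the mailbox signalling (e.g. p_num).

```python
# Sketch of the contiguous-address case: the data buffer is one linear
# address space, so the starting physical address alone locates the data.
def read_contiguous(buffer: bytearray, start_addr: int, length: int) -> bytes:
    """Read `length` bytes of to-be-consumed data beginning at start_addr."""
    return bytes(buffer[start_addr:start_addr + length])

# The producer writes at some start address and passes only that address
# (the indication information) to the consumer via the communication module.
buf = bytearray(64)
buf[8:12] = b"\x01\x02\x03\x04"   # producer writes 4 bytes at address 8
indication = 8                     # starting physical address
assert read_contiguous(buf, indication, 4) == b"\x01\x02\x03\x04"
```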
In the deep learning accelerator, the physical addresses in the data buffer are discrete, and the indication information includes: the starting physical address of the data to be consumed in the data buffer, a first virtual address corresponding to the data to be consumed, and a first process identifier and a first module identifier corresponding to the data generation module;
the data consumption module is used for:
acquiring, from an address mapping table, the physical address matching the first module identifier, the first process identifier and the first virtual address, and determining that physical address as a first physical address;
and locating the data to be consumed in the data buffer using the starting physical address and the first physical address, and reading the data to be consumed.
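For the discrete-address case, the lookup described above can be sketched as a table keyed by (module identifier, process identifier, virtual address) that yields a physical address. The table contents and names below are illustrative assumptions, not values from the patent:

```python
# Sketch of the discrete-address case: an address mapping table keyed by
# (module id, process id, virtual address) yields the discrete physical
# address at which the data actually resides.
ADDRESS_MAP = {
    # (module_id, process_id, virtual_addr) -> physical_addr
    ("producer", 1, 0x000): 0x40,
    ("producer", 1, 0x100): 0x80,   # contiguous virtual, discrete physical
}

def resolve(module_id: str, process_id: int, virtual_addr: int) -> int:
    """Map a contiguous virtual address to its discrete physical address."""
    return ADDRESS_MAP[(module_id, process_id, virtual_addr)]

# The consumer receives the first module id, first process id and first
# virtual address as part of the indication information, then resolves
# the first physical address before reading the data to be consumed.
first_phys = resolve("producer", 1, 0x100)
assert first_phys == 0x80
```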
In the deep learning accelerator, the communication module is a mailbox module or a register.
In the deep learning accelerator, the communication module is a mailbox module, and the deep learning accelerator further comprises a first encoder, a first decoder and a first communication bus;
the first communication bus is used for realizing communication connection among the data generation module, the first encoder, the first decoder and the mailbox module, wherein the first encoder is connected between the data generation module and the first decoder, and the first decoder is connected with the mailbox module;
the data generation module is used for transmitting the indication information and the first communication information provided for the mailbox module to the first encoder;
the first encoder is configured to serially encode the indication information and the first communication information, and transmit first encoded data obtained by encoding to the first decoder in a serial manner;
the first decoder is configured to decode the first encoded data, and to transmit the indication information and the first communication information obtained by decoding to the mailbox module.
In the deep learning accelerator, the communication module is a mailbox module, and the deep learning accelerator further comprises a second encoder, a second decoder and a second communication bus;
the second communication bus is used for realizing communication connection among the data consumption module, the second encoder, the second decoder and the mailbox module, wherein the second encoder is connected between the mailbox module and the second decoder, and the second decoder is connected with the data consumption module;
the mailbox module is configured to transmit the indication information, and the second communication information provided for the data consumption module, to the second encoder;
the second encoder is configured to serially encode the indication information and the second communication information, and transmit second encoded data obtained by encoding to the second decoder in a serial manner;
the second decoder is configured to decode the second encoded data, and to transmit the indication information and the second communication information obtained by decoding to the data consumption module.
In the deep learning accelerator, the communication module is a register;
the data generation module is used for writing the indication information into the register;
the register is used for providing the indication information to the data consumption module in an interrupt mode.
In the deep learning accelerator, the data consumption module is further configured to:
acquiring a physical address matched with a second process identifier and a second module identifier corresponding to the data consumption module in a preset address mapping table, and determining the physical address as a second physical address;
and searching the data stored on the second physical address from the data buffer area and reading the data.
The embodiments of the present application provide a deep learning acceleration method applied to a deep learning accelerator. The deep learning accelerator comprises a data generation module, a data consumption module, a communication module and a data buffer, where the communication module and the data buffer are each connected between the data generation module and the data consumption module. The method comprises the following steps:
writing the generated data to be consumed into the data buffer area by utilizing the data generating module, and providing indication information indicating the storage position of the data to be consumed in the data buffer area for the communication module;
providing the indication information to the data consumption module by using the communication module;
and reading the data to be consumed from the data buffer area by using the data consumption module and using the indication information, and consuming the data to be consumed.
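The three steps above can be sketched end to end as a producer-consumer exchange in which the communication module is modeled as a simple queue carrying the indication information. This is an illustrative software analogy of the claimed hardware flow; the names are assumptions:

```python
# End-to-end sketch of the claimed method: the producer writes data into
# the buffer and posts indication information to a communication module
# (here modeled as a queue); the consumer pops the indication information
# and uses it to read the data to be consumed.
from queue import Queue

buffer = bytearray(32)
mailbox: Queue = Queue()

def produce(data: bytes, start_addr: int) -> None:
    """Data generation module: write data, then publish indication info."""
    buffer[start_addr:start_addr + len(data)] = data
    mailbox.put((start_addr, len(data)))      # indication information

def consume() -> bytes:
    """Data consumption module: read the data located by the indication info."""
    start_addr, length = mailbox.get()        # provided by communication module
    return bytes(buffer[start_addr:start_addr + length])

produce(b"feat", 4)
assert consume() == b"feat"
```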
The embodiment of the application provides a chip which comprises the deep learning accelerator.
The embodiment of the application provides electronic equipment, which comprises: a deep learning accelerator, a memory for storing a computer program capable of running on the deep learning accelerator, and a communication bus;
the communication bus is used for realizing communication connection between the deep learning accelerator and the memory;
the deep learning accelerator is configured to execute the computer program stored in the memory, so as to implement the above deep learning acceleration method.
An embodiment of the present application provides a computer-readable storage medium having stored thereon a computer program that, when executed, implements the deep learning acceleration method described above.
The embodiments of the present application provide a deep learning accelerator comprising a data generation module, a data consumption module, a communication module and a data buffer, where the communication module and the data buffer are each connected between the data generation module and the data consumption module. The data generation module writes the generated data to be consumed into the data buffer and provides the communication module with indication information indicating the storage location of that data in the data buffer; the communication module provides the indication information to the data consumption module; and the data consumption module uses the indication information to read the data to be consumed from the data buffer and consume it. Because the communication module supplies the data consumption module with indication information pinpointing the storage location of the data to be consumed, the data consumption module can read that data accurately whether the physical addresses in the data buffer are contiguous or discrete, which improves the flexibility of data access.
Drawings
Fig. 1 is a schematic structural diagram of a deep learning accelerator according to an embodiment of the present application;
FIG. 2 is a schematic diagram illustrating a configuration of a data buffer according to an embodiment of the present application;
FIG. 3 is a schematic diagram of address mapping of a deep learning accelerator according to an embodiment of the present application;
fig. 4 is a second schematic structural diagram of a deep learning accelerator according to an embodiment of the present application;
fig. 5 is a third schematic structural diagram of a deep learning accelerator according to an embodiment of the present application;
fig. 6 is a schematic flow chart of a deep learning acceleration method according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
The following describes the technical scheme of the present application and how the technical scheme of the present application solves the above technical problems in detail by examples and with reference to the accompanying drawings. The following embodiments may be combined with each other and may not be described in detail in some embodiments for the same or similar concepts or processes.
In addition, the embodiments of the present application may be combined arbitrarily, provided that no conflict arises.
The embodiment of the application provides a deep learning accelerator. Fig. 1 is a schematic structural diagram of a deep learning accelerator according to an embodiment of the present application. As shown in fig. 1, in the embodiment of the present application, the deep learning accelerator 1 includes: a data generation module 10, a data consumption module 11, a communication module 12 and a data buffer 13;
the communication module 12 and the data buffer area 13 are respectively connected between the data generation module 10 and the data consumption module 11;
a data generating module 10 for writing the generated data to be consumed into the data buffer 13 and providing the indication information indicating the storage position of the data to be consumed in the data buffer 13 to the communication module 12;
a communication module 12 for providing the indication information to the data consumption module 11;
the data consumption module 11 is configured to read the data to be consumed from the data buffer 13 by using the indication information, and consume the data to be consumed.
It should be noted that, in the embodiment of the present application, the data generating module 10 and the data consuming module 11 do not establish direct communication, and the data generating module 10 may generate the data to be consumed, so as to write the data into the data buffer 13, and the data consuming module 11 needs to read the data to be consumed from the data buffer 13 for consumption.
It can be understood that, in the embodiment of the present application, the data consumption module 11 needs to know the storage location of the data to be consumed in the data buffer 13 before it can read that data. The data generation module 10 therefore provides the communication module 12 with indication information indicating that storage location, and the communication module 12 in turn provides the indication information to the data consumption module 11, so that the data consumption module 11 can use it to read the data to be consumed.
Specifically, in the embodiment of the present application, the physical addresses in the data buffer 13 are contiguous, and the indication information includes: a starting physical address of the data to be consumed in the data buffer 13;
the data consumption module 11 is configured to locate the data to be consumed in the data buffer 13 using the starting physical address, and to read the data to be consumed.
It should be noted that, in the embodiment of the present application, the data buffer 13 may be a contiguous address space in a static random-access memory (SRAM), i.e. the physical addresses in the data buffer 13 are contiguous. In this scenario, the data consumption module 11 only needs the starting physical address of the data to be consumed in the data buffer 13 in order to read it, so the indication information that the data generation module 10 provides to the data consumption module 11 through the communication module 12 may contain only that starting physical address.
It will be appreciated that in the embodiment of the present application, the data consumption module 11 may find the starting physical address from the data buffer 13 when obtaining the starting physical address, and start data reading from the starting physical address, so as to obtain the data to be consumed.
Specifically, in the embodiment of the present application, the physical addresses in the data buffer 13 are discrete, and the indication information includes: the starting physical address of the data to be consumed in the data buffer 13, a first virtual address corresponding to the data to be consumed, and a first process identifier and a first module identifier corresponding to the data generation module 10;
a data consumption module 11 for:
acquiring, from an address mapping table, the physical address matching the first module identifier, the first process identifier and the first virtual address, and determining that physical address as a first physical address;
and locating the data to be consumed in the data buffer 13 using the starting physical address and the first physical address, and reading the data to be consumed.
It should be noted that, in the embodiment of the present application, the data buffer 13 may consist of discrete address spaces in the SRAM; as shown in fig. 2, the physical addresses in each such data buffer are discrete. In this scenario, the data consumption module 11 needs not only the starting physical address of the data to be consumed in the data buffer 13 but also the first virtual address corresponding to the data to be consumed, and the first process identifier and first module identifier corresponding to the data generation module 10, before it can accurately determine the storage location of the data to be consumed and read it. The indication information that the data generation module 10 provides to the data consumption module 11 through the communication module 12 therefore needs to include all of the above information.
It should be noted that, in the embodiment of the present application, the data buffer 13 may consist of discrete address spaces in the SRAM, i.e. the modules share different address spaces within one SRAM. The data generation module 10 and the data consumption module 11 therefore need an address mapping table, which stitches the discrete physical address spaces into a contiguous virtual address range. Each time the data generation module 10 issues a write to the data buffer 13, or the data consumption module 11 issues a read from it, the contiguous virtual address is first mapped to a discrete physical address through the address mapping table; then, starting from the starting physical address, the matching discrete physical address (the first physical address) is looked up in the data buffer 13 in order to locate the data to be consumed, as shown in fig. 3.
It should be noted that, in the embodiment of the present application, when the data generation module 10 writes the data to be consumed into the data buffer 13, it likewise obtains the physical address matching the first module identifier, the first process identifier and the first virtual address in the address mapping table, i.e. the first physical address, and then writes the data to be consumed to that first physical address.
Illustratively, each entry of the address mapping table maps a module identifier, a process identifier and a virtual address to a physical address, as shown in Table 1 below:
TABLE 1
It should be noted that, in the embodiment of the present application, the data generation module 10 and the data consumption module 11 share a single, identical address mapping table. Different module identifiers distinguish the data generation module 10 from the data consumption module 11, and different process identifiers distinguish the running processes. The same virtual address maps to different physical addresses for the data generation module 10 and the data consumption module 11, which is why distinct module identifiers and process identifiers are needed.
It should be noted that, in the embodiment of the present application, the communication module 12 may be a different type of module, which will be described in detail below.
Fig. 4 is a schematic structural diagram of a deep learning accelerator according to an embodiment of the present application. As shown in fig. 4, the communication module 12 is a mailbox module 120, and the deep learning accelerator 1 further includes a first encoder 14, a first decoder 15, and a first communication bus 16;
a first communication bus 16 for implementing communication connection among the data generating module 10, the first encoder 14, the first decoder 15 and the mailbox module 120, wherein the first encoder 14 is connected between the data generating module 10 and the first decoder 15, and the first decoder 15 is connected with the mailbox module 120;
a data generation module 10 for transmitting the indication information, and the first communication information provided for the mailbox module 120, to the first encoder 14;
a first encoder 14 for serially encoding the indication information and the first communication information, and transmitting the encoded first encoded data to the first decoder 15 in a serial manner;
the first decoder 15 is configured to decode the first encoded data, and to transmit the indication information and the first communication information obtained by decoding to the mailbox module 120.
It should be noted that, in the embodiment of the present application, the different modules may be placed near to or far from one another within the deep learning accelerator 1. When the data generation module 10 is far from the data buffer 13 and the mailbox module 120, the transmission distance of the data lines is long, and designing a dedicated data link solely to carry the indication information from the data generation module 10 to the mailbox module 120 would complicate later routing. Logic can therefore be added at the output of the data generation module 10 so that the first communication information intended for the mailbox module 120 (as shown in fig. 4, this may be the signal p_num indicating the amount of data to be consumed) is serialized together with the indication information and transmitted serially, i.e. bit by bit in sequence, avoiding a dedicated link for the indication information. For example, if the combined width of the first communication information and the indication information is 32 bits, transmission does not require a data line 32 bits wide; instead the data is output serially over a 1-bit line, and a decoder then restores it to 32 bits.
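The serialization just described can be sketched in software: a 32-bit word is shifted out over a 1-bit line by the encoder and reassembled by the decoder. The LSB-first bit order below is an assumption for illustration; the patent does not specify one:

```python
# Sketch of the serial link: a 32-bit word (indication information plus
# p_num) is sent one bit at a time over a single line, avoiding a
# 32-bit-wide routing channel; the decoder restores the full word.
def encode_serial(word: int, width: int = 32):
    """Yield the word one bit at a time, LSB first (assumed order)."""
    for i in range(width):
        yield (word >> i) & 1

def decode_serial(bits, width: int = 32) -> int:
    """Reassemble the original word from the serial bit stream."""
    word = 0
    for i, b in enumerate(bits):
        word |= (b & 1) << i
    return word

payload = 0xDEADBEEF
assert decode_serial(encode_serial(payload)) == payload
```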
It should be noted that, in the embodiment of the present application, the signal p_sts, through which the mailbox module 120 reports the data storage condition of the data buffer 13 to the data generation module 10, can likewise be conveyed by encoding and then decoding; an encoder can thus be connected between the mailbox module 120 and a decoder, with that decoder connected to the data generation module 10.
It should be noted that, in the embodiment of the present application, as shown in fig. 4, when data is to be written into the data buffer, the data generation module first checks the buffer's data occupancy via the p_sts signal, sends the p_sync signal to announce that data will be written into the buffer, and uses the p_num signal to state how much data will be written. When data is to be consumed, the data consumption module, which is responsible for consumption, first checks the c_sts signal to determine whether enough data is available for it to consume, sends the c_sync signal to announce that data will be consumed from the buffer, and sends the c_num signal to state how much data will be consumed.
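The handshake above can be modeled as a fill-level counter shared through the mailbox. The signal names follow the figure; the class structure and capacity bookkeeping are illustrative assumptions:

```python
# Sketch of the mailbox handshake: the producer checks free space (p_sts)
# before announcing a write (p_sync, p_num); the consumer checks available
# data (c_sts) before announcing a read (c_sync, c_num).
class Mailbox:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.level = 0               # words currently held in the buffer

    def p_sts(self) -> int:          # free space reported to the producer
        return self.capacity - self.level

    def p_sync(self, p_num: int) -> None:   # producer writes p_num words
        assert p_num <= self.p_sts(), "buffer would overflow"
        self.level += p_num

    def c_sts(self) -> int:          # valid data reported to the consumer
        return self.level

    def c_sync(self, c_num: int) -> None:   # consumer reads c_num words
        assert c_num <= self.c_sts(), "not enough data to consume"
        self.level -= c_num

mb = Mailbox(capacity=16)
mb.p_sync(4)
assert mb.c_sts() == 4
mb.c_sync(4)
assert mb.p_sts() == 16
```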
In the embodiment of the present application, the communication module 12 is the mailbox module 120, and the deep learning accelerator 1 may further include a second encoder (not shown in the figure), a second decoder (not shown in the figure), and a second communication bus (not shown in the figure);
a second communication bus, configured to implement communication connection between the data consumption module 11, a second encoder, a second decoder, and the mailbox module 120, where the second encoder is connected between the mailbox module 120 and the second decoder, and the second decoder is connected with the data consumption module 11;
a mailbox module 120 for transmitting the indication information and the second communication information to the second encoder;
the second encoder is used for serially encoding the indication information and the second communication information and transmitting second encoded data obtained by encoding to the second decoder in a serial mode;
and the second decoder is configured to decode the second encoded data, and to transmit the indication information and the second communication information obtained by decoding to the data consumption module 11.
It will be appreciated that, in the embodiment of the present application, similar to the case in which the data generation module 10 is far from the data buffer 13 and the mailbox module 120, when the data consumption module 11 is far from them, adding the second encoder, the second decoder and the second communication bus avoids designing a dedicated data link to carry the indication information from the mailbox module 120 to the data consumption module 11; the details are not repeated here.
It should be noted that, in the embodiment of the present application, as shown in fig. 4, the second communication information may be the signal c_sts, and of course, may also be other information that needs to be transmitted during the communication interaction between the mailbox module 120 and the data consumption module 11.
Specifically, in the embodiment of the present application, the communication module 12 is a register 121;
a data generation module 10 for writing the indication information into the register 121;
a register 121 for providing the indication information to the data consuming module 11 by means of an interrupt.
It will be appreciated that in an embodiment of the present application, if the data generating module 10 generates data to be consumed for consumption by the data consuming module 11, the data generating module 10 may also provide the indication information to the data consuming module 11 through the register 121. Specifically, the data generating module 10 may write the indication information into the register 121 and inform the data consuming module 11 by means of an interrupt.
In the embodiment of the present application, as shown in fig. 5, when the communication module 12 is a register 121, the deep learning accelerator 1 may also include a mailbox module 120. In this case, the mailbox module 120 is only used to exchange communication information with the data generating module 10 and the data consuming module 11, that is, it interacts with the data generating module 10 through the p_sts, p_sync, and p_num signals, and interacts with the data consuming module 11 through the c_sts, c_sync, and c_num signals.
Specifically, in the embodiment of the present application, the data consumption module 11 is further configured to:
acquiring a physical address matched with a second process identifier corresponding to the data consumption module 11 and a second module identifier in a preset address mapping table, and determining the physical address as a second physical address;
the data stored at the second physical address is looked up from the data buffer 13 and read.
It should be noted that, in the embodiment of the present application, data consumption by the data consuming module 11 falls into two cases: in the first case, it consumes the data to be consumed that the data generating module 10 has written into the data buffer 13; in the second case, it consumes other data in the data buffer 13. For the second case, the data consumption module 11 may use its own process identifier and module identifier, i.e. the second process identifier and the second module identifier, to determine the matched physical address, i.e. the second physical address, through the address mapping table, and then read the data in the data buffer 13 normally.
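The lookup in the second case can be sketched as a table keyed by the consumer's own identifiers. The table contents and key layout below are hypothetical, chosen only to show the mechanism.

```python
# Preset address mapping table: (process_id, module_id) -> physical address.
# Entries are illustrative assumptions.
address_map = {
    (7, 2): 0x8000,
    (9, 3): 0x9000,
}

def lookup_second_phys_addr(process_id: int, module_id: int) -> int:
    # The data consuming module resolves the second physical address from
    # its own (second) process identifier and module identifier.
    return address_map[(process_id, module_id)]

second_phys = lookup_second_phys_addr(7, 2)
```

With the second physical address in hand, the consumer reads the data stored there from the data buffer directly, without any indication information from the producer.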
It should be noted that, in the embodiment of the present application, the communication module 12 may be the mailbox module 120 or the register 121, and of course, may be other devices, which is not limited by the embodiment of the present application.
The embodiment of the application provides a deep learning accelerator, which comprises the following components: the device comprises a data generation module, a data consumption module, a communication module and a data buffer area; the communication module and the data buffer area are respectively connected between the data generation module and the data consumption module; the data generation module is used for writing the generated data to be consumed into the data buffer zone and providing indication information for indicating the storage position of the data to be consumed in the data buffer zone for the communication module; the communication module is used for providing the indication information to the data consumption module; and the data consumption module is used for reading the data to be consumed from the data buffer area by using the indication information and consuming the data to be consumed. According to the deep learning accelerator provided by the embodiment of the application, the communication module is used for providing the indication information for indicating the storage position of the data to be consumed in the data buffer area for the data consumption module, so that the data consumption module can accurately read the data to be consumed from the data buffer area by using the indication information under the condition that the physical address in the data buffer area is continuous or discrete, and the flexibility of data access is improved.
The embodiment of the application provides a deep learning acceleration method which is applied to the deep learning accelerator 1. Fig. 6 is a flow chart of a deep learning acceleration method according to an embodiment of the present application. As shown in fig. 6, the deep learning acceleration method mainly includes the following steps:
s601, writing the generated data to be consumed into a data buffer by utilizing a data generating module, and providing indication information indicating the storage position of the data to be consumed in the data buffer to a communication module.
In an embodiment of the present application, referring to fig. 1, the deep learning accelerator 1 includes a data generating module 10, a data consuming module 11, a communication module 12, and a data buffer 13, the communication module 12 and the data buffer 13 being respectively connected between the data generating module 10 and the data consuming module 11, the deep learning accelerator 1 may write the generated data to be consumed into the data buffer 13 using the data generating module 10, and provide indication information indicating a storage location of the data to be consumed in the data buffer 13 to the communication module 12.
It should be noted that, in the embodiment of the present application, in the case where the physical addresses in the data buffer 13 are consecutive, the indication information includes: the starting physical address of the data to be consumed in the data buffer 13. In the case where the physical addresses in the data buffer 13 are discrete, the indication information includes: the starting physical address of the data to be consumed in the data buffer 13, the first virtual address corresponding to the data to be consumed, and the first process identifier and first module identifier corresponding to the data generating module 10.
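The two forms of indication information can be sketched as two small records, one per addressing case. The field names are assumptions for illustration; the patent specifies only which items each form contains.

```python
from dataclasses import dataclass

@dataclass
class ContiguousIndication:
    # Sufficient when physical addresses in the data buffer are contiguous:
    # the consumer reads forward from the starting physical address.
    start_phys_addr: int

@dataclass
class DiscreteIndication:
    # Needed when physical addresses are discrete: the extra identifiers
    # let the consumer resolve the first physical address via the
    # address mapping table.
    start_phys_addr: int
    first_virt_addr: int
    first_process_id: int
    first_module_id: int
```

Carrying the richer record only in the discrete case keeps the common contiguous case cheap, while still letting the consumer locate scattered data.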
S602, providing the indication information to the data consumption module by using the communication module.
In the embodiment of the present application, the deep learning accelerator 1 provides the indication information to the communication module 12 using the data generation module 10, and then the indication information may be provided to the data consumption module 11 using the communication module 12.
S603, reading the data to be consumed from the data buffer area by using the data consumption module and consuming the data to be consumed by using the indication information.
In the embodiment of the present application, in the deep learning accelerator 1, the data consumption module 11 may obtain the indication information provided by the communication module 12, so that the data consumption module 11 reads the data to be consumed from the data buffer 13 by using the indication information, and consumes the data to be consumed.
Specifically, in the embodiment of the present application, in the case where the physical addresses in the data buffer 13 are consecutive, the indication information includes the starting physical address of the data to be consumed in the data buffer 13; the deep learning accelerator 1 may use the data consumption module 11 to locate the data to be consumed in the data buffer 13 by means of the starting physical address, and read the data to be consumed.
Specifically, in the embodiment of the present application, in the case where the physical addresses in the data buffer 13 are discrete, the indication information includes: the starting physical address of the data to be consumed in the data buffer 13, the first virtual address corresponding to the data to be consumed, and the first process identifier and first module identifier corresponding to the data generation module 10. The deep learning accelerator 1 may use the data consumption module 11 to obtain the physical address matching the first module identifier, the first process identifier, and the first virtual address in the address mapping table, determine it as the first physical address, and then locate and read the data to be consumed from the data buffer 13 by means of the starting physical address and the first physical address.
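Steps S601 to S603 for the discrete-address case can be sketched end to end as follows. The buffer, mapping table, and field names are illustrative assumptions used only to show the ordering of the steps.

```python
buffer = {}                          # models the data buffer: phys_addr -> data
addr_map = {(1, 4, 0x10): 0x2000}    # (module_id, process_id, virt) -> phys

# S601: the data generating module writes the data to be consumed into the
# buffer and forms the indication information.
buffer[0x2000] = "weights"
indication = {"start": 0x2000, "virt": 0x10,
              "process_id": 4, "module_id": 1}

# S602: the communication module (mailbox or register) hands the indication
# information to the data consumption module (transport elided here).

# S603: the data consumption module resolves the first physical address
# through the address mapping table and reads the data to be consumed.
first_phys = addr_map[(indication["module_id"],
                       indication["process_id"],
                       indication["virt"])]
data = buffer[first_phys]
```

Note that the consumer never needs the producer's addresses to be contiguous: the mapping table translates the producer-side identifiers into whatever physical location the data actually occupies.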
The embodiment of the application also provides a chip which comprises the deep learning accelerator 1, so that the corresponding deep learning acceleration method is supported to be executed.
The embodiment of the application also provides electronic equipment. Fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application. As shown in fig. 7, the electronic device includes: the deep learning accelerator 1 described above, a memory 2 for storing a computer program capable of running on the deep learning accelerator 1, and a communication bus 3;
the communication bus 3 is used for realizing communication connection between the deep learning accelerator 1 and the memory 2;
the deep learning accelerator 1 is configured to execute the computer program stored in the memory 2 to implement the above deep learning acceleration method.
Embodiments of the present application provide a computer-readable storage medium having stored thereon a computer program that, when executed, implements the deep learning acceleration method described above. The computer-readable storage medium may be a volatile memory, such as Random-Access Memory (RAM); or a non-volatile memory, such as Read-Only Memory (ROM), flash memory, a Hard Disk Drive (HDD), or a Solid State Drive (SSD); or a device comprising one or any combination of the above memories, such as a mobile phone, a computer, a tablet device, or a personal digital assistant.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, magnetic disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of implementations of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each block and/or flow of the flowchart illustrations and/or block diagrams, and combinations of blocks and/or flow diagrams in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart block or blocks and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart block or blocks and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart block or blocks and/or block diagram block or blocks.
The foregoing is merely illustrative of the present application, and the present application is not limited thereto, and any changes or substitutions easily contemplated by those skilled in the art within the technical scope of the present application should be included in the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (12)

1. A deep learning accelerator, comprising: the device comprises a data generation module, a data consumption module, a communication module and a data buffer area;
the communication module and the data buffer area are respectively connected between the data generation module and the data consumption module;
the data generating module is used for writing the generated data to be consumed into the data buffer area and providing indication information for indicating the storage position of the data to be consumed in the data buffer area for the communication module;
the communication module is used for providing the indication information to the data consumption module;
and the data consumption module is used for reading the data to be consumed from the data buffer area by utilizing the indication information and consuming the data to be consumed.
2. The deep learning accelerator of claim 1, wherein physical addresses in the data buffer are contiguous, the indication information comprising: a starting physical address of the data to be consumed in the data buffer;
and the data consumption module is used for searching the data to be consumed from the data buffer area by utilizing the initial physical address and reading the data to be consumed.
3. The deep learning accelerator of claim 1, wherein physical addresses in the data buffer are discrete, the indication information comprising: the method comprises the steps of starting a physical address of data to be consumed in a data buffer zone, a first virtual address corresponding to the data to be consumed, a first process identifier corresponding to a data generation module and a first module identifier;
the data consumption module is used for:
acquiring a physical address matched with the first module identifier, the first process identifier and the first virtual address in an address mapping table, and determining the physical address as a first physical address;
and searching the data to be consumed from the data buffer zone by utilizing the initial physical address and the first physical address, and reading the data to be consumed.
4. A deep learning accelerator as claimed in any one of claims 1 to 3 wherein the communications module is a mailbox module or a register.
5. The deep learning accelerator of claim 1, wherein the communication module is a mailbox module, the deep learning accelerator further comprising a first encoder, a first decoder, and a first communication bus;
the first communication bus is used for realizing communication connection among the data generation module, the first encoder, the first decoder and the mailbox module, wherein the first encoder is connected between the data generation module and the first decoder, and the first decoder is connected with the mailbox module;
the data generation module is used for transmitting the indication information and the first communication information provided for the mailbox module to the first encoder;
the first encoder is configured to serially encode the indication information and the first communication information, and transmit first encoded data obtained by encoding to the first decoder in a serial manner;
the first decoder is configured to decode the first encoded data, and transmit the indication information and the first communication information obtained by decoding to the mailbox module.
6. The deep learning accelerator of claim 1 or 5, wherein the communication module is a mailbox module, the deep learning accelerator further comprising a second encoder, a second decoder, and a second communication bus;
the second communication bus is used for realizing communication connection among the data consumption module, the second encoder, the second decoder and the mailbox module, wherein the second encoder is connected between the mailbox module and the second decoder, and the second decoder is connected with the data consumption module;
the mailbox module is used for transmitting the indication information and the second communication information provided by the data consumption module to the second encoder;
the second encoder is configured to serially encode the indication information and the second communication information, and transmit second encoded data obtained by encoding to the second decoder in a serial manner;
the second decoder is configured to decode the second encoded data, and transmit the indication information and the second communication information obtained by decoding to the data consumption module.
7. The deep learning accelerator of claim 1, wherein the communication module is a register;
the data generation module is used for writing the indication information into the register;
the register is used for providing the indication information to the data consumption module in an interrupt mode.
8. The deep learning accelerator of claim 1, wherein the data consumption module is further configured to:
acquiring a physical address matched with a second process identifier and a second module identifier corresponding to the data consumption module in a preset address mapping table, and determining the physical address as a second physical address;
and searching the data stored on the second physical address from the data buffer area and reading the data.
9. A deep learning acceleration method, which is applied to a deep learning accelerator, wherein the deep learning accelerator comprises a data generation module, a data consumption module, a communication module and a data buffer zone, and the communication module and the data buffer zone are respectively connected between the data generation module and the data consumption module, and the method comprises the following steps:
writing the generated data to be consumed into the data buffer area by utilizing the data generating module, and providing indication information indicating the storage position of the data to be consumed in the data buffer area for the communication module;
providing the indication information to the data consumption module by using the communication module;
and reading the data to be consumed from the data buffer area by using the data consumption module and using the indication information, and consuming the data to be consumed.
10. A chip comprising a deep learning accelerator as claimed in any one of claims 1 to 8.
11. An electronic device, comprising: a deep learning accelerator, a memory for storing a computer program capable of running on the deep learning accelerator, and a communication bus;
the communication bus is used for realizing communication connection between the deep learning accelerator and the memory;
the deep learning accelerator for executing the computer program stored in the memory to implement the deep learning acceleration method of claim 9.
12. A computer-readable storage medium, on which a computer program is stored, characterized in that the computer program, when executed, implements the deep learning acceleration method of claim 9.
CN202210279702.2A 2022-03-21 2022-03-21 Deep learning accelerator, deep learning method, chip, electronic device and storage medium Pending CN116842991A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210279702.2A CN116842991A (en) 2022-03-21 2022-03-21 Deep learning accelerator, deep learning method, chip, electronic device and storage medium


Publications (1)

Publication Number Publication Date
CN116842991A true CN116842991A (en) 2023-10-03

Family

ID=88169317



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination