WO2022012563A1 - Neural network data processing method, apparatus, device, and storage medium - Google Patents

Neural network data processing method, apparatus, device, and storage medium

Info

Publication number
WO2022012563A1
Authority
WO
WIPO (PCT)
Prior art keywords
neural network
data
current layer
operation data
loading
Prior art date
Application number
PCT/CN2021/106147
Other languages
English (en)
French (fr)
Inventor
伍永情
黄炯凯
蔡权雄
牛昕宇
Original Assignee
深圳鲲云信息科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳鲲云信息科技有限公司
Publication of WO2022012563A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/14Handling requests for interconnection or transfer
    • G06F13/20Handling requests for interconnection or transfer for access to input/output bus
    • G06F13/28Handling requests for interconnection or transfer for access to input/output bus using burst mode transfer, e.g. direct memory access DMA, cycle steal
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks

Definitions

  • The embodiments of the present application relate to the technical field of deep learning, for example, to a neural network data processing method, apparatus, device, and storage medium.
  • Deep learning, as a neural-network-based approach, is motivated by building and simulating the human brain for analysis and learning. Because it can process large amounts of data quickly and efficiently, deep learning has found increasingly wide application.
  • Neural network computation usually requires a large amount of data.
  • In common deep learning inference chip designs, this data is stored in an off-chip memory module.
  • When the chip performs inference, it must move the data from the off-chip memory module to an on-chip memory module.
  • The computing engine then reads the data from the on-chip memory module to perform the deep learning computation.
  • In the data processing schemes of the related art, data transfer and data computation are serial: the computing engine usually waits until all operation data required by the current layer of the neural network has been moved from the off-chip memory module to the on-chip memory module before it starts computing.
  • Likewise, the data transfer for the next layer of the neural network does not start until the computing engine has finished computing the current layer; that is, the computing engine is idle while data is being transferred, and the data transfer path is idle while the computing engine is computing. The data throughput of this processing scheme is low, so the neural network computation takes a long time.
  • The embodiments of the present application provide a neural network data processing method, apparatus, device, and storage medium, so as to reduce the overall time required for neural network computation and improve the computational efficiency of the neural network.
  • In a first aspect, an embodiment of the present application provides a neural network data processing method, including: loading second operation data of a current layer of a neural network; and, in response to completion of loading the second operation data of the current layer of the neural network, loading first operation data of a next layer of the neural network.
  • In a second aspect, an embodiment of the present application provides a neural network data processing apparatus, including:
  • a second operation data loading module configured to load second operation data of a current layer of a neural network; and
  • a first operation data loading module configured to load first operation data of a next layer of the neural network in response to completion of loading the second operation data of the current layer of the neural network.
  • In a third aspect, an embodiment of the present application provides a neural network data processing device, including: one or more processors; and a storage apparatus configured to store one or more programs, wherein the one or more processors are configured to execute the one or more programs to implement the neural network data processing method provided by any embodiment of the present application.
  • In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the neural network data processing method provided by any embodiment of the present application.
  • FIG. 1 is a schematic flowchart of a neural network data processing method provided in Embodiment 1 of the present application;
  • FIG. 2A is a schematic flowchart of a neural network data processing method according to Embodiment 2 of the present application.
  • FIG. 2B is a schematic time sequence diagram of the neural network data processing method provided in Embodiment 2 of the present application.
  • FIG. 3 is a schematic structural diagram of a neural network data processing apparatus provided in Embodiment 3 of the present application.
  • FIG. 4 is a schematic structural diagram of a neural network data processing device according to Embodiment 4 of the present application.
  • The terms “first”, “second”, and the like may be used herein to describe various directions, actions, steps, elements, and the like, but these directions, actions, steps, and elements are not limited by these terms. These terms are only used to distinguish one direction, action, step, or element from another.
  • For example, without departing from the scope of this application, first operation data may be referred to as second operation data, and, similarly, second operation data may be referred to as first operation data.
  • Both the first operation data and the second operation data are operation data, but they are not the same operation data.
  • The terms “first”, “second”, and the like should not be understood as indicating or implying relative importance or implicitly indicating the number of the indicated technical features.
  • Accordingly, a feature defined as “first” or “second” may expressly or implicitly include one or more of that feature.
  • The terms “plural” and “batch” mean at least two, such as two, three, and so on, unless expressly defined otherwise.
  • FIG. 1 is a schematic flowchart of a neural network data processing method according to Embodiment 1 of the present application. This embodiment can be applied to data processing in the inference process of a deep learning neural network chip. As shown in FIG. 1 , the neural network data processing method provided in Embodiment 1 of the present application includes:
  • A neural network is an algorithm model that simulates the human brain with the aim of realizing artificial intelligence.
  • A neural network usually includes an input layer, hidden layers, and an output layer, and each layer contains a large number of computing nodes. The hidden layers are numerous; during computation, the computation of the next layer of the neural network is usually performed only after the computation of one layer has been completed.
  • The second operation data of the neural network refers to the operation map data (feature map data) of the neural network.
  • The operation map data is usually generated from the data that the user inputs to the neural network. For example, if the user inputs a picture, the operation map data is the data obtained after relevant processing of the picture data, such as feature data extracted from the picture data.
  • The operation map data of each layer of the neural network is different, so the second operation data of the current layer must be loaded whenever a layer of the neural network performs computation.
  • In this embodiment, the data loading mode is a DMA (Direct Memory Access) transfer mode, and DMA includes RDMA (Read DMA, DMA reads data) and WDMA (Write DMA, DMA writes data).
  • Loading the second operation data of the current layer of the neural network means reading the second operation data from the off-chip memory module into the on-chip cache module through DMA.
  • The off-chip memory module is, for example, off-chip DDR (Double Data Rate) memory, usually referred to simply as off-chip DDR, and the on-chip cache (data buffer) module is, for example, on-chip RAM (Random Access Memory).
  • In this embodiment, the second operation data of the current layer of the neural network can be loaded through RDMA.
  • The first operation data of the neural network refers to the weight data (coefficient data) and the bias data of the neural network, and each layer of the neural network has corresponding first operation data.
  • The computation of each layer of the neural network is completed by the neural network computing engine (Engine).
  • When the second operation data of the current layer of the neural network is being loaded, for example through RDMA, the computing engine starts synchronously and begins computing the current layer; however, the time required to compute the current layer is usually greater than the time required to load its second operation data.
  • Therefore, when the second operation data of the current layer has finished loading, the computing engine is still computing, and at this point the first operation data of the next layer of the neural network is immediately loaded, for example through RDMA, so that data loading is never idle. In other words, the data computation of the current layer proceeds simultaneously with the loading of the first operation data of the next layer; that is, the operation of the neural network computing engine is processed in parallel with the data reading of the DMA, which reduces the idle time of the computing engine and of data reading and improves the data throughput during neural network computation.
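  • As a purely illustrative sketch (not part of the disclosed embodiments), the following Python fragment models this overlap with a worker thread: while the engine computes the current layer, the weights and bias of the next layer are loaded. The function names, the timings, and the simplification that the engine starts only after the feature map has fully loaded are all assumptions made for illustration.

```python
import threading
import time

LOAD_TIME = 0.1     # assumed time to load one block of data (arbitrary units)
COMPUTE_TIME = 0.3  # assumed per-layer compute time, longer than a load

def rdma_load(name):
    # Stand-in for RDMA moving data from off-chip DDR into on-chip RAM.
    time.sleep(LOAD_TIME)
    print(f"loaded   {name}")

def engine_compute(layer):
    # Stand-in for the computing engine operating on the current layer.
    time.sleep(COMPUTE_TIME)
    print(f"computed layer {layer}")

def run(num_layers=3):
    rdma_load("layer 1 weights/bias")                     # preload first operation data of layer 1
    for layer in range(1, num_layers + 1):
        rdma_load(f"layer {layer} feature map")           # second operation data of the current layer
        worker = threading.Thread(target=engine_compute, args=(layer,))
        worker.start()                                    # engine computes the current layer...
        if layer < num_layers:
            rdma_load(f"layer {layer + 1} weights/bias")  # ...while the next layer's data is preloaded
        worker.join()

if __name__ == "__main__":
    run()
```

  • In this toy model the weight/bias load of layer N+1 is hidden entirely behind the compute of layer N whenever COMPUTE_TIME exceeds LOAD_TIME, mirroring the assumption stated above.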
  • The neural network data processing method provided in Embodiment 1 of the present application loads the second operation data of the current layer of the neural network and, in response to completion of loading the second operation data of the current layer, loads the first operation data of the next layer of the neural network,
  • so that the operation of the neural network computing engine and the data reading of the DMA are processed in parallel. This improves the parallelism between the computing engine and the DMA, thereby reducing the idle time of the computing engine and of data reading, improving the data throughput during neural network computation, and reducing the overall time required for neural network computation.
  • FIG. 2A is a schematic flowchart of a neural network data processing method according to Embodiment 2 of the present application. As shown in FIG. 2A , the neural network data processing method provided in the second embodiment of the present application includes:
  • The computation of each layer of the neural network consists in the neural network computing engine (Engine) operating on the first operation data and the second operation data of that layer.
  • When the neural network performs data loading and computation, it usually loads the first operation data of the current layer first and then, after the first operation data of the current layer has finished loading, loads the second operation data of the current layer.
  • While the second operation data of the current layer is being loaded, the computing engine runs synchronously, performs computation according to the already loaded first operation data and second operation data of the current layer, and outputs the operation result data of the current layer; the operation result data output by the computing engine is cached in the on-chip RAM.
  • If the current layer of the neural network is the first layer, the preloaded first operation data of the current layer is read into the on-chip RAM by RDMA while the computing engine is in an idle state, that is, while the computing engine is not performing inference computation. If the current layer is not the first layer, the preloaded first operation data of the current layer is read into the on-chip RAM by RDMA after the second operation data of the previous layer has finished loading and while the computing engine is running, that is, while it is performing inference computation on the data of the previous layer.
  • Since the time required for the computation of the neural network computing engine is usually greater than the time required to load the second operation data of the current layer, once the second operation data of the current layer has finished loading, the loading of the first operation data of the next layer and the computation on the preloaded first operation data and the second operation data of the current layer proceed at the same time; that is, steps S220 and S230 are performed synchronously, and the RDMA runs in parallel with the computing engine.
  • After the computing engine outputs the operation result data of the current layer of the neural network to the on-chip RAM, the operation result data in the on-chip RAM needs to be stored to the off-chip DDR through WDMA.
  • After step S250 is completed, the method returns to step S210 until the second operation data of all layers of the neural network has been loaded.
  • When the operation result data of the current layer has been stored, the data computation of the current layer is complete and the computation of the next layer needs to be performed. At this point the next layer of the neural network is taken as the current layer, and the second operation data of the next layer is loaded, until the second operation data of all layers has been loaded, that is, until the neural network computation is complete.
  • Because the first operation data of the next layer has already been loaded during the computation of the current layer, the computing engine can quickly operate on the newly loaded second operation data and the already loaded first operation data of the next layer; that is, the computing engine does not need to wait for the first operation data of the next layer to be loaded, which reduces the time the computing engine spends idle.
  • FIG. 2B is a schematic time sequence diagram of the neural network data processing method provided in Embodiment 2 of the present application.
  • the current layer of the neural network is the first layer of the neural network.
  • RDMA first loads the first operation data of the current layer of the neural network.
  • the first operation data of the current layer of the neural network includes the first bias data 21_1 and the first weight data 22_1.
  • the first bias data 21_1 can be recorded as 1-bias
  • the first weight data 22_1 can be recorded as 1-coeff
  • Then the second operation data 23_1 of the current layer of the neural network is loaded; the second operation data 23_1 is, for example, the first operation map data, 1-feature map data.
  • The computing engine (Engine) performs a computation operation according to the loaded first operation data and second operation data of the current layer of the neural network.
  • This computation operation 24_1 may be recorded as the first computation, 1-compute, and it outputs the operation result data of the current layer of the neural network.
  • At the same time, WDMA stores the operation result data of the current layer to the off-chip DDR; this data storage operation 25_1 may be recorded as the first data store, 1-output.
  • Once the second operation data of the current layer has finished loading, RDMA immediately loads the first operation data of the next layer of the neural network; the first operation data of the next layer includes the second bias data 21_2 and the second weight data 22_2, where the second bias data 21_2 may be recorded as 2-bias and the second weight data 22_2 may be recorded as 2-coeff. As can be seen from FIG. 2B, at this point the RDMA, the computing engine, and the WDMA are all in the running state, that is, the three run in parallel.
  • When WDMA has finished storing the operation result data of the current layer of the neural network, the data computation of the current layer is complete and the data computation of the next layer needs to be performed; the first operation data of the next layer has already been preloaded.
  • The second operation data 23_2 of the next layer can therefore be loaded directly; the second operation data 23_2 is, for example, the second operation map data, 2-feature map data. The computing engine can then quickly perform a computation operation according to the already loaded first operation data and second operation data of the next layer; this computation operation 24_2 may be recorded as the second computation, 2-compute. At the same time, WDMA stores the operation result data of the next layer to the off-chip DDR, and this data storage operation 25_2 may be recorded as the second data store, 2-output.
  • When the second operation data of the next layer has finished loading, the first operation data of the third layer of the neural network can be loaded immediately; the first operation data of the third layer includes the third bias data 21_3 and the third weight data 22_3, where the third bias data 21_3 may be recorded as 3-bias and the third weight data 22_3 may be recorded as 3-coeff.
  • Subsequent data loading continues in the same cycle until the data of all layers of the neural network has been loaded. This greatly reduces the time the computing engine spends idle, thereby reducing the overall time consumed by the neural network computation.
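  • To make the saving implied by FIG. 2B concrete, the following back-of-the-envelope sketch compares a fully serial schedule with the pipelined one; the per-layer durations are invented for illustration and are not measurements from the embodiment.

```python
# Hypothetical per-layer durations in arbitrary time units (assumptions, not measurements).
W = 2        # load weights + bias (first operation data)
F = 3        # load feature map (second operation data)
C = 8        # engine compute, assumed longer than either load
S = 2        # WDMA store of the result
LAYERS = 4

# Fully serial schedule: every phase waits for the previous one.
serial = LAYERS * (W + F + C + S)

# Pipelined schedule as in FIG. 2B: the weight/bias load of layer i+1 and the
# result store of layer i overlap with the compute of layer i, so (roughly) only
# the initial weight load, each feature load + compute, and a final store tail
# remain on the critical path (assuming C >= W and C >= S).
pipelined = W + LAYERS * (F + C) + S

print(f"serial    : {serial} units")     # 60
print(f"pipelined : {pipelined} units")  # 48
print(f"saving    : {serial - pipelined} units")
```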
  • The neural network data processing method provided in Embodiment 2 of the present application enables the operation of the neural network computing engine and the data reading of the DMA to be processed in parallel, which improves the parallelism between the computing engine and the DMA, thereby reducing the idle time of the computing engine and the DMA, improving the data throughput during neural network computation, reducing the overall time required for neural network computation, and improving the computational efficiency of the neural network.
  • FIG. 3 is a schematic structural diagram of a neural network data processing apparatus provided in Embodiment 3 of the present application, and the embodiment of the present application can be applied to data processing in the inference process of a deep learning neural network chip.
  • The neural network data processing apparatus provided in this embodiment of the present application can implement the neural network data processing method provided in any embodiment of the present application and has the corresponding functional structure and beneficial effects of that method; for content not described in this embodiment, reference may be made to the description of any method embodiment of the present application.
  • the neural network data processing apparatus includes: a second operation data loading module 310 and a first operation data loading module 320, wherein:
  • the second operation data loading module 310 is configured to load the second operation data of the current layer of the neural network
  • the first operation data loading module 320 is configured to load the first operation data of the next layer of the neural network in response to the completion of loading the second operation data of the current layer of the neural network.
  • the neural network data processing apparatus further includes:
  • a data operation module configured to perform operations according to the preloaded first operation data of the current layer of the neural network and the second operation data of the current layer of the neural network to obtain operation result data of the current layer of the neural network;
  • a data storage module configured to store the operation result data of the current layer of the neural network.
  • it also includes:
  • the loop module is configured to, in response to completion of storing the operation result data of the current layer of the neural network, take the next layer of the neural network as the current layer of the neural network and return to the step of loading the second operation data of the current layer, until the second operation data of all layers of the neural network has been loaded.
  • the second operation data of the current layer of the neural network is operation graph data of the current layer of the neural network.
  • the first operation data of the current layer of the neural network is weight data and bias data of the current layer of the neural network.
  • the data loading method is a DMA transmission method.
  • Through the second operation data loading module and the first operation data loading module, the neural network data processing apparatus enables the operation of the neural network computing engine and the data reading of the DMA to be processed in parallel, which improves the parallelism between the computing engine and the DMA, thereby reducing the idle time of the computing engine and of data reading, improving the data throughput during neural network computation, and reducing the overall time required for neural network computation.
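  • For orientation only, the sketch below arranges the modules described in this embodiment into a minimal Python structure. The class, method, and argument names are invented for illustration, the injected dma and engine objects are hypothetical, and parallel execution is omitted so that the module boundaries stay visible; it is a sketch under these assumptions, not an implementation of the claimed apparatus.

```python
class NeuralNetworkDataProcessor:
    """Toy arrangement of the modules of Embodiment 3 (illustrative only)."""

    def __init__(self, dma, engine, num_layers):
        self.dma = dma              # assumed to offer rdma_read(key) and wdma_write(key, data)
        self.engine = engine        # assumed to offer compute(first_data, second_data)
        self.num_layers = num_layers

    def load_second_operation_data(self, layer):
        # Second operation data loading module (310): feature map of the given layer.
        return self.dma.rdma_read(f"feature_map_{layer}")

    def load_first_operation_data(self, layer):
        # First operation data loading module (320): weights and bias of the given layer.
        return self.dma.rdma_read(f"weights_bias_{layer}")

    def run(self):
        first = self.load_first_operation_data(1)             # preload layer 1 weights/bias
        for layer in range(1, self.num_layers + 1):
            second = self.load_second_operation_data(layer)
            next_first = None
            if layer < self.num_layers:                        # would overlap with compute on hardware
                next_first = self.load_first_operation_data(layer + 1)
            result = self.engine.compute(first, second)        # data operation module
            self.dma.wdma_write(f"result_{layer}", result)     # data storage module
            first = next_first                                 # loop module: advance to the next layer
```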
  • FIG. 4 is a schematic structural diagram of a neural network data processing device according to Embodiment 4 of the present application.
  • FIG. 4 shows a block diagram of an exemplary neural network data processing device 412, referred to as device 412, suitable for use in implementing embodiments of the present application.
  • the device 412 shown in FIG. 4 is only an example, and should not impose any limitations on the functions and scope of use of the embodiments of the present application.
  • device 412 takes the form of a generic device.
  • Components of device 412 may include: one or more processors 416, a storage device 428, and a bus 418 connecting the various system components, such as the storage device 428 and the processor 416.
  • the bus 418 represents one or more of several types of bus structures, including a storage device bus or storage device controller, a peripheral bus, a graphics acceleration port, or a local bus using any of a variety of bus structures.
  • these architectures include Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, Enhanced ISA bus, Video Electronics Standards Association (VESA) ) local bus and peripheral component interconnect (Peripheral Component Interconnect, PCI) bus and so on.
  • Device 412 typically includes a variety of computer system readable media. These media can be any available media that can be accessed by device 412, including volatile and non-volatile media, removable and non-removable media.
  • Storage 428 may include computer system readable media in the form of volatile memory, such as random access memory (RAM) 430 and/or cache 432 .
  • Device 412 may include other removable/non-removable, volatile/non-volatile computer system storage media.
  • By way of example only, storage system 434 may be used to read from and write to non-removable, non-volatile magnetic media (not shown in FIG. 4 and commonly referred to as a hard disk drive).
  • Although not shown in FIG. 4, a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk, such as a floppy disk, and an optical disk drive for reading from and writing to a removable non-volatile optical disk, such as a Compact Disc Read-Only Memory (CD-ROM), a Digital Video Disc-Read Only Memory (DVD-ROM), or other optical media, may be provided.
  • each drive may be connected to bus 418 through one or more data media interfaces.
  • The storage device 428 may include at least one program product having a set of (for example, at least one) program modules configured to perform the functions of the embodiments of the present application.
  • A program/utility 440 having a set of (for example, at least one) program modules 442 may be stored, for example, in the storage device 428; such program modules 442 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data, and each or some combination of these examples may include an implementation of a network environment.
  • Program modules 442 generally perform the functions and/or methods of the embodiments described herein.
  • The device 412 may also communicate with one or more external devices 414, such as a keyboard, a pointing terminal, a display 424, and the like; the device 412 may also communicate with one or more terminals that enable a user to interact with the device 412, and/or with any terminal, such as a network card or a modem, that enables the device 412 to communicate with one or more other computing terminals. Such communication may take place through an input/output (I/O) interface 422. In addition, the device 412 may communicate through a network adapter 420 with one or more networks, such as a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the Internet.
  • As shown in FIG. 4, the network adapter 420 communicates with the other modules of the device 412 via the bus 418. It should be understood that, although not shown, other hardware and/or software modules may be used in conjunction with the device 412, including: microcode, terminal drivers, redundant processors, external disk drive arrays, Redundant Arrays of Independent Disks (RAID) systems, tape drives, data backup storage systems, and the like.
  • The processor 416 executes at least one functional application and performs data processing by running the programs stored in the storage device 428, for example implementing the neural network data processing method provided by any embodiment of the present application, and the method may include:
  • loading the second operation data of the current layer of the neural network; and, in response to completion of loading the second operation data of the current layer of the neural network, loading the first operation data of the next layer of the neural network.
  • Embodiment 5 of the present application further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the neural network data processing method provided by any embodiment of the present application, and the method may include:
  • loading the second operation data of the current layer of the neural network; and, in response to completion of loading the second operation data of the current layer of the neural network, loading the first operation data of the next layer of the neural network.
  • the computer storage medium of the embodiments of the present application may adopt any combination of one or more computer-readable media.
  • the computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium.
  • the computer-readable storage medium can be, for example, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus or device, or a combination of any of the above.
  • Examples of computer-readable storage media include: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing; this list is not exhaustive.
  • a computer-readable storage medium can be any tangible medium that contains or stores a program that can be used by or in combination with an instruction execution system, apparatus, or device.
  • a computer-readable signal medium having computer-readable program code embodied in it may include a data signal propagated in baseband or as part of a carrier wave. Such propagated data signals may take a variety of forms, including electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • a computer-readable signal medium can also be any computer-readable medium other than a computer-readable storage medium that can transmit, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
  • Program code embodied on a computer-readable medium may be transmitted using any suitable medium, including wireless, wire, optical fiber cable, radio frequency (RF), etc., or any suitable combination of the foregoing.
  • Computer program code for carrying out the operations of the present application may be written in one or more programming languages, including object-oriented programming languages such as Java, Smalltalk, C++, and conventional procedural programming languages such as C or similar programming languages.
  • the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or terminal.
  • the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer, such as through the Internet using an Internet service provider .

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

A neural network data processing method, apparatus, device, and storage medium. The method includes: loading second operation data of a current layer of a neural network (S110); and, in response to completion of loading the second operation data of the current layer of the neural network, loading first operation data of a next layer of the neural network (S120).

Description

Neural network data processing method, apparatus, device, and storage medium
This disclosure claims priority to Chinese patent application No. 202010679561.4, filed with the Chinese Patent Office on July 15, 2020, the entire contents of which are incorporated herein by reference.
Technical Field
Embodiments of the present application relate to the technical field of deep learning, and for example to a neural network data processing method, apparatus, device, and storage medium.
Background
Deep learning, as a neural-network-based approach, is motivated by building and simulating the human brain for analysis and learning. Because it can process large amounts of data quickly and efficiently, deep learning has found increasingly wide application.
Neural network computation usually requires a large amount of data. In common deep learning neural network inference chip designs, the data is stored in an off-chip memory module, so when the chip performs inference, the data must be moved from the off-chip memory module to the on-chip memory module, and the computing engine reads the data from the on-chip memory module to perform the deep learning computation.
However, in the data processing schemes provided by the related art, data transfer and data computation are serial: the computing engine usually waits until all of the operation data required by the current layer of the neural network has been moved from the off-chip memory module to the on-chip memory module before starting computation, and the data transfer for the next layer of the neural network does not start until the computing engine has finished computing the current layer. That is, the computing engine is idle during data transfer, and data transfer is idle while the computing engine is computing. The data throughput of this processing scheme is low, so the neural network computation takes a long time.
Summary
Embodiments of the present application provide a neural network data processing method, apparatus, device, and storage medium, so as to reduce the overall time required for neural network computation and improve the computational efficiency of the neural network.
In a first aspect, an embodiment of the present application provides a neural network data processing method, including:
loading second operation data of a current layer of a neural network; and
in response to completion of loading the second operation data of the current layer of the neural network, loading first operation data of a next layer of the neural network.
In a second aspect, an embodiment of the present application provides a neural network data processing apparatus, including:
a second operation data loading module configured to load second operation data of a current layer of a neural network; and
a first operation data loading module configured to load first operation data of a next layer of the neural network in response to completion of loading the second operation data of the current layer of the neural network.
In a third aspect, an embodiment of the present application provides a neural network data processing device, including:
one or more processors; and
a storage apparatus configured to store one or more programs,
wherein the one or more processors are configured to execute the one or more programs to implement the neural network data processing method provided by any embodiment of the present application.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the neural network data processing method provided by any embodiment of the present application.
Brief Description of the Drawings
FIG. 1 is a schematic flowchart of a neural network data processing method provided in Embodiment 1 of the present application;
FIG. 2A is a schematic flowchart of a neural network data processing method provided in Embodiment 2 of the present application;
FIG. 2B is a schematic timing diagram of the neural network data processing method provided in Embodiment 2 of the present application;
FIG. 3 is a schematic structural diagram of the neural network data processing apparatus provided in Embodiment 3 of the present application;
FIG. 4 is a schematic structural diagram of a neural network data processing device provided in Embodiment 4 of the present application.
Detailed Description
The present application is described below with reference to the accompanying drawings and embodiments. The embodiments described here are only intended to explain the present application and do not limit it. In addition, for ease of description, the drawings show only the parts related to the present application rather than the entire structure.
Before the exemplary embodiments are discussed, it should be noted that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart describes multiple steps as sequential processing, some of the steps may be performed in parallel, concurrently, or simultaneously. In addition, the order of the steps may be rearranged. Processing may be terminated when its operations are completed, but may also have additional steps not included in the drawings. Processing may correspond to a method, a function, a procedure, a subroutine, a subprogram, and the like.
In addition, the terms “first”, “second”, and the like may be used herein to describe various directions, actions, steps, elements, and the like, but these directions, actions, steps, and elements are not limited by these terms. These terms are only used to distinguish one direction, action, step, or element from another. For example, without departing from the scope of the present application, first operation data may be referred to as second operation data, and, similarly, second operation data may be referred to as first operation data. The first operation data and the second operation data are both operation data, but they are not the same operation data. The terms “first”, “second”, and the like are not to be understood as indicating or implying relative importance or implicitly indicating the number of the indicated technical features. Accordingly, a feature defined as “first” or “second” may expressly or implicitly include one or more of that feature. In the description of the present application, “plural” and “batch” mean at least two, for example two, three, and so on, unless expressly defined otherwise.
Embodiment 1
FIG. 1 is a schematic flowchart of a neural network data processing method provided in Embodiment 1 of the present application. This embodiment is applicable to data processing in the inference process of a deep learning neural network chip. As shown in FIG. 1, the neural network data processing method provided in Embodiment 1 of the present application includes the following steps.
S110: Load second operation data of a current layer of a neural network.
For example, a neural network is an algorithm model that simulates the human brain with the aim of realizing artificial intelligence. A neural network usually includes an input layer, hidden layers, and an output layer, and each layer contains a large number of computing nodes. The hidden layers are numerous; during computation, the computation of the next layer of the neural network is usually performed only after the computation of one layer has been completed.
The second operation data of the neural network refers to the operation map data (feature map data) of the neural network. In general, the operation map data is generated from the data that the user inputs to the neural network. For example, if the user inputs a picture, the operation map data is the data obtained after relevant processing of the picture data, such as feature data extracted from the picture data. The operation map data of each layer of the neural network is different, so the second operation data of the current layer must be loaded whenever a layer of the neural network performs computation.
In this embodiment, the data loading mode is a DMA (Direct Memory Access) transfer mode, and DMA includes RDMA (Read DMA, DMA read) and WDMA (Write DMA, DMA write). Loading the second operation data of the current layer of the neural network means reading the second operation data in the off-chip memory module into the on-chip cache module through DMA. The off-chip memory module is, for example, off-chip DDR (Double Data Rate) memory, usually referred to simply as off-chip DDR, and the on-chip cache (data buffer) module is, for example, on-chip RAM (Random Access Memory). In this embodiment, loading the second operation data of the current layer of the neural network can be implemented through RDMA.
S120: In response to completion of loading the second operation data of the current layer of the neural network, load first operation data of a next layer of the neural network.
For example, the first operation data of the neural network refers to the weight data (coefficient data) and bias data of the neural network, and each layer of the neural network has corresponding first operation data. The computation of each layer of the neural network is completed by the neural network computing engine (Engine). When the second operation data of the current layer is being loaded, for example through RDMA, the computing engine starts synchronously and begins computing the current layer of the neural network. However, the time required to compute the current layer is usually greater than the time required to load the second operation data of the current layer. Therefore, when the second operation data of the current layer has finished loading, the computing engine is still computing, and at this point the first operation data of the next layer of the neural network is immediately loaded, for example through RDMA, so that data loading is not idle. That is, the data computation of the current layer and the loading of the first operation data of the next layer proceed simultaneously; in other words, the operation of the neural network computing engine and the data reading of the DMA are processed in parallel, which reduces the idle time of the computing engine and of data reading and improves the data throughput during neural network computation.
The neural network data processing method provided in Embodiment 1 of the present application loads the second operation data of the current layer of the neural network and, in response to completion of loading the second operation data of the current layer, loads the first operation data of the next layer, so that the operation of the neural network computing engine and the data reading of the DMA are processed in parallel. This improves the parallelism between the computing engine and the DMA, reduces the idle time of the computing engine and of data reading, improves the data throughput during neural network computation, and reduces the overall time required for neural network computation.
Embodiment 2
FIG. 2A is a schematic flowchart of a neural network data processing method provided in Embodiment 2 of the present application. As shown in FIG. 2A, the neural network data processing method provided in Embodiment 2 of the present application includes the following steps.
S210: Load second operation data of a current layer of a neural network.
S220: In response to completion of loading the second operation data of the current layer of the neural network, load first operation data of a next layer of the neural network.
S230: Perform computation according to preloaded first operation data of the current layer of the neural network and the second operation data of the current layer of the neural network to obtain operation result data of the current layer of the neural network.
For example, the computation of each layer of the neural network consists in the neural network computing engine (Engine) operating on the first operation data and the second operation data of that layer. When the neural network performs data loading and computation, it usually loads the first operation data of the current layer first and then, after the first operation data of the current layer has finished loading, loads the second operation data of the current layer. Moreover, while the second operation data of the current layer is being loaded, the computing engine runs synchronously, performs computation according to the already loaded first operation data and second operation data of the current layer, and outputs the operation result data of the current layer; the operation result data output by the computing engine is cached in the on-chip RAM.
In this embodiment, if the current layer of the neural network is the first layer of the neural network, the preloaded first operation data of the current layer is read into the on-chip RAM by RDMA while the computing engine is in an idle state, that is, while the computing engine is not performing inference computation. If the current layer is not the first layer of the neural network, the preloaded first operation data of the current layer is read into the on-chip RAM by RDMA after the second operation data of the previous layer has finished loading and while the computing engine is running, that is, while the computing engine is performing inference computation on the data of the previous layer.
Since the time required for the computation of the neural network computing engine is usually greater than the time required to load the second operation data of the current layer, once the second operation data of the current layer has finished loading, loading the first operation data of the next layer and computing on the preloaded first operation data and the second operation data of the current layer proceed at the same time; that is, steps S220 and S230 are performed synchronously, and the RDMA runs in parallel with the computing engine.
S240: Store the operation result data of the current layer of the neural network.
For example, after the computing engine outputs the operation result data of the current layer to the on-chip RAM, the operation result data in the on-chip RAM needs to be stored to the off-chip DDR through WDMA.
S250: In response to completion of storing the operation result data of the current layer of the neural network, take the next layer of the neural network as the current layer of the neural network.
After step S250 is completed, the method returns to step S210 until the second operation data of all layers of the neural network has been loaded.
For example, when the operation result data of the current layer has been stored, the data computation of the current layer is complete and the computation of the next layer needs to be performed. At this point, the next layer of the neural network is taken as the current layer, and the method returns to step S210, that is, the second operation data of the next layer is loaded, until the second operation data of all layers of the neural network has been loaded, that is, until the neural network computation is complete. Since the first operation data of the next layer has already been loaded during the computation performed by the computing engine, when the data computation of the current layer is complete, the computing engine can quickly compute on the newly loaded second operation data of the next layer and the already loaded first operation data of the next layer; that is, the computing engine does not need to wait for the first operation data of the next layer to be loaded, which reduces the time the computing engine spends idle.
Illustratively, FIG. 2B is a schematic timing diagram of the neural network data processing method provided in Embodiment 2 of the present application. As shown in FIG. 2B, the current layer of the neural network is the first layer of the neural network. RDMA first loads the first operation data of the current layer, which includes first bias data 21_1 and first weight data 22_1; the first bias data 21_1 may be recorded as 1-bias, and the first weight data 22_1 may be recorded as 1-coeff. Then the second operation data 23_1 of the current layer is loaded; the second operation data 23_1 is, for example, the first operation map data, 1-feature map data. The computing engine (Engine) performs a computation operation according to the loaded first operation data and second operation data of the current layer; this computation operation 24_1 may be recorded as the first computation, 1-compute, and it outputs the operation result data of the current layer. At the same time, WDMA stores the operation result data of the current layer to the off-chip DDR; this data storage operation 25_1 may be recorded as the first data store, 1-output.
As shown in FIG. 2B, once the second operation data of the current layer has finished loading, RDMA immediately loads the first operation data of the next layer, which includes second bias data 21_2 and second weight data 22_2; the second bias data 21_2 may be recorded as 2-bias, and the second weight data 22_2 may be recorded as 2-coeff. As can be seen from FIG. 2B, at this point the RDMA, the computing engine, and the WDMA are all running, that is, the three run in parallel. When WDMA has finished storing the operation result data of the current layer, the data computation of the current layer is complete and the data computation of the next layer needs to be performed; since the first operation data of the next layer has already been preloaded, the second operation data 23_2 of the next layer can be loaded directly at this point. The second operation data 23_2 is, for example, the second operation map data, 2-feature map data. The computing engine can then quickly perform a computation operation according to the already loaded first operation data and second operation data of the next layer; this computation operation 24_2 may be recorded as the second computation, 2-compute. At the same time, WDMA stores the operation result data of the next layer to the off-chip DDR; this data storage operation 25_2 may be recorded as the second data store, 2-output.
As shown in FIG. 2B, when the second operation data of the next layer has finished loading, the first operation data of the third layer of the neural network can be loaded immediately. The first operation data of the third layer includes third bias data 21_3 and third weight data 22_3; the third bias data 21_3 may be recorded as 3-bias, and the third weight data 22_3 may be recorded as 3-coeff. Subsequent data loading continues in the same cycle until the data of all layers of the neural network has been loaded. This greatly reduces the time the computing engine spends idle, thereby reducing the overall time consumed by the neural network computation.
The neural network data processing method provided in Embodiment 2 of the present application enables the operation of the neural network computing engine and the data reading of the DMA to be processed in parallel, which improves the parallelism between the computing engine and the DMA, thereby reducing the idle time of the computing engine and the DMA, improving the data throughput during neural network computation, reducing the overall time required for neural network computation, and improving the computational efficiency of the neural network.
Embodiment 3
FIG. 3 is a schematic structural diagram of the neural network data processing apparatus provided in Embodiment 3 of the present application. This embodiment of the present application is applicable to data processing in the inference process of a deep learning neural network chip. The neural network data processing apparatus provided in this embodiment can implement the neural network data processing method provided in any embodiment of the present application and has the corresponding functional structure and beneficial effects of that method; for content not described in this embodiment, reference may be made to the description of any method embodiment of the present application.
As shown in FIG. 3, the neural network data processing apparatus provided in Embodiment 3 of the present application includes a second operation data loading module 310 and a first operation data loading module 320, wherein:
the second operation data loading module 310 is configured to load second operation data of a current layer of a neural network; and
the first operation data loading module 320 is configured to load first operation data of a next layer of the neural network in response to completion of loading the second operation data of the current layer of the neural network.
In an embodiment, the neural network data processing apparatus further includes:
a data operation module configured to perform computation according to preloaded first operation data of the current layer of the neural network and the second operation data of the current layer of the neural network to obtain operation result data of the current layer of the neural network; and
a data storage module configured to store the operation result data of the current layer of the neural network.
In an embodiment, the apparatus further includes:
a loop module configured to, in response to completion of storing the operation result data of the current layer of the neural network, take the next layer of the neural network as the current layer of the neural network and return to the step of loading the second operation data of the current layer of the neural network, until the second operation data of all layers of the neural network has been loaded.
In an embodiment, the second operation data of the current layer of the neural network is operation map data of the current layer of the neural network.
In an embodiment, the first operation data of the current layer of the neural network is weight data and bias data of the current layer of the neural network.
In an embodiment, the data loading mode is a DMA transfer mode.
Through the second operation data loading module and the first operation data loading module, the neural network data processing apparatus provided in Embodiment 3 of the present application enables the operation of the neural network computing engine and the data reading of the DMA to be processed in parallel, which improves the parallelism between the computing engine and the DMA, thereby reducing the idle time of the computing engine and of data reading, improving the data throughput during neural network computation, and reducing the overall time required for neural network computation.
Embodiment 4
FIG. 4 is a schematic structural diagram of a neural network data processing device provided in Embodiment 4 of the present application. FIG. 4 shows a block diagram of an exemplary neural network data processing device 412, referred to as device 412 for short, suitable for implementing embodiments of the present application. The device 412 shown in FIG. 4 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present application.
As shown in FIG. 4, the device 412 takes the form of a general-purpose device. The components of the device 412 may include one or more processors 416, a storage apparatus 428, and a bus 418 connecting different system components, the system components including, for example, the storage apparatus 428 and the processor 416.
The bus 418 represents one or more of several types of bus structures, including a storage apparatus bus or storage apparatus controller, a peripheral bus, an accelerated graphics port, or a local bus using any of a variety of bus structures. By way of example, these architectures include the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
The device 412 typically includes a variety of computer-system-readable media. These media may be any available media that can be accessed by the device 412, including volatile and non-volatile media and removable and non-removable media.
The storage apparatus 428 may include computer-system-readable media in the form of volatile memory, such as a random access memory (RAM) 430 and/or a cache 432. The device 412 may include other removable/non-removable, volatile/non-volatile computer system storage media. By way of example only, a storage system 434 may be used to read from and write to non-removable, non-volatile magnetic media (not shown in FIG. 4 and commonly referred to as a hard disk drive). Although not shown in FIG. 4, a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk such as a floppy disk, and an optical disk drive for reading from and writing to a removable non-volatile optical disk, such as a Compact Disc Read-Only Memory (CD-ROM), a Digital Video Disc-Read Only Memory (DVD-ROM), or other optical media, may be provided. In these cases, each drive may be connected to the bus 418 through one or more data media interfaces. The storage apparatus 428 may include at least one program product having a set of (for example, at least one) program modules configured to perform the functions of the embodiments of the present application.
A program/utility 440 having a set of (for example, at least one) program modules 442 may be stored, for example, in the storage apparatus 428. Such program modules 442 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data, and each or some combination of these examples may include an implementation of a network environment. The program modules 442 generally perform the functions and/or methods of the embodiments described in the present application.
The device 412 may also communicate with one or more external devices 414, such as a keyboard, a pointing terminal, a display 424, and the like; the device 412 may also communicate with one or more terminals that enable a user to interact with the device 412, and/or with any terminal, such as a network card or a modem, that enables the device 412 to communicate with one or more other computing terminals. Such communication may take place through an input/output (I/O) interface 422. Furthermore, the device 412 may communicate through a network adapter 420 with one or more networks, such as a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the Internet. As shown in FIG. 4, the network adapter 420 communicates with the other modules of the device 412 through the bus 418. It should be understood that, although not shown in the figure, other hardware and/or software modules may be used in conjunction with the device 412, including: microcode, terminal drivers, redundant processors, external disk drive arrays, Redundant Arrays of Independent Disks (RAID) systems, tape drives, data backup storage systems, and the like.
The processor 416 executes at least one functional application and performs data processing by running the programs stored in the storage apparatus 428, for example implementing the neural network data processing method provided by any embodiment of the present application, and the method may include:
loading second operation data of a current layer of a neural network; and
in response to completion of loading the second operation data of the current layer of the neural network, loading first operation data of a next layer of the neural network.
Embodiment 5
Embodiment 5 of the present application further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the neural network data processing method provided by any embodiment of the present application, and the method may include:
loading second operation data of a current layer of a neural network; and
in response to completion of loading the second operation data of the current layer of the neural network, loading first operation data of a next layer of the neural network.
The computer storage medium of the embodiments of the present application may adopt any combination of one or more computer-readable media. A computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. The computer-readable storage medium may be, for example, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. Examples of computer-readable storage media include: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing; this list is not exhaustive. In this document, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in combination with an instruction execution system, apparatus, or device.
A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, the computer-readable signal medium carrying computer-readable program code. Such a propagated data signal may take a variety of forms, including an electromagnetic signal, an optical signal, or any suitable combination of the foregoing. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code contained on a computer-readable medium may be transmitted using any suitable medium, including wireless, wire, optical fiber cable, radio frequency (RF), and the like, or any suitable combination of the foregoing.
Computer program code for carrying out the operations of the present application may be written in one or more programming languages or a combination thereof, the programming languages including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the C language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or terminal. In the case involving a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer, for example through the Internet using an Internet service provider.
Note that the above are only preferred embodiments of the present application. Those skilled in the art will understand that the present application is not limited to the specific embodiments described here, and that various obvious changes, readjustments, and substitutions can be made without departing from the protection scope of the present application. Therefore, although the present application has been described through the above embodiments, it is not limited to the above embodiments and may include many other equivalent embodiments without departing from the inventive concept.

Claims (14)

  1. A neural network data processing method, comprising:
    loading second operation data of a current layer of a neural network; and
    in response to completion of loading the second operation data of the current layer of the neural network, loading first operation data of a next layer of the neural network.
  2. The method of claim 1, after loading the second operation data of the current layer of the neural network, further comprising:
    performing computation according to preloaded first operation data of the current layer of the neural network and the second operation data of the current layer of the neural network to obtain operation result data of the current layer of the neural network; and
    storing the operation result data of the current layer of the neural network.
  3. The method of claim 2, after storing the operation result data of the current layer of the neural network, further comprising:
    in response to completion of storing the operation result data of the current layer of the neural network, taking the next layer of the neural network as the current layer of the neural network and returning to the step of loading the second operation data of the current layer of the neural network, until the second operation data of all layers of the neural network has been loaded.
  4. The method of any one of claims 1-3, wherein the second operation data of the current layer of the neural network is operation map data of the current layer of the neural network.
  5. The method of any one of claims 1-3, wherein the first operation data of the current layer of the neural network is weight data and bias data of the current layer of the neural network.
  6. The method of any one of claims 1-3, wherein a data loading mode corresponding to the loading operations is a direct memory access (DMA) transfer mode.
  7. A neural network data processing apparatus, comprising:
    a second operation data loading module configured to load second operation data of a current layer of a neural network; and
    a first operation data loading module configured to load first operation data of a next layer of the neural network in response to completion of loading the second operation data of the current layer of the neural network.
  8. The apparatus of claim 7, wherein the apparatus further comprises:
    a data operation module configured to perform computation according to preloaded first operation data of the current layer of the neural network and the second operation data of the current layer of the neural network to obtain operation result data of the current layer of the neural network; and
    a data storage module configured to store the operation result data of the current layer of the neural network.
  9. The apparatus of claim 8, wherein the apparatus further comprises:
    a loop module configured to, in response to completion of storing the operation result data of the current layer of the neural network, take the next layer of the neural network as the current layer of the neural network and return to the step of loading the second operation data of the current layer of the neural network, until the second operation data of all layers of the neural network has been loaded.
  10. The apparatus of any one of claims 7-9, wherein the second operation data of the current layer of the neural network is operation map data of the current layer of the neural network.
  11. The apparatus of any one of claims 7-9, wherein the first operation data of the current layer of the neural network is weight data and bias data of the current layer of the neural network.
  12. The apparatus of any one of claims 7-9, wherein a data loading mode corresponding to the loading operations is a direct memory access (DMA) transfer mode.
  13. A neural network data processing device, comprising:
    one or more processors; and
    a storage apparatus configured to store one or more programs,
    wherein the one or more processors are configured to execute the one or more programs to implement the neural network data processing method of any one of claims 1-6.
  14. A computer-readable storage medium storing a computer program which, when executed by a processor, implements the neural network data processing method of any one of claims 1-6.
PCT/CN2021/106147 2020-07-15 2021-07-14 神经网络数据处理方法、装置、设备及存储介质 WO2022012563A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010679561.4A CN111813721B (zh) 2020-07-15 2020-07-15 神经网络数据处理方法、装置、设备及存储介质
CN202010679561.4 2020-07-15

Publications (1)

Publication Number Publication Date
WO2022012563A1 true WO2022012563A1 (zh) 2022-01-20

Family

ID=72866108

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/106147 WO2022012563A1 (zh) 2020-07-15 2021-07-14 神经网络数据处理方法、装置、设备及存储介质

Country Status (2)

Country Link
CN (1) CN111813721B (zh)
WO (1) WO2022012563A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111813721B (zh) * 2020-07-15 2022-09-09 深圳鲲云信息科技有限公司 神经网络数据处理方法、装置、设备及存储介质
CN114118389B (zh) * 2022-01-28 2022-05-10 深圳鲲云信息科技有限公司 神经网络数据处理方法、设备及存储介质

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109564638A (zh) * 2018-01-15 2019-04-02 深圳鲲云信息科技有限公司 人工智能处理器、及其所应用的处理方法
CN109844774A (zh) * 2018-08-28 2019-06-04 深圳鲲云信息科技有限公司 一种并行反卷积计算方法、单引擎计算方法及相关产品
CN110036404A (zh) * 2016-10-07 2019-07-19 世界线公司 用于检测数据流中的欺诈的系统
CN110659069A (zh) * 2018-06-28 2020-01-07 赛灵思公司 用于执行神经网络计算的指令调度方法及相应计算系统
CN110675309A (zh) * 2019-08-28 2020-01-10 江苏大学 一种基于卷积神经网络和VGGNet16模型的图像风格转换方法
CN111066058A (zh) * 2018-06-29 2020-04-24 百度时代网络技术(北京)有限公司 用于低功率实时对象检测的系统和方法
CN111813721A (zh) * 2020-07-15 2020-10-23 深圳鲲云信息科技有限公司 神经网络数据处理方法、装置、设备及存储介质

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111401532A (zh) * 2020-04-28 2020-07-10 南京宁麒智能计算芯片研究院有限公司 一种卷积神经网络推理加速器及加速方法

Also Published As

Publication number Publication date
CN111813721A (zh) 2020-10-23
CN111813721B (zh) 2022-09-09

Similar Documents

Publication Publication Date Title
CN109388595B (zh) 高带宽存储器系统以及逻辑管芯
WO2022012563A1 (zh) 神经网络数据处理方法、装置、设备及存储介质
KR101558069B1 (ko) 범용 그래픽스 프로세싱 유닛에서의 컴퓨테이션 리소스 파이프라이닝
JP7096213B2 (ja) 人工知能チップに適用される算出方法および人工知能チップ
WO2021259041A1 (zh) Ai计算图的排序方法、装置、设备及存储介质
CN107315716B (zh) 一种用于执行向量外积运算的装置和方法
US11809953B1 (en) Dynamic code loading for multiple executions on a sequential processor
CN109284108B (zh) 无人车数据存储方法、装置、电子设备及存储介质
CN109711540B (zh) 一种计算装置及板卡
CN109740730B (zh) 运算方法、装置及相关产品
CN111552652A (zh) 基于人工智能芯片的数据处理方法、装置和存储介质
CN111061507A (zh) 运算方法、装置、计算机设备和存储介质
CN114036085B (zh) 基于ddr4的多任务读写调度方法、计算机设备及存储介质
US20140157237A1 (en) Overriding System Attributes and Function Returns in a Software Subsystem
CN109542604A (zh) 线程中注入接口的方法、装置、设备及存储介质
CN113835671A (zh) 音频数据快速播放方法、系统、计算机设备及存储介质
US20220318604A1 (en) Sparse machine learning acceleration
CN111913812B (zh) 一种数据处理方法、装置、设备及存储介质
CN107656702A (zh) 加速硬盘读写的方法及其系统、以及电子设备
US11797277B2 (en) Neural network model conversion method server, and storage medium
CN117940934A (zh) 数据处理装置及方法
US10223013B2 (en) Processing input/output operations in a channel using a control block
CN114518911B (zh) 一种插件加载时长预测方法、装置、设备和存储介质
CN111338694B (zh) 运算方法、装置、计算机设备和存储介质
CN111339060B (zh) 运算方法、装置、计算机设备和存储介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21842407

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21842407

Country of ref document: EP

Kind code of ref document: A1