WO2021259104A1 - Artificial intelligence chip and data processing method based on artificial intelligence chip - Google Patents

Artificial intelligence chip and data processing method based on artificial intelligence chip

Info

Publication number
WO2021259104A1
WO2021259104A1 (PCT/CN2021/100362)
Authority
WO
WIPO (PCT)
Prior art keywords
data flow
artificial intelligence
data
calculation module
module
Prior art date
Application number
PCT/CN2021/100362
Other languages
English (en)
French (fr)
Inventor
牛昕宇 (Niu Xinyu)
Original Assignee
深圳鲲云信息科技有限公司 (Shenzhen Corerain Technologies Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳鲲云信息科技有限公司 filed Critical 深圳鲲云信息科技有限公司
Priority to US18/011,522 priority Critical patent/US20230281045A1/en
Publication of WO2021259104A1 publication Critical patent/WO2021259104A1/zh

Links

Images

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00: Arrangements for program control, e.g. control units
    • G06F 9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46: Multiprogramming arrangements
    • G06F 9/50: Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5005: Allocation of resources, e.g. of the central processing unit [CPU], to service a request
    • G06F 9/5027: Allocation of resources, e.g. of the central processing unit [CPU], to service a request, the resource being a machine, e.g. CPUs, servers, terminals
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/0464: Convolutional networks [CNN, ConvNet]
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 15/00: Digital computers in general; Data processing equipment in general
    • G06F 15/16: Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
    • G06F 15/163: Interprocessor communication
    • G06F 15/173: Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star, snowflake
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/044: Recurrent networks, e.g. Hopfield networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/06: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N 3/063: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means

Definitions

  • the embodiments of the application relate to the field of artificial intelligence technology, for example, to an artificial intelligence chip and a data processing method based on the artificial intelligence chip.
  • the embodiment of the present application provides an artificial intelligence chip and a data processing method based on the artificial intelligence chip, so as to achieve the effect of improving the resource utilization rate of the artificial intelligence chip.
  • an artificial intelligence chip including:
  • each calculation module is used to process data based on one of the operation nodes corresponding to the artificial intelligence algorithm, and the multiple calculation modules are connected in sequence according to the operation order of the artificial intelligence algorithm;
  • the data flows, according to the preset data flow direction, in the data flow network composed of the multiple calculation modules.
  • it further includes a data flow dam;
  • the data flow dam is arranged between a previous calculation module and a next calculation module among the multiple calculation modules, and is used to, when the bandwidths of the previous calculation module and the next calculation
  • module do not match, receive the first data output by the previous calculation module and send the first data to the next calculation module at the bandwidth matched by the next calculation module.
  • the previous calculation module and the next calculation module are adjacent or non-adjacent.
  • it also includes:
  • a local data flow storage module, which is connected to at least the first calculation module and the last calculation module among the multiple calculation modules, and is used to send the data into the data flow network for processing via the first calculation module, and/or to receive the processing result output by the last calculation module.
  • the data flow dam includes a write end, a read end, a full end, and an empty end, and further includes:
  • a first AND gate, connected to the write end to serve as the upstream valid end, where the upstream valid end is used to receive the first valid signal sent by the previous calculation module;
  • a second AND gate, connected to the read end to serve as the downstream permission end, where the downstream permission end is used to receive the second valid signal sent by the next calculation module;
  • a first NOT gate, connected to the full end to serve as the upstream permission end, where the upstream permission end is used to send a first permission signal to the previous calculation module and the first AND gate;
  • a second NOT gate, connected to the empty end to serve as the downstream valid end, where the downstream valid end is used to send a second valid signal to the next calculation module and the second AND gate.
  • the data flow network is a local data flow network, there are multiple local data flow networks, and the multiple local data flow networks form a global data flow network; the artificial intelligence chip further includes:
  • a global data flow storage module, connected to the multiple local data flow networks, where the global data flow storage module is used to transmit data to the local data flow networks, or to transmit the second data output by a previous local data flow network to the next local data flow network.
  • there is one global data flow storage module, and the multiple local data flow networks are each connected to the one global data flow storage module.
  • an embodiment of the present application provides a data processing method based on an artificial intelligence chip, applied to the artificial intelligence chip described in any embodiment of the present application, where the method includes:
  • processing the data to be processed based on the artificial intelligence chip matched with the data flow network and the data flow direction.
  • matching, on the artificial intelligence chip, the data flow network and the preset data flow direction corresponding to the target artificial intelligence model includes:
  • the algorithm information includes computation content, input/output information, and operation order, and matching, on the artificial intelligence chip, the data flow network and the data flow direction corresponding to the target artificial intelligence model according to the algorithm information includes:
  • matching data flow modules according to the computation content, where the data flow modules include at least a calculation module;
  • matching the data flow direction of the data to be processed in the data flow network according to the operation order.
  • the artificial intelligence chip of the embodiments of the present application includes multiple calculation modules, and each calculation module is used to process data based on one of the operation nodes corresponding to the artificial intelligence algorithm.
  • the multiple calculation modules are connected in sequence according to the operation order of the artificial intelligence algorithm.
  • the data flows, according to the preset data flow direction, in the data flow network composed of the multiple calculation modules. This solves the problem that acquiring data by means of an instruction set consumes artificial intelligence chip resources and leads to low resource utilization of the artificial intelligence chip, and achieves the effect of improving the resource utilization of the artificial intelligence chip.
  • FIG. 1 is a schematic structural diagram of an artificial intelligence chip provided in Embodiment 1 of the present application;
  • FIG. 2 is a schematic structural diagram of an artificial intelligence chip for computing a CNN algorithm provided in Embodiment 1 of the present application;
  • FIG. 3 is a schematic diagram of a partial structure of an artificial intelligence chip provided in Embodiment 1 of the present application;
  • FIG. 4 is a schematic structural diagram of another artificial intelligence chip provided in Embodiment 1 of the present application;
  • FIG. 5 is a schematic structural diagram of another artificial intelligence chip provided in Embodiment 1 of the present application;
  • FIG. 6 is a schematic flowchart of a data processing method based on an artificial intelligence chip provided in Embodiment 2 of the present application.
  • Some exemplary embodiments are described as processes or methods depicted as flowcharts. Although the flowchart describes the steps as sequential processing, many of the steps can be implemented in parallel, concurrently, or simultaneously. In addition, the order of the steps can be rearranged. The processing may be terminated when its operations are completed, but may also have additional steps not included in the drawings. Processing can correspond to methods, functions, procedures, subroutines, subcomputer programs, and so on.
  • terms such as "first" and "second" may be used herein to describe various directions, actions, steps or elements, etc., but these directions, actions, steps or elements are not limited by these terms. These terms are only used to distinguish one direction, action, step or element from another direction, action, step or element.
  • for example, without departing from the scope of the present application, the first data may be referred to as the second data,
  • and similarly, the second data may be referred to as the first data. Both the first data and the second data are data, but they are not the same data.
  • the terms "first", "second", etc. should not be understood as indicating or implying relative importance or implicitly indicating the number of the indicated technical features. Thus, a feature defined with "first" or "second" may explicitly or implicitly include one or more of that feature.
  • "a plurality of" means at least two, such as two, three, etc., unless specifically defined otherwise.
  • FIG. 1 is a schematic structural diagram of an artificial intelligence chip provided in Embodiment 1 of this application. As shown in FIG. 1, an embodiment of the present application provides an artificial intelligence chip 10, which includes a plurality of computing modules 110, wherein:
  • Each calculation module 110 is configured to process data based on one of the calculation nodes corresponding to the artificial intelligence algorithm, and the multiple calculation modules 110 are connected in sequence according to the calculation sequence of the artificial intelligence algorithm;
  • the data flows, according to the preset data flow direction, in the data flow network 100 composed of the multiple calculation modules 110.
  • each calculation module 110 in the artificial intelligence chip 10 processes data according to a corresponding operation node of the artificial intelligence algorithm, and the modules are connected in sequence according to the operation order of the artificial intelligence algorithm,
  • forming a data flow network 100 suitable for computing data according to the artificial intelligence algorithm.
  • the calculation module 110 includes, but is not limited to, computing functions such as convolution, pooling, activation, or fully connected calculation, and a computing function adapted to the operation nodes of the artificial intelligence algorithm can be configured as required.
  • the artificial intelligence algorithm in this embodiment includes, but is not limited to, the CNN algorithm and the RNN algorithm.
  • taking the CNN algorithm as an example, the CNN algorithm includes convolutional layer calculation, pooling layer calculation, and fully connected layer calculation, and the operation order is: first the convolutional layer, then the pooling layer, and finally the fully connected layer; an operation node can be a node computed in the convolutional layer, the pooling layer, or the fully connected layer.
  • accordingly, the multiple calculation modules 110 respectively perform convolution calculation, pooling calculation, and fully connected calculation, and are connected end to end according to the operation order of the CNN algorithm, so that the data flows through the multiple calculation modules 110 in the operation order of the artificial intelligence algorithm, and the chip processes data with the artificial intelligence algorithm in a dataflow manner. It is understandable that data is allowed to flow through the chip on its own in a dataflow manner:
  • a calculation module 110 does not need to perform any data-fetch action; it only needs to wait for data to arrive according to the preset data flow direction and then process it, which reduces instruction overhead and improves the resource utilization of the chip.
  • FIG. 2 is a schematic structural diagram of an artificial intelligence chip for computing a CNN algorithm provided by an embodiment of the present application.
  • the artificial intelligence chip 10 includes a calculation module A111, a calculation module B112, and a calculation module C113.
  • the calculation module A111 is used for convolution calculation,
  • the calculation module B112 is used for pooling calculation,
  • and the calculation module C113 is used for fully connected calculation.
  • the preset data flow direction is, in order, calculation module A111, calculation module B112, and calculation module C113. It is understandable that the image data flows through calculation module A111, calculation module B112, and calculation module C113 according to the preset data flow direction:
  • when the image data reaches calculation module A111, convolution is performed; after that calculation completes, the data reaches calculation module B112 for pooling, and finally reaches calculation module C113 for the fully connected calculation, after which the final calculation result is output (a minimal software sketch of this pipeline is shown below).
  • the final calculation result can be stored in off-chip storage outside the artificial intelligence chip 10, which is not specifically limited here.
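  • to make the dataflow execution style concrete, the following is a minimal software sketch of the three-module pipeline; it is an illustrative analogy only, not the patent's hardware implementation, and the stand-in functions, queues, and sample data are assumptions:

```python
import queue
import threading

def calculation_module(fn, q_in, q_out):
    # A module never fetches data by instruction: it blocks until data
    # arrives on its input channel, processes it, and forwards the result
    # along the preset data flow direction.
    while True:
        item = q_in.get()
        if item is None:        # sentinel marking the end of the stream
            q_out.put(None)
            return
        q_out.put(fn(item))

# Illustrative stand-ins for the convolution / pooling / fully connected
# calculations of modules A111, B112, and C113.
conv = lambda x: x * 2
pool = lambda x: x - 1
fc = lambda x: x ** 2

q0, q1, q2, q3 = (queue.Queue() for _ in range(4))
for fn, qi, qo in [(conv, q0, q1), (pool, q1, q2), (fc, q2, q3)]:
    threading.Thread(target=calculation_module, args=(fn, qi, qo),
                     daemon=True).start()

for x in [1, 2, 3]:
    q0.put(x)                   # data enters through the first module
q0.put(None)

results = []
while (r := q3.get()) is not None:
    results.append(r)
print(results)                  # [1, 9, 25]
```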
  • in some scenarios, after the previous calculation module 114 finishes processing data and obtains the first data, it needs to send the first data to the next calculation module 115, so that the next calculation module 115 can use the first data to perform its calculation.
  • however, when the bandwidths of the previous calculation module 114 and the next calculation module 115 do not match, for example, when the bandwidth of the previous calculation module 114 is greater than the bandwidth of the next calculation module 115, the data received by the next calculation module 115 will quickly overflow.
  • FIG. 3 is a schematic diagram of a partial structure of an artificial intelligence chip provided by an embodiment of the present application.
  • the artificial intelligence chip 10 further includes a data flow dam 130, which is arranged between the previous calculation module 114 and the next calculation module 115 among the multiple calculation modules 110 and is used to, when the bandwidths of the previous calculation module 114 and the next calculation module 115 do not match, receive the first data output by the previous calculation module 114 and send the first data, at the bandwidth matched by the next calculation module 115, to the next calculation module 115.
  • the previous calculation module 114 and the next calculation module 115 in this embodiment only denote calculation modules 110 that need to exchange data with each other; they are not limited to specific calculation modules 110 and can be determined according to different situations.
  • the previous calculation module 114 and the next calculation module 115 are adjacent or non-adjacent, which is not specifically limited here.
  • when the bandwidths of the previous calculation module 114 and the next calculation module 115 do not match, the data flow dam 130 receives the first data output by the previous calculation module 114 and sends it, at the bandwidth matched by the next calculation module 115, to the next calculation module 115. This guarantees the data balance of the data exchange between the previous calculation module 114 and the next calculation module 115, so that data processing can proceed normally and data loss caused by clock-cycle disorder is avoided.
  • to implement automatic flow control with the data flow dam 130 between the previous calculation module 114 and the next calculation module 115, the basic idea is as follows (a worked sizing example is given below):
  • A) input data rate: F_in = number of valid input data / unit time (T_d);
  • B) output data rate: F_out = number of valid output data / unit time (T_d);
  • C) if F_in == F_out over the entire run, then, to completely avoid backpressure, the data flow dam 130 should be able to store max(F_in) - min(F_out) data items.
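  • as a worked example of this sizing rule (the rates below are made-up numbers for illustration, not values from the patent):

```python
# Suppose the upstream module pushes at most 8 valid items per unit time
# T_d, while the downstream module may drain as few as 3 in the same time.
max_f_in = 8     # max(F_in)
min_f_out = 3    # min(F_out)

# To fully avoid backpressure, the dam must absorb the worst-case
# per-unit-time surplus between what arrives and what drains:
required_capacity = max_f_in - min_f_out
print(required_capacity)  # 5 -> the dam needs room for 5 data items
```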
  • the data flow dam 130 ties together the internal states of the previous calculation module 114 and the next calculation module 115; whether data flows out of the previous calculation module 114 is determined purely by the hardware. The data flow dam 130 can therefore be understood as a barrier that regulates the data flow. Based on algorithm requirements, the data flow dam 130 can be further extended to support predetermined static flow control.
  • the data flow dam 130 includes a write end, a read end, a full end, and an empty end, and further includes:
  • a first AND gate, connected to the write end to serve as the upstream valid end, where the upstream valid end is used to receive the first valid signal sent by the previous calculation module 114;
  • a second AND gate, connected to the read end to serve as the downstream permission end, where the downstream permission end is used to receive the second valid signal sent by the next calculation module 115;
  • a first NOT gate, connected to the full end to serve as the upstream permission end, where the upstream permission end is used to send a first permission signal to the previous calculation module 114 and the first AND gate;
  • a second NOT gate, connected to the empty end to serve as the downstream valid end, where the downstream valid end is used to send a second valid signal to the next calculation module 115 and the second AND gate.
  • the previous calculation module 114 is configured to receive the first permission signal sent by the data flow dam 130;
  • the previous calculation module 114 provides the first valid signal to the data flow dam 130 so as to write the first data of the data to be processed into the data flow dam 130, and the previous calculation module 114 is used to process the first data according to the processing mode pointed to by the operation node to obtain the calculation result, where the first data is the part of the data to be processed that the previous calculation module 114 is suited to calculate;
  • the data flow dam 130 is used to receive the second permission signal sent by the next calculation module 115;
  • the data flow dam 130 provides the second valid signal to the next calculation module 115 so as to write the calculation result into the next calculation module 115.
  • when the previous calculation module 114 receives the first permission signal sent by the data flow dam 130, it means that the data flow dam 130 is ready to receive the data that the previous calculation module 114 needs to write; after the previous calculation module 114 receives the first permission signal sent by the data flow dam 130, the previous calculation module 114 can read out the calculation result. When the previous calculation module 114 provides the first valid signal to the data flow dam 130, it means that the previous calculation module 114 can write the calculation result into the data flow dam 130; after the data flow dam 130 receives the first valid signal sent by the previous calculation module 114, the data flow dam 130 can write in the calculation result.
  • when the previous calculation module 114 receives the first permission signal sent by the data flow dam 130 and, at the same time, the data flow dam 130 receives the first valid signal sent by the previous calculation module 114, the calculation result starts to be written from the previous calculation module 114 into the data flow dam 130.
  • when either signal stops being sent, that is, when the data flow dam 130 stops sending the first permission signal to the previous calculation module 114 or the previous calculation module 114 stops sending the first valid signal to the data flow dam 130, this transfer stops immediately.
  • at this point, the calculation result has been written from the previous calculation module 114 into the data flow dam 130, and the calculation result is stored in the data flow dam 130.
  • when the data flow dam 130 receives the second permission signal sent by the next calculation module 115, it means that the next calculation module 115 is ready to receive the data that the data flow dam 130 needs to write; after the data flow dam 130 receives the second permission signal sent by the next calculation module 115, the data flow dam 130 can read out the calculation result.
  • when the data flow dam 130 provides the second valid signal to the next calculation module 115, it means that the data flow dam 130 can write the calculation result into the next calculation module 115; after the next calculation module 115 receives the second valid signal sent by the data flow dam 130, the next calculation module 115 can write in the calculation result.
  • when the data flow dam 130 receives the second permission signal sent by the next calculation module 115 and, at the same time, the next calculation module 115 receives the second valid signal sent by the data flow dam 130, the calculation result starts to be written from the data flow dam 130 into the next calculation module 115.
  • likewise, when either signal stops being sent, that is, when the next calculation module 115 stops sending the second permission signal to the data flow dam 130 or the data flow dam 130 stops sending the second valid signal to the next calculation module 115, this transfer stops immediately.
  • the transmission of the calculation result from the previous calculation module 114 to the next calculation module 115 is thus completed (a cycle-level sketch of this handshake is given below).
  • it should also be noted that the calculation result does not refer to a sequential calculation result; the calculation result can be any piece of data in the actual communication.
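  • the following cycle-level Python sketch mirrors the handshake just described; the FIFO depth, the producer/consumer timing, and the sample data are assumptions for illustration, not the chip's actual circuit:

```python
from collections import deque

class DataFlowDam:
    """Sketch of the dam's four ends: write, read, full, and empty."""
    def __init__(self, depth):
        self.buf, self.depth = deque(), depth

    def up_permission(self):    # first NOT gate: permission = NOT full
        return len(self.buf) < self.depth

    def down_valid(self):       # second NOT gate: valid = NOT empty
        return len(self.buf) > 0

    def write(self, up_valid, data):
        # First AND gate: a write happens only while the previous module's
        # valid signal AND the dam's permission signal are both present.
        if up_valid and self.up_permission():
            self.buf.append(data)
            return True
        return False

    def read(self, down_permission):
        # Second AND gate: a read happens only while the dam's valid signal
        # AND the next module's permission signal are both present.
        if down_permission and self.down_valid():
            return self.buf.popleft()
        return None

dam = DataFlowDam(depth=2)
pending = list(range(5))   # calculation results waiting in module 114
received = []              # calculation results accepted by module 115

for t in range(20):
    if pending and dam.write(up_valid=True, data=pending[0]):
        pending.pop(0)     # the transfer completed; otherwise it stalls
    got = dam.read(down_permission=(t % 2 == 0))  # slower consumer
    if got is not None:
        received.append(got)

print(received)            # [0, 1, 2, 3, 4]: nothing lost despite mismatch
```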
  • FIG. 4 is a schematic structural diagram of another artificial intelligence chip provided by an embodiment of the present application.
  • the artificial intelligence chip further includes a local data flow storage module 120.
  • the local data flow storage module 120 is connected to at least the first calculation module 110 and the last calculation module 110 in the data pipeline composed of the multiple calculation modules 110, and is used to send the data into the data flow network for processing via the first calculation module 110, and/or to receive the processing result output by the last calculation module 110.
  • the preset data flow direction is controlled by the routing switches in the data flow network; the local data flow storage module 120 can programmably store and output a predefined sequence of data, and this sequence of data is sent via the first calculation module 110 into each calculation module 110 of the data pipeline in the data flow network 100,
  • with the flow direction of the data controlled by the routing switches in the data flow network.
  • when the calculation in the data flow network 100 is completed, the calculation result output by the last calculation module 110 is stored in the local data flow storage module 120.
  • FIG. 5 is a schematic structural diagram of another artificial intelligence chip provided by an embodiment of the present application.
  • the data flow network 100 is a local data flow network 100, and there are multiple local data flow networks 100.
  • the multiple local data flow networks 100 form a global data flow network.
  • the artificial intelligence chip 10 further includes:
  • a global data flow storage module 200, connected to the multiple local data flow networks 100, where the global data flow storage module 200 is used to transmit data to the local data flow networks 100, or to transmit the second data output by a previous local data flow network 100 to the next local data flow network 100.
  • each data flow network 100 executes data processing corresponding to an artificial intelligence algorithm.
  • during data processing on the artificial intelligence chip 10, multiple artificial intelligence algorithms may be computed in parallel, with an independent computation carried out in each local data flow network 100. The global data flow storage module 200 can serve as a container that provides data to each local data flow network 100, and can also transmit the second data output by a previous local data flow network 100 to the next local data flow network 100.
  • optionally, there is one global data flow storage module 200, and the multiple local data flow networks 100 are each connected to the one global data flow storage module 200.
  • the global data flow storage module 200 can also serve as the connection window between the artificial intelligence chip 10 and the outside of the chip (a small sketch of this arrangement follows below).
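  • a small illustrative sketch of this arrangement, where the local networks are stand-in functions and all names and data are assumptions, not the patent's implementation:

```python
# One global data flow storage module feeding two local data flow networks
# and shuttling the second data from one network to the next.
def local_network_1(batch):        # e.g. one algorithm's local network
    return [x * 2 for x in batch]

def local_network_2(batch):        # an independent algorithm's network
    return [x + 1 for x in batch]

class GlobalDataFlowStorage:
    """Single shared container connected to every local network."""
    def __init__(self):
        self.slots = {}

    def put(self, key, data):      # a local network deposits output here
        self.slots[key] = data

    def get(self, key):            # the next local network picks it up
        return self.slots.pop(key)

g = GlobalDataFlowStorage()
g.put("input", [1, 2, 3])          # data entering from outside the chip
g.put("net1_out", local_network_1(g.get("input")))
result = local_network_2(g.get("net1_out"))
print(result)                      # [3, 5, 7]
```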
  • in the technical solution of the embodiments of the present application, the artificial intelligence chip includes multiple calculation modules, and each calculation module is used to process data based on one of the operation nodes corresponding to the artificial intelligence algorithm.
  • the multiple calculation modules are connected in sequence according to the operation order of the artificial intelligence algorithm. Since the data flows automatically in the data flow network according to the data flow direction, instruction overhead is avoided, and the technical effect of improving the resource utilization of the artificial intelligence chip is achieved.
  • in addition, by arranging a data flow dam between two calculation modules that need to exchange data but whose bandwidths do not match, data can be transmitted accurately for calculation even when the bandwidths do not match.
  • Figure 6 is a schematic flowchart of a data processing method based on an artificial intelligence chip provided in Embodiment 2 of the present application, which is applicable to scenarios in which data processing is performed on an artificial intelligence chip; the method can be executed by the artificial intelligence chip provided in any embodiment of the present application.
  • the data processing method based on the artificial intelligence chip provided in the second embodiment of the present application includes:
  • the data to be processed may be image data, voice data, text data, etc., which is not specifically limited here.
  • the target artificial intelligence model refers to the artificial intelligence learning model used to process the data to be processed.
  • the target artificial intelligence model corresponds to the data type of the data to be processed.
  • for example, when the data to be processed is image data, the target artificial intelligence model is a CNN model;
  • when the data to be processed is text data, the target artificial intelligence model is an RNN model.
  • the data flow network refers to the assembly of modules that is adapted to the algorithm corresponding to the target artificial intelligence model and is used to realize the complete calculation of the target artificial intelligence model.
  • the preset data flow direction refers to the flow direction of the data to be processed in the data flow network.
  • a data flow refers to an ordered sequence of data items that can be read once or a small number of times; following the predefined data flow direction, the data flow moves through the data flow network so that it is read and processed by the calculation modules.
  • the artificial intelligence chip in this embodiment includes, but is not limited to, an FPGA chip and a CAISA chip.
  • S530: process the data to be processed based on the artificial intelligence chip matched with the data flow network and the data flow direction.
  • once the data flow network and data flow direction corresponding to the target artificial intelligence model have been matched, the artificial intelligence chip can process the data to be processed based on the data flow network and the preset data flow direction. Specifically, the data to be processed flows through the data flow network according to the data flow direction;
  • the data flow network includes multiple calculation modules that compute according to the algorithm corresponding to the target artificial intelligence model, and when the data reaches a calculation module, that calculation module uses the data to perform its calculation.
  • matching, on the artificial intelligence chip, the data flow network and the preset data flow direction corresponding to the target artificial intelligence model may include:
  • determining the algorithm information corresponding to the target artificial intelligence model, where the algorithm information refers to information related to the algorithm corresponding to the target artificial intelligence model.
  • optionally, the algorithm information includes computation content, input/output information, and operation order.
  • matching, on the artificial intelligence chip, the data flow network and the data flow direction corresponding to the target artificial intelligence model according to the algorithm information includes:
  • matching the data flow modules according to the computation content, where the data flow modules include at least calculation modules; matching the connection relationships of the data flow modules according to the input/output information to form the data flow network; and matching the data flow direction of the data to be processed in the data flow network according to the operation order (see the sketch after this list).
  • the computation content refers to the calculations involved in processing according to the artificial intelligence algorithm, such as convolution calculation, pooling calculation, and so on.
  • the data flow modules include at least calculation modules and, when the bandwidths of calculation modules that exchange data do not match, also include a data flow dam.
  • the input/output information refers to the information about the input data and output data of each calculation module, and the connection relationships between the data flow modules can be matched according to the input/output information. Then, according to the operation order of the artificial intelligence algorithm, the data flow direction of the data in the data flow network can be determined.
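  • the following sketch illustrates the three matching steps; the module registry and the CNN-style algorithm description are hypothetical, invented purely for illustration, and do not reflect the chip's actual configuration interface:

```python
# Hypothetical registry of predefined calculation modules on the chip.
MODULE_REGISTRY = {"conv": "ConvModule", "pool": "PoolModule",
                   "fc": "FullyConnectedModule"}

# Hypothetical algorithm information for a small CNN-style model.
algorithm_info = {
    "computation": ["conv", "pool", "fc"],   # computation content
    "io": {"conv": "pool", "pool": "fc"},    # input/output information
    "order": ["conv", "pool", "fc"],         # operation order
}

# Step 1: match data flow modules according to the computation content.
modules = {op: MODULE_REGISTRY[op] for op in algorithm_info["computation"]}

# Step 2: match the connection relationships according to the input/output
# information, forming the data flow network.
network = [(modules[src], modules[dst])
           for src, dst in algorithm_info["io"].items()]

# Step 3: match the data flow direction according to the operation order.
flow_direction = [modules[op] for op in algorithm_info["order"]]

print(network)
# [('ConvModule', 'PoolModule'), ('PoolModule', 'FullyConnectedModule')]
print(flow_direction)
# ['ConvModule', 'PoolModule', 'FullyConnectedModule']
```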
  • the data flow network and the data flow direction can be automatically mapped in the artificial intelligence chip according to the artificial intelligence algorithm, so users can use the artificial intelligence chip of the embodiments of the present application to perform the corresponding processing in a simple and understandable way, giving it very strong ease of use.
  • the artificial intelligence chip predefines the computing function of each of the multiple calculation modules, and the multiple calculation modules are combined to form different data flow networks that execute different artificial intelligence algorithms; this can be configured as needed to support multiple artificial intelligence algorithms, realizing the versatility of the dataflow artificial intelligence chip.
  • in the technical solution of the embodiments of the present application, when processing of the data to be processed is started, the target artificial intelligence model used to process the data to be processed is determined; the data flow network and the preset data flow direction corresponding to the target artificial intelligence model are matched on the artificial intelligence chip; and the data to be processed is processed based on the artificial intelligence chip matched with the data flow network and the data flow direction, which reduces instruction overhead and improves the resource utilization of the artificial intelligence chip.
  • in addition, the artificial intelligence chip predefines the computing function of each of the multiple calculation modules, and the multiple calculation modules are combined to form different data flow networks that execute different artificial intelligence algorithms; this can be configured as needed to support multiple artificial intelligence algorithms, realizing the versatility of the dataflow artificial intelligence chip.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Hardware Design (AREA)
  • Neurology (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of the present application provide an artificial intelligence chip and a data processing method based on the artificial intelligence chip. The artificial intelligence chip includes multiple calculation modules, where each calculation module is used to process data based on one of the operation nodes corresponding to an artificial intelligence algorithm, and the multiple calculation modules are connected in sequence according to the operation order of the artificial intelligence algorithm; data flows, according to a preset data flow direction, in a data flow network composed of the multiple calculation modules.

Description

Artificial intelligence chip and data processing method based on artificial intelligence chip
This application claims priority to Chinese patent application No. 202010576743.9, filed with the Chinese Patent Office on June 22, 2020, the entire contents of which are incorporated herein by reference.
Technical Field
Embodiments of the present application relate to the field of artificial intelligence technology, for example, to an artificial intelligence chip and a data processing method based on the artificial intelligence chip.
Background
With the rapid development of artificial intelligence, many artificial intelligence chips for computing artificial intelligence learning models have appeared on the market.
At present, commonly used artificial intelligence chips acquire data by means of an instruction set and process the data according to the operation rules of an artificial intelligence algorithm.
However, acquiring data by means of an instruction set consumes the resources of the artificial intelligence chip, resulting in low resource utilization of the artificial intelligence chip.
Summary
Embodiments of the present application provide an artificial intelligence chip and a data processing method based on the artificial intelligence chip, so as to achieve the effect of improving the resource utilization of the artificial intelligence chip.
In a first aspect, an embodiment of the present application provides an artificial intelligence chip, including:
multiple calculation modules, where each calculation module is used to process data based on one of the operation nodes corresponding to an artificial intelligence algorithm, and the multiple calculation modules are connected in sequence according to the operation order of the artificial intelligence algorithm;
where data flows, according to a preset data flow direction, in a data flow network composed of the multiple calculation modules.
Optionally, the chip further includes a data flow dam, which is arranged between a previous calculation module and a next calculation module among the multiple calculation modules and is used to, when the bandwidths of the previous calculation module and the next calculation module do not match, receive the first data output by the previous calculation module and send the first data to the next calculation module at the bandwidth matched by the next calculation module.
Optionally, the previous calculation module and the next calculation module are adjacent or non-adjacent.
Optionally, the chip further includes:
a local data flow storage module, which is connected to at least the first calculation module and the last calculation module among the multiple calculation modules, and is used to send the data into the data flow network for processing via the first calculation module, and/or to receive the processing result output by the last calculation module.
Optionally, the data flow dam includes a write end, a read end, a full end, and an empty end, and further includes:
a first AND gate, connected to the write end to serve as the upstream valid end, where the upstream valid end is used to receive the first valid signal sent by the previous calculation module;
a second AND gate, connected to the read end to serve as the downstream permission end, where the downstream permission end is used to receive the second valid signal sent by the next calculation module;
a first NOT gate, connected to the full end to serve as the upstream permission end, where the upstream permission end is used to send a first permission signal to the previous calculation module and the first AND gate;
a second NOT gate, connected to the empty end to serve as the downstream valid end, where the downstream valid end is used to send a second valid signal to the next calculation module and the second AND gate.
Optionally, the data flow network is a local data flow network, there are multiple local data flow networks, and the multiple local data flow networks form a global data flow network; the artificial intelligence chip further includes:
a global data flow storage module, connected to the multiple local data flow networks, where the global data flow storage module is used to transmit data to the local data flow networks or to transmit the second data output by a previous local data flow network to the next local data flow network.
Optionally, there is one global data flow storage module, and the multiple local data flow networks are each connected to the one global data flow storage module.
In a second aspect, an embodiment of the present application provides a data processing method based on an artificial intelligence chip, applied to the artificial intelligence chip described in any embodiment of the present application, the method including:
when processing of data to be processed is started, determining a target artificial intelligence model for processing the data to be processed;
matching, on the artificial intelligence chip, a data flow network and a preset data flow direction corresponding to the target artificial intelligence model;
processing the data to be processed based on the artificial intelligence chip matched with the data flow network and the data flow direction.
Optionally, matching, on the artificial intelligence chip, the data flow network and the preset data flow direction corresponding to the target artificial intelligence model includes:
determining algorithm information corresponding to the target artificial intelligence model;
matching, on the artificial intelligence chip, the data flow network and the data flow direction corresponding to the target artificial intelligence model according to the algorithm information.
Optionally, the algorithm information includes computation content, input/output information, and operation order, and matching, on the artificial intelligence chip, the data flow network and the data flow direction corresponding to the target artificial intelligence model according to the algorithm information includes:
matching data flow modules according to the computation content, where the data flow modules include at least calculation modules;
matching the connection relationships of the data flow modules according to the input/output information to form the data flow network;
matching the data flow direction of the data to be processed in the data flow network according to the operation order.
The artificial intelligence chip of the embodiments of the present application includes multiple calculation modules, where each calculation module is used to process data based on one of the operation nodes corresponding to an artificial intelligence algorithm, and the multiple calculation modules are connected in sequence according to the operation order of the artificial intelligence algorithm; data flows, according to a preset data flow direction, in a data flow network composed of the multiple calculation modules. This solves the problem that acquiring data by means of an instruction set consumes the resources of the artificial intelligence chip and results in low resource utilization of the artificial intelligence chip, and achieves the effect of improving the resource utilization of the artificial intelligence chip.
Brief Description of the Drawings
FIG. 1 is a schematic structural diagram of an artificial intelligence chip provided in Embodiment 1 of the present application;
FIG. 2 is a schematic structural diagram of an artificial intelligence chip for computing a CNN algorithm provided in Embodiment 1 of the present application;
FIG. 3 is a schematic diagram of a partial structure of an artificial intelligence chip provided in Embodiment 1 of the present application;
FIG. 4 is a schematic structural diagram of another artificial intelligence chip provided in Embodiment 1 of the present application;
FIG. 5 is a schematic structural diagram of another artificial intelligence chip provided in Embodiment 1 of the present application;
FIG. 6 is a schematic flowchart of a data processing method based on an artificial intelligence chip provided in Embodiment 2 of the present application.
Detailed Description
The present application is further described in detail below with reference to the drawings and embodiments. It can be understood that the specific embodiments described here are only used to explain the present application and do not limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the present application rather than the entire structure.
Some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart describes the steps as sequential processing, many of the steps can be implemented in parallel, concurrently, or simultaneously. In addition, the order of the steps can be rearranged. The processing may be terminated when its operations are completed, but may also have additional steps not included in the drawings. The processing can correspond to a method, a function, a procedure, a subroutine, a sub-computer-program, and so on.
In addition, terms such as "first" and "second" may be used herein to describe various directions, actions, steps or elements, etc., but these directions, actions, steps or elements are not limited by these terms. These terms are only used to distinguish one direction, action, step or element from another direction, action, step or element. For example, without departing from the scope of the present application, the first data may be referred to as the second data, and similarly, the second data may be referred to as the first data; both the first data and the second data are data, but they are not the same data. The terms "first", "second", etc. should not be understood as indicating or implying relative importance or implicitly indicating the number of the indicated technical features. Thus, a feature defined with "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the present application, "a plurality of" means at least two, such as two, three, etc., unless specifically defined otherwise.
Embodiment 1
FIG. 1 is a schematic structural diagram of an artificial intelligence chip provided in Embodiment 1 of the present application. As shown in FIG. 1, an embodiment of the present application provides an artificial intelligence chip 10, which includes multiple calculation modules 110, where:
each calculation module 110 is used to process data based on one of the operation nodes corresponding to an artificial intelligence algorithm, and the multiple calculation modules 110 are connected in sequence according to the operation order of the artificial intelligence algorithm;
data flows, according to a preset data flow direction, in a data flow network 100 composed of the multiple calculation modules 110.
In this embodiment, each calculation module 110 in the artificial intelligence chip 10 processes data according to a corresponding operation node of the artificial intelligence algorithm, and the modules are connected in sequence according to the operation order of the artificial intelligence algorithm, forming a data flow network 100 suitable for computing data according to the artificial intelligence algorithm. Specifically, the calculation module 110 includes, but is not limited to, computing functions such as convolution, pooling, activation, or fully connected calculation, and a computing function adapted to the operation nodes of the artificial intelligence algorithm can be configured as required. The artificial intelligence algorithm in this embodiment includes, but is not limited to, the CNN algorithm, the RNN algorithm, etc.
Taking the CNN algorithm as an example, the CNN algorithm includes convolutional layer calculation, pooling layer calculation, and fully connected layer calculation, and the operation order is: first the convolutional layer, then the pooling layer, and finally the fully connected layer; an operation node can be a node computed in the convolutional layer, the pooling layer, or the fully connected layer. The multiple calculation modules 110 respectively perform convolution calculation, pooling calculation, and fully connected calculation, and are connected end to end according to the operation order of the CNN algorithm, so that the data flows through the multiple calculation modules 110 in the operation order of the artificial intelligence algorithm, and the chip processes data with the artificial intelligence algorithm in a dataflow manner. It is understandable that, by letting data flow through the chip on its own in a dataflow manner, a calculation module 110 does not need to perform any data-fetch action; it only needs to wait for data to arrive according to the preset data flow direction and then process it, which reduces instruction overhead and improves the resource utilization of the chip.
Referring to FIG. 2, FIG. 2 is a schematic structural diagram of an artificial intelligence chip for computing a CNN algorithm provided in an embodiment of the present application. As shown in FIG. 2, the artificial intelligence chip 10 includes a calculation module A111, a calculation module B112, and a calculation module C113, where the calculation module A111 is used for convolution calculation, the calculation module B112 is used for pooling calculation, and the calculation module C113 is used for fully connected calculation. The preset data flow direction is, in order, calculation module A111, calculation module B112, and calculation module C113. It is understandable that the image data flows through calculation module A111, calculation module B112, and calculation module C113 according to the preset data flow direction: when the image data reaches calculation module A111, convolution is performed; after that calculation completes, the data reaches calculation module B112 for pooling, and finally reaches calculation module C113 for the fully connected calculation, after which the final calculation result is output. The final calculation result can be stored in off-chip storage outside the artificial intelligence chip 10, which is not specifically limited here.
Specifically, in some scenarios, after the previous calculation module 114 finishes processing data and obtains the first data, it needs to send the first data to the next calculation module 115, so that the next calculation module 115 can use the first data to perform its calculation. However, when the bandwidths of the previous calculation module 114 and the next calculation module 115 do not match, for example, when the bandwidth of the previous calculation module 114 is greater than the bandwidth of the next calculation module 115, the data received by the next calculation module 115 will quickly overflow.
Referring to FIG. 3, FIG. 3 is a schematic diagram of a partial structure of an artificial intelligence chip provided in an embodiment of the present application. In this embodiment, the artificial intelligence chip 10 further includes a data flow dam 130, which is arranged between the previous calculation module 114 and the next calculation module 115 among the multiple calculation modules 110 and is used to, when the bandwidths of the previous calculation module 114 and the next calculation module 115 do not match, receive the first data output by the previous calculation module 114 and send the first data, at the bandwidth matched by the next calculation module 115, to the next calculation module 115.
Specifically, the previous calculation module 114 and the next calculation module 115 in this embodiment only denote calculation modules 110 that need to exchange data with each other; they are not limited to specific calculation modules 110 and can be determined according to different situations. Optionally, the previous calculation module 114 and the next calculation module 115 are adjacent or non-adjacent, which is not specifically limited here.
It is understandable that, when the bandwidths of the previous calculation module 114 and the next calculation module 115 do not match, the data flow dam 130 receives the first data output by the previous calculation module 114 and sends it, at the bandwidth matched by the next calculation module 115, to the next calculation module 115, which guarantees the data balance of the data exchange between the previous calculation module 114 and the next calculation module 115, so that data processing can proceed normally and data loss caused by clock-cycle disorder is avoided.
Specifically, to implement automatic flow control with the data flow dam 130 between the previous calculation module 114 and the next calculation module 115, the basic idea is as follows:
A) input data rate: F_in = number of valid input data / unit time (T_d);
B) output data rate: F_out = number of valid output data / unit time (T_d);
C) if F_in == F_out over the entire run, then:
to completely avoid backpressure, the data flow dam 130 should be able to store max(F_in) - min(F_out) data items. The data flow dam 130 ties together the internal states of the previous calculation module 114 and the next calculation module 115; whether data flows out of the previous calculation module 114 is determined purely by the hardware. The data flow dam 130 can therefore be understood as a barrier that regulates the data flow. Based on algorithm requirements, the data flow dam 130 can be further extended to support predetermined static flow control.
Optionally, the data flow dam 130 includes a write end, a read end, a full end, and an empty end, and further includes:
a first AND gate, connected to the write end to serve as the upstream valid end, where the upstream valid end is used to receive the first valid signal sent by the previous calculation module 114;
a second AND gate, connected to the read end to serve as the downstream permission end, where the downstream permission end is used to receive the second valid signal sent by the next calculation module 115;
a first NOT gate, connected to the full end to serve as the upstream permission end, where the upstream permission end is used to send a first permission signal to the previous calculation module 114 and the first AND gate;
a second NOT gate, connected to the empty end to serve as the downstream valid end, where the downstream valid end is used to send a second valid signal to the next calculation module 115 and the second AND gate.
Specifically, the previous calculation module 114 is used to receive the first permission signal sent by the data flow dam 130;
the previous calculation module 114 provides the first valid signal to the data flow dam 130 so as to write the first data of the data to be processed into the data flow dam 130, and the previous calculation module 114 is used to process the first data according to the processing mode pointed to by the operation node to obtain the calculation result, where the first data is the part of the data to be processed that the previous calculation module 114 is suited to calculate;
the data flow dam 130 is used to receive the second permission signal sent by the next calculation module 115;
the data flow dam 130 provides the second valid signal to the next calculation module 115 so as to write the calculation result into the next calculation module 115.
In this embodiment, when the previous calculation module 114 receives the first permission signal sent by the data flow dam 130, it means that the data flow dam 130 is ready to receive the data that the previous calculation module 114 needs to write; after the previous calculation module 114 receives the first permission signal sent by the data flow dam 130, the previous calculation module 114 can read out the calculation result. When the previous calculation module 114 provides the first valid signal to the data flow dam 130, it means that the previous calculation module 114 can write the calculation result into the data flow dam 130; after the data flow dam 130 receives the first valid signal sent by the previous calculation module 114, the data flow dam 130 can write in the calculation result.
When the previous calculation module 114 receives the first permission signal sent by the data flow dam 130 and, at the same time, the data flow dam 130 receives the first valid signal sent by the previous calculation module 114, the calculation result starts to be written from the previous calculation module 114 into the data flow dam 130. When either signal stops being sent, that is, when the data flow dam 130 stops sending the first permission signal to the previous calculation module 114 or the previous calculation module 114 stops sending the first valid signal to the data flow dam 130, this transfer stops immediately. At this point, the calculation result has been written from the previous calculation module 114 into the data flow dam 130, and the calculation result is stored in the data flow dam 130. When the data flow dam 130 receives the second permission signal sent by the next calculation module 115, it means that the next calculation module 115 is ready to receive the data that the data flow dam 130 needs to write; after the data flow dam 130 receives the second permission signal sent by the next calculation module 115, the data flow dam 130 can read out the calculation result. When the data flow dam 130 provides the second valid signal to the next calculation module 115, it means that the data flow dam 130 can write the calculation result into the next calculation module 115; after the next calculation module 115 receives the second valid signal sent by the data flow dam 130, the next calculation module 115 can write in the calculation result.
When the data flow dam 130 receives the second permission signal sent by the next calculation module 115 and, at the same time, the next calculation module 115 receives the second valid signal sent by the data flow dam 130, the calculation result starts to be written from the data flow dam 130 into the next calculation module 115. When either signal stops being sent, that is, when the next calculation module 115 stops sending the second permission signal to the data flow dam 130 or the data flow dam 130 stops sending the second valid signal to the next calculation module 115, this transfer stops immediately. This completes the transmission of the calculation result from the previous calculation module 114 to the next calculation module 115. It should also be noted that the calculation result does not refer to a sequential calculation result; the calculation result can be any piece of data in the actual communication.
Referring to FIG. 4, FIG. 4 is a schematic structural diagram of another artificial intelligence chip provided in an embodiment of the present application. In one embodiment, the artificial intelligence chip further includes a local data flow storage module 120. The local data flow storage module 120 is connected to at least the first calculation module 110 and the last calculation module 110 in the data pipeline composed of the multiple calculation modules 110, and is used to send the data into the data flow network for processing via the first calculation module 110, and/or to receive the processing result output by the last calculation module 110.
Specifically, the preset data flow direction is controlled by the routing switches in the data flow network. The local data flow storage module 120 can programmably store and output a predefined sequence of data; this sequence of data is sent via the first calculation module 110 into each calculation module 110 of the data pipeline in the data flow network 100, with the flow direction of the data controlled by the routing switches in the data flow network. When the calculation in the data flow network 100 is completed, the calculation result output by the last calculation module 110 is stored in the local data flow storage module 120.
Referring to FIG. 5, FIG. 5 is a schematic structural diagram of another artificial intelligence chip provided in an embodiment of the present application. In one embodiment, the data flow network 100 is a local data flow network 100, there are multiple local data flow networks 100, and the multiple local data flow networks 100 form a global data flow network; the artificial intelligence chip 10 further includes:
a global data flow storage module 200, connected to the multiple local data flow networks 100, where the global data flow storage module 200 is used to transmit data to the local data flow networks 100 or to transmit the second data output by a previous local data flow network 100 to the next local data flow network 100.
In this embodiment, specifically, each local data flow network 100 executes the data processing corresponding to one artificial intelligence algorithm.
It should be noted that, during data processing on the artificial intelligence chip 10, multiple artificial intelligence algorithms may be computed in parallel, with an independent computation carried out in each local data flow network 100. The global data flow storage module 200 can serve as a container that provides data to each local data flow network 100, and can also transmit the second data output by a previous local data flow network 100 to the next local data flow network 100. In this embodiment, the internal details of each local data flow network 100 can refer to the description of any embodiment and are not repeated here. Optionally, there is one global data flow storage module 200, and the multiple local data flow networks 100 are each connected to the one global data flow storage module 200. Optionally, the global data flow storage module 200 can also serve as the connection window between the artificial intelligence chip 10 and the outside of the chip.
In the technical solution of the embodiments of the present application, the artificial intelligence chip includes multiple calculation modules, where each calculation module is used to process data based on one of the operation nodes corresponding to an artificial intelligence algorithm, and the multiple calculation modules are connected in sequence according to the operation order of the artificial intelligence algorithm. Since the data flows automatically in the data flow network according to the data flow direction, instruction overhead is avoided, and the technical effect of improving the resource utilization of the artificial intelligence chip is achieved. In addition, by arranging a data flow dam between two calculation modules that need to exchange data but whose bandwidths do not match, data can be transmitted accurately for calculation even when the bandwidths do not match.
Embodiment 2
FIG. 6 is a schematic flowchart of a data processing method based on an artificial intelligence chip provided in Embodiment 2 of the present application, which is applicable to scenarios in which data processing is performed on an artificial intelligence chip; the method can be executed by the artificial intelligence chip provided in any embodiment of the present application.
As shown in FIG. 6, the data processing method based on an artificial intelligence chip provided in Embodiment 2 of the present application includes:
S510: when processing of data to be processed is started, determining a target artificial intelligence model for processing the data to be processed.
The data to be processed may be image data, voice data, text data, etc., which is not specifically limited here. The target artificial intelligence model refers to the artificial intelligence learning model used to process the data to be processed. Specifically, the target artificial intelligence model corresponds to the data type of the data to be processed. For example, when the data to be processed is image data, the target artificial intelligence model is a CNN model; when the data to be processed is text data, the target artificial intelligence model is an RNN model.
S520: matching, on the artificial intelligence chip, a data flow network and a preset data flow direction corresponding to the target artificial intelligence model.
The data flow network refers to the assembly of modules that is adapted to the algorithm corresponding to the target artificial intelligence model and is used to realize the complete calculation of the target artificial intelligence model. The preset data flow direction refers to the flow direction of the data to be processed in the data flow network. Specifically, a data flow refers to an ordered sequence of data items that can be read once or a small number of times; following the predefined data flow direction, the data flow moves through the data flow network so that it is read and processed by the calculation modules. The artificial intelligence chip of this embodiment includes, but is not limited to, FPGA chips, CAISA chips, etc.
S530: processing the data to be processed based on the artificial intelligence chip matched with the data flow network and the data flow direction.
In this step, once the data flow network and data flow direction corresponding to the target artificial intelligence model have been matched, the artificial intelligence chip can process the data to be processed based on the data flow network and the preset data flow direction. Specifically, the data to be processed flows through the data flow network according to the data flow direction; the data flow network includes multiple calculation modules that compute according to the algorithm corresponding to the target artificial intelligence model, and when the data reaches a calculation module, that calculation module uses the data to perform its calculation.
In an optional implementation, matching, on the artificial intelligence chip, the data flow network and the preset data flow direction corresponding to the target artificial intelligence model may include:
determining the algorithm information corresponding to the target artificial intelligence model; matching, on the artificial intelligence chip, the data flow network and the data flow direction corresponding to the target artificial intelligence model according to the algorithm information.
The algorithm information refers to information related to the algorithm corresponding to the target artificial intelligence model.
Optionally, the algorithm information includes computation content, input/output information, and operation order, and matching, on the artificial intelligence chip, the data flow network and the data flow direction corresponding to the target artificial intelligence model according to the algorithm information includes:
matching the data flow modules according to the computation content, where the data flow modules include at least calculation modules; matching the connection relationships of the data flow modules according to the input/output information to form the data flow network; matching the data flow direction of the data to be processed in the data flow network according to the operation order.
The computation content refers to the calculations involved in processing according to the artificial intelligence algorithm, such as convolution calculation, pooling calculation, and so on. The data flow modules include at least calculation modules and, when the bandwidths of calculation modules that exchange data do not match, also include a data flow dam. The input/output information refers to the information about the input data and output data of each calculation module, and the connection relationships between the data flow modules can be matched according to the input/output information. Then, according to the operation order of the artificial intelligence algorithm, the data flow direction of the data in the data flow network can be determined.
It should be noted that the data flow network and the data flow direction can be automatically mapped in the artificial intelligence chip according to the artificial intelligence algorithm, so users can use the artificial intelligence chip of the embodiments of the present application to perform the corresponding processing in a simple and understandable way, which gives it very strong ease of use.
It is understandable that processing the data to be processed based on the artificial intelligence chip matched with the data flow network and the data flow direction reduces instruction overhead and improves the resource utilization of the artificial intelligence chip. In addition, the artificial intelligence chip predefines the computing function of each of the multiple calculation modules, and the multiple calculation modules are combined to form different data flow networks that execute different artificial intelligence algorithms; this can be configured as needed to support multiple artificial intelligence algorithms, realizing the versatility of the dataflow artificial intelligence chip.
In the technical solution of the embodiments of the present application, when processing of the data to be processed is started, the target artificial intelligence model used to process the data to be processed is determined; the data flow network and the preset data flow direction corresponding to the target artificial intelligence model are matched on the artificial intelligence chip; and the data to be processed is processed based on the artificial intelligence chip matched with the data flow network and the data flow direction, which reduces instruction overhead and improves the resource utilization of the artificial intelligence chip. In addition, the artificial intelligence chip predefines the computing function of each of the multiple calculation modules, and the multiple calculation modules are combined to form different data flow networks that execute different artificial intelligence algorithms; this can be configured as needed to support multiple artificial intelligence algorithms, realizing the versatility of the dataflow artificial intelligence chip.

Claims (10)

  1. An artificial intelligence chip, comprising:
    multiple calculation modules, wherein each calculation module is used to process data based on one of the operation nodes corresponding to an artificial intelligence algorithm, and the multiple calculation modules are connected in sequence according to the operation order of the artificial intelligence algorithm;
    wherein data flows, according to a preset data flow direction, in a data flow network composed of the multiple calculation modules.
  2. The artificial intelligence chip of claim 1, further comprising a data flow dam, wherein the data flow dam is arranged between a previous calculation module and a next calculation module among the multiple calculation modules, and is used to, when the bandwidths of the previous calculation module and the next calculation module do not match, receive the first data output by the previous calculation module and send the first data to the next calculation module at the bandwidth matched by the next calculation module.
  3. The artificial intelligence chip of claim 2, wherein the previous calculation module and the next calculation module are adjacent or non-adjacent.
  4. The artificial intelligence chip of claim 1, further comprising:
    a local data flow storage module, wherein the local data flow storage module is connected to at least the first calculation module and the last calculation module among the multiple calculation modules, and is used to send the data into the data flow network for processing via the first calculation module, and/or to receive the processing result output by the last calculation module.
  5. The artificial intelligence chip of claim 2, wherein the data flow dam comprises a write end, a read end, a full end, and an empty end, and further comprises:
    a first AND gate, connected to the write end to serve as the upstream valid end, wherein the upstream valid end is used to receive the first valid signal sent by the previous calculation module;
    a second AND gate, connected to the read end to serve as the downstream permission end, wherein the downstream permission end is used to receive the second valid signal sent by the next calculation module;
    a first NOT gate, connected to the full end to serve as the upstream permission end, wherein the upstream permission end is used to send a first permission signal to the previous calculation module and the first AND gate;
    a second NOT gate, connected to the empty end to serve as the downstream valid end, wherein the downstream valid end is used to send a second valid signal to the next calculation module and the second AND gate.
  6. The artificial intelligence chip of claim 1, wherein the data flow network is a local data flow network, there are multiple local data flow networks, and the multiple local data flow networks form a global data flow network; the artificial intelligence chip further comprises:
    a global data flow storage module, wherein the global data flow storage module is connected to the multiple local data flow networks, and the global data flow storage module is used to transmit data to the local data flow networks or to transmit the second data output by a previous local data flow network to the next local data flow network.
  7. The artificial intelligence chip of claim 6, wherein there is one global data flow storage module, and the multiple local data flow networks are each connected to the one global data flow storage module.
  8. A data processing method based on an artificial intelligence chip, applied to the artificial intelligence chip of any one of claims 1-7, the method comprising:
    when processing of data to be processed is started, determining a target artificial intelligence model for processing the data to be processed;
    matching, on the artificial intelligence chip, a data flow network and a preset data flow direction corresponding to the target artificial intelligence model;
    processing the data to be processed based on the artificial intelligence chip matched with the data flow network and the data flow direction.
  9. The method of claim 8, wherein matching, on the artificial intelligence chip, the data flow network and the preset data flow direction corresponding to the target artificial intelligence model comprises:
    determining algorithm information corresponding to the target artificial intelligence model;
    matching, on the artificial intelligence chip, the data flow network and the data flow direction corresponding to the target artificial intelligence model according to the algorithm information.
  10. The method of claim 9, wherein the algorithm information comprises computation content, input/output information, and operation order, and matching, on the artificial intelligence chip, the data flow network and the data flow direction corresponding to the target artificial intelligence model according to the algorithm information comprises:
    matching data flow modules according to the computation content, wherein the data flow modules include at least calculation modules;
    matching connection relationships of the data flow modules according to the input/output information to form the data flow network;
    matching the data flow direction of the data to be processed in the data flow network according to the operation order.
PCT/CN2021/100362 2020-06-22 2021-06-16 Artificial intelligence chip and data processing method based on artificial intelligence chip WO2021259104A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US18/011,522 US20230281045A1 (en) 2020-06-22 2021-06-16 Artificial intelligence chip and data processing method based on artificial intelligence chip

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010576743.9A 2020-06-22 2020-06-22 Artificial intelligence chip and data processing method based on artificial intelligence chip
CN202010576743.9 2020-06-22

Publications (1)

Publication Number Publication Date
WO2021259104A1 true WO2021259104A1 (zh) 2021-12-30

Family

ID=72676415

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/100362 Artificial intelligence chip and data processing method based on artificial intelligence chip 2020-06-22 2021-06-16

Country Status (3)

Country Link
US (1) US20230281045A1 (zh)
CN (1) CN111752887B (zh)
WO (1) WO2021259104A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114429051A (zh) * 2022-04-01 2022-05-03 Modeling method, apparatus, device and medium for a data flow chip

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111857989B (zh) * 2020-06-22 2024-02-27 深圳鲲云信息科技有限公司 Artificial intelligence chip and data processing method based on artificial intelligence chip
CN111752887B (zh) * 2020-06-22 2024-03-15 深圳鲲云信息科技有限公司 Artificial intelligence chip and data processing method based on artificial intelligence chip

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109564638A (zh) * 2018-01-15 2019-04-02 深圳鲲云信息科技有限公司 Artificial intelligence processor and processing method applied thereto
CN110147251A (zh) * 2019-01-28 2019-08-20 腾讯科技(深圳)有限公司 Architecture, chip and calculation method for computing neural network models
US20200050476A1 (en) * 2018-08-10 2020-02-13 Beijing Baidu Netcom Science And Technology Co., Ltd. Artificial Intelligence Chip And Instruction Execution Method For Artificial Intelligence Chip
CN110991634A (zh) * 2019-12-04 2020-04-10 腾讯科技(深圳)有限公司 Artificial intelligence accelerator, device, chip and data processing method
CN111752887A (zh) * 2020-06-22 2020-10-09 深圳鲲云信息科技有限公司 Artificial intelligence chip and data processing method based on artificial intelligence chip

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110046704B (zh) * 2019-04-09 2022-11-08 深圳鲲云信息科技有限公司 Dataflow-based deep network acceleration method, apparatus, device and storage medium
CN111291323B (zh) * 2020-02-17 2023-12-12 南京大学 Systolic-array-based matrix multiplication processor and data processing method thereof

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109564638A (zh) * 2018-01-15 2019-04-02 深圳鲲云信息科技有限公司 Artificial intelligence processor and processing method applied thereto
US20200050476A1 (en) * 2018-08-10 2020-02-13 Beijing Baidu Netcom Science And Technology Co., Ltd. Artificial Intelligence Chip And Instruction Execution Method For Artificial Intelligence Chip
CN110147251A (zh) * 2019-01-28 2019-08-20 腾讯科技(深圳)有限公司 Architecture, chip and calculation method for computing neural network models
CN110991634A (zh) * 2019-12-04 2020-04-10 腾讯科技(深圳)有限公司 Artificial intelligence accelerator, device, chip and data processing method
CN111752887A (zh) * 2020-06-22 2020-10-09 深圳鲲云信息科技有限公司 Artificial intelligence chip and data processing method based on artificial intelligence chip

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114429051A (zh) * 2022-04-01 2022-05-03 Modeling method, apparatus, device and medium for a data flow chip

Also Published As

Publication number Publication date
CN111752887A (zh) 2020-10-09
US20230281045A1 (en) 2023-09-07
CN111752887B (zh) 2024-03-15

Similar Documents

Publication Publication Date Title
WO2021259104A1 (zh) Artificial intelligence chip and data processing method based on artificial intelligence chip
JP5304194B2 (ja) Barrier synchronization apparatus, barrier synchronization system, and control method of barrier synchronization apparatus
US20110145556A1 (en) Automatic and controllable system operation
WO2019148960A1 (zh) Data analysis device, system and method
CN113271264B (zh) Data stream transmission method and apparatus for a time-sensitive network
WO2021259041A1 (zh) Method, apparatus and device for sorting an AI computation graph, and storage medium
Gao et al. Deep neural network task partitioning and offloading for mobile edge computing
CN104901989A (zh) Field service providing system and method
CN113592066B (zh) Hardware acceleration method, apparatus, device and storage medium
US20230251979A1 (en) Data processing method and apparatus of ai chip and computer device
CN113422812B (zh) Service chain deployment method and apparatus
CN111211988A (zh) Data transmission method and system for distributed machine learning
CN113015216A (zh) Burst task offloading and scheduling method for edge service networks
US20230126978A1 (en) Artificial intelligence chip and artificial intelligence chip-based data processing method
WO2024001411A1 (zh) Multi-thread scheduling method and apparatus
JP5450297B2 (ja) Device and method for distributed execution of digital data processing operations
CN116782249A (zh) Edge computing offloading and resource allocation method and system with user dependencies
CN114490458B (zh) Data transmission method, chip, server and storage medium
WO2021092760A1 (zh) Data transmission method and BLE device
CN108388943B (zh) Pooling apparatus and method suitable for neural networks
WO2020062311A1 (zh) Memory access method and apparatus
WO2021077281A1 (zh) Method, apparatus, server and storage medium for adjusting a deep learning framework
WO2021203680A1 (zh) Service flow transmission method, apparatus, device and storage medium
WO2020211654A1 (zh) Line-buffer-based parallel computing method and computing device
US11979295B2 (en) Reinforcement learning agent training method, modal bandwidth resource scheduling method and apparatus

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21830297

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21830297

Country of ref document: EP

Kind code of ref document: A1