WO2021259231A1 - Artificial intelligence chip and data processing method based on artificial intelligence chip - Google Patents

Artificial intelligence chip and data processing method based on artificial intelligence chip

Info

Publication number
WO2021259231A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
module
calculation
calculation module
processed
Prior art date
Application number
PCT/CN2021/101414
Other languages
English (en)
French (fr)
Inventor
蔡权雄
牛昕宇
Original Assignee
深圳鲲云信息科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳鲲云信息科技有限公司 filed Critical 深圳鲲云信息科技有限公司
Publication of WO2021259231A1 publication Critical patent/WO2021259231A1/zh
Priority to US18/069,216 priority Critical patent/US20230126978A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F 9/4806 Task transfer initiation or dispatching
    • G06F 9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F 9/4881 Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 15/00 Digital computers in general; Data processing equipment in general
    • G06F 15/76 Architectures of general purpose stored program computers
    • G06F 15/82 Architectures of general purpose stored program computers, data or demand driven
    • G06F 15/825 Dataflow computers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F 21/30 Authentication, i.e. establishing the identity or authorisation of security principals
    • G06F 21/44 Program or device authentication
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • the embodiments of the application relate to the field of artificial intelligence technology, for example, to an artificial intelligence chip and a data processing method based on the artificial intelligence chip.
  • AI chips obtain data through instruction sets, and process the data in accordance with the operating rules of the AI algorithm.
  • the embodiments of the present application provide an artificial intelligence chip and a data processing method based on the artificial intelligence chip, so as to improve the resource utilization rate of the AI chip.
  • an embodiment of the present application provides an artificial intelligence chip for a data flow network that processes data to be processed based on an AI algorithm, and the data flow network includes:
  • each calculation module is set to calculate the data to be processed based on one of the at least one calculation node corresponding to the AI algorithm, and output a calculation result;
  • the lower stream conversion module corresponding to each calculation module is configured to be connected to each calculation module, receive calculation results output by each calculation module, and process the calculation results;
  • the data to be processed flows in the data flow network according to a preset data flow direction.
  • an embodiment of the present application provides a data processing method based on an artificial intelligence chip, the method including:
  • Each calculation module in the at least one calculation module in the data flow network calculates the data to be processed based on one calculation node in the at least one calculation node corresponding to the AI algorithm and outputs the calculation result, and the data flow network is used to process the to-be-processed data based on the AI algorithm;
  • the lower stream transfer module receives the calculation result output by each calculation module and processes the calculation result, wherein the lower stream transfer module is set to be connected to each calculation module;
  • the data to be processed flows in the data flow network according to a preset data flow direction.
  • FIG. 1 is a schematic diagram of the structure of an artificial intelligence chip provided in Embodiment 1 of the present application;
  • FIG. 2 is a schematic structural diagram of another artificial intelligence chip provided by an embodiment of the present application.
  • Fig. 3 is a schematic structural diagram of an artificial intelligence chip running a CNN model provided by an embodiment of the present application
  • FIG. 4 is a schematic structural diagram of another artificial intelligence chip provided by an embodiment of the present application.
  • FIG. 4A is a schematic structural diagram of a control flow dam in an artificial intelligence chip provided by an embodiment of the present application.
  • Fig. 5 is a flowchart of a data processing method based on an artificial intelligence chip provided by an embodiment of the present application.
  • The terms "first", "second", etc. may be used herein to describe various directions, actions, steps or elements, but these directions, actions, steps or elements are not limited by these terms. These terms are only used to distinguish a first direction, action, step or element from another direction, action, step or element.
  • the first valid signal may be referred to as the second valid signal;
  • the second valid signal may be referred to as the first valid signal.
  • Both the first valid signal and the second valid signal are valid signals, but they are not the same valid signal.
  • the terms “first”, “second”, etc. cannot be understood as indicating or implying relative importance or implicitly indicating the number of indicated technical features. Therefore, the features defined with “first” and “second” may explicitly or implicitly include one or more of these features.
  • "a plurality of” means at least two, such as two, three, etc., unless specifically defined otherwise.
  • FIG. 1 is a schematic structural diagram of an artificial intelligence chip provided by an embodiment of the application.
  • an embodiment of the present application provides an artificial intelligence chip 10, which includes a data flow network for processing data to be processed based on an AI algorithm.
  • the data flow network includes: at least one calculation module 110 and a downstream transfer module 120.
  • the artificial intelligence chip of this embodiment is suitable for performing data processing on the data to be processed based on a preset data flow direction and an AI algorithm. Wherein:
  • the calculation module 110 is configured to calculate the data to be processed based on one of the calculation nodes corresponding to the AI algorithm, and output the calculation result;
  • the downstream transfer module 120 corresponding to the current calculation module 110 is configured to receive the calculation result output by the calculation module 110 and process the calculation result;
  • the data to be processed flows in the data flow network according to a preset data flow direction.
  • the data to be processed refers to data that needs to be processed by an AI algorithm.
  • the data to be processed may be image data to be processed, text data to be processed, and other data that can be processed based on an AI algorithm, and there is no specific limitation here.
  • the AI algorithm refers to the algorithm corresponding to the artificial intelligence model, for example, the algorithm corresponding to the Convolutional Neural Network (CNN) model, etc., and there is no specific limitation here.
  • The calculation node is a node used for calculation in the AI algorithm. It should be noted that an AI algorithm is essentially a set of mathematical models and therefore has coefficients; when computing with the AI algorithm, the calculation module 110 must compute with both the algorithm's coefficients and the data to be processed.
  • the current calculation module 110 is one of at least one calculation module 110, and this embodiment does not limit which calculation module 110 the current calculation module 110 is.
  • The data flow direction is defined in the data flow network according to the operation sequence of the AI algorithm, and indicates the direction in which the data to be processed flows.
  • the CNN model includes a convolutional layer, a pooling layer, and a fully connected layer.
  • Since the CNN algorithm computes first in the convolutional layer, then in the pooling layer, and finally in the fully connected layer, a calculation node may be a node computed by the convolutional layer, the pooling layer, or the fully connected layer, or one of the nodes computed within the convolutional layer, such as the calculation module 110 of the first convolutional sublayer or the second convolutional sublayer; no specific restriction is imposed here.
  • the next stream transfer module 120 refers to the next module connected to the current calculation module 110.
  • the next stream transfer module 120 may be the next calculation module corresponding to the current calculation module 110, or the next storage module corresponding to the current calculation module 110, which can be set as required, and there is no specific limitation here.
  • The number of calculation modules 110 can be determined according to the AI algorithm corresponding to the specific artificial intelligence model, and the data flow direction can likewise be determined according to the calculation process of the AI algorithm; that is, how the data to be processed flows between the calculation modules 110 and the downstream transfer module 120 is determined, and this embodiment imposes no specific limitation on it.
  • The data to be processed flows in the data flow network according to the preset data flow direction. Neither the calculation module 110 nor the downstream transfer module 120 needs to fetch the data; each only waits for the data to arrive according to the data flow direction and then processes it, which reduces instruction overhead and improves the resource utilization rate of the chip.
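The data-driven behavior described above can be sketched in a few lines of Python (a purely illustrative simulation with invented module names and operations, not the chip's implementation): each module fires only when data reaches it, so no instruction fetch is needed to move the data.

```python
# Minimal sketch of a data flow network: each module fires when data
# arrives, following a preset flow direction (no instruction fetch).
# Module names and operations are illustrative only.

def make_module(operation):
    """Wrap a computation node: compute on arrival, pass the result on."""
    def module(data, downstream):
        result = operation(data)      # calculate the data to be processed
        return downstream(result)    # result flows to the downstream module
    return module

# Preset data flow direction: scale -> offset -> store
store = lambda x: x                  # downstream transfer (storage) module
offset = make_module(lambda x: x + 1)
scale = make_module(lambda x: x * 2)

final = scale(5, lambda r: offset(r, store))  # data flows through the network
print(final)  # 11
```

The point of the sketch is that the flow direction is fixed by the wiring of the modules, not by fetched instructions.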
  • FIG. 2 is a schematic structural diagram of another artificial intelligence chip provided in this embodiment.
  • the data flow network further includes a processing module 130, wherein:
  • the processing module 130 is configured to process the to-be-processed data to obtain the parameters carried by the to-be-processed data;
  • the calculation module 110 is configured to calculate the to-be-processed data based on the parameters.
  • The processing module 130 can be directly connected to the off-chip storage 200 outside the chip, and is configured to process the data to be processed after receiving it from the off-chip storage 200, so as to obtain the parameters required for data flow network calculation; the calculation module 110 in the data flow network then calculates the data to be processed based on the parameters.
  • FIG. 3 is a schematic structural diagram of an artificial intelligence chip running a CNN model provided by this embodiment.
  • the artificial intelligence chip includes a calculation module A 111, a calculation module B 112, and a calculation module C 113;
  • the calculation module A 111 is configured for the convolutional layer calculation;
  • the calculation module B 112 is configured for the pooling layer calculation;
  • the calculation module C 113 is configured for the fully connected layer calculation.
  • the preset data flow direction is the calculation module A 111, the calculation module B 112, and the calculation module C 113 in sequence.
  • The image data to be processed flows through the calculation module A 111, the calculation module B 112, and the calculation module C 113 according to the preset data flow direction:
  • in the calculation module A 111, the calculation of the convolutional layer is performed;
  • in the calculation module B 112, the calculation of the pooling layer is performed;
  • in the calculation module C 113, the calculation of the fully connected layer is performed, and the final calculation result is output.
  • the final calculation results can be stored in off-chip storage outside the artificial intelligence chip, and there is no specific restriction here.
  • When the calculation module A 111 is the current calculation module 110, the corresponding downstream transfer module 120 is the calculation module B 112; when the calculation module C 113 is the current calculation module 110, the corresponding downstream transfer module 120 is a final storage module that stores the final calculation result (the storage module is not shown in FIGS. 1 and 3).
  • The downstream transfer module 120 of a calculation module may also be an intermediate storage module, with no specific limitation here. It is understandable that when the previous calculation module 110 has completed its calculation but the next calculation module 110 has not, the calculation result of the previous calculation module 110 is first sent to the intermediate storage module to wait; the previous calculation module 110, now in an idle state, can continue to obtain new data for calculation, and when the next calculation module 110 finishes its calculation, the intermediate storage module sends the stored calculation result to it for calculation, which further improves the resource utilization rate of the chip.
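The role of the intermediate storage module can be pictured with a small queue-based sketch (illustrative Python; the module behavior and timings are invented for the example): while the downstream module is still busy, the upstream module deposits its result in the intermediate store and immediately moves on to new data instead of idling.

```python
from collections import deque

# Illustrative sketch: an intermediate storage module (a queue) decouples
# a fast upstream calculation module from a slower downstream one.

intermediate_store = deque()   # intermediate storage module
upstream_log = []

def module_a(data):
    """Upstream module: finish, hand off, and stay free for new data."""
    intermediate_store.append(data * data)   # send result to the store
    upstream_log.append(f"A done with {data}")

def module_b():
    """Downstream module: pull a waiting result when it becomes free."""
    if intermediate_store:
        return intermediate_store.popleft() + 1
    return None

# A processes three inputs back-to-back even though B has not run yet.
for x in (1, 2, 3):
    module_a(x)

results = [module_b() for _ in range(3)]
print(results)  # [2, 5, 10]
```

Without the store, `module_a` would have to stall until `module_b` consumed each result.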
  • The previous calculation module 110 and the next calculation module 110 in this embodiment only indicate two calculation modules 110 that interact with each other, and do not refer to any specific calculation module 110.
  • FIG. 4 is a schematic structural diagram of another artificial intelligence chip provided in this embodiment.
  • A control flow dam 140 is provided between the current calculation module 110 and the downstream transfer module 120, wherein the control flow dam 140 is configured to control the flow of the calculation result from the current calculation module 110 to the downstream transfer module 120, as described below.
  • Input data rate: F_in = (number of valid input data) / (unit time T_d); the output data rate F_out is defined analogously for the valid output data.
  • the control flow dam 140 combines the internal states of the calculation module 110 and the lower flow transfer module 120 together. It is purely the hardware that determines whether to stream data from the current computing module 110. Therefore, the control flow dam 140 can be understood as a barrier to regulate the data flow. Based on algorithm requirements, the control flow dam 140 is further extended to support predetermined static flow control.
  • The control flow dam 140 includes a write terminal, a read terminal, a full terminal, and an empty terminal, and also includes:
  • a first AND gate connected to the write terminal to form an uplink valid terminal, the uplink valid terminal being configured to receive the first valid signal sent by the current calculation module 110;
  • a second AND gate connected to the read terminal to form a downlink permission terminal, the downlink permission terminal being configured to receive the second permission signal sent by the downstream transfer module 120;
  • a first NOT gate connected to the full terminal to form an uplink permission terminal, the uplink permission terminal being configured to send the first permission signal to the current calculation module 110 and the first AND gate;
  • a second NOT gate connected to the empty terminal to form a downlink valid terminal, the downlink valid terminal being configured to send the second valid signal to the downstream transfer module 120 and the second AND gate.
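Read as Boolean logic, the gate wiring above admits a compact sketch (one plausible interpretation of the description; the flag and strobe names are ours, not taken from the patent figures): the first permission signal is the inversion of the full flag, the second valid signal is the inversion of the empty flag, and the two AND gates qualify the write and read strobes.

```python
# Boolean sketch of the control flow dam's gate wiring (one plausible
# reading of the description; signal names are illustrative).

def dam_signals(first_valid, second_permission, full, empty):
    first_permission = not full        # first NOT gate on the full terminal
    second_valid = not empty           # second NOT gate on the empty terminal
    write_enable = first_valid and first_permission    # first AND gate
    read_enable = second_valid and second_permission   # second AND gate
    return first_permission, second_valid, write_enable, read_enable

# Upstream has data, downstream is ready, dam neither full nor empty:
print(dam_signals(True, True, False, False))  # (True, True, True, True)
# Dam full: the first permission signal drops and writes stall.
print(dam_signals(True, True, True, False))   # (False, True, False, True)
```

A dropped flag on either side deasserts the corresponding strobe, which is how the dam throttles the flow in hardware.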
  • the current calculation module 110 is configured to receive the first permission signal sent by the control flow dam 140;
  • The current calculation module 110 provides the first valid signal to the control flow dam 140 to write the target data in the data to be processed into the control flow dam 140; the current calculation module 110 is configured to process the target data according to the processing method indicated by the calculation node to obtain the calculation result, where the target data is the data, among the data to be processed, that is calculated by the current calculation module 110;
  • the control flow dam 140 is configured to receive the second permission signal sent by the downstream transfer module 120;
  • the control flow dam 140 provides the second valid signal to the downstream transfer module 120 to write the calculation result into the downstream transfer module 120.
  • The current calculation module 110 receiving the first permission signal sent by the control flow dam 140 means that the control flow dam 140 is ready to receive the data to be written from the current calculation module 110; after the current calculation module 110 receives the first permission signal, the calculation result can be read out of the current calculation module 110. The current calculation module 110 providing the first valid signal to the control flow dam 140 means that the current calculation module 110 can write the calculation result into the control flow dam 140; after the control flow dam 140 receives the first valid signal sent by the current calculation module 110, the calculation result can be written into the control flow dam 140.
  • When the current calculation module 110 receives the first permission signal sent by the control flow dam 140 and the control flow dam 140 also receives the first valid signal sent by the current calculation module 110, the calculation result starts to be written from the current calculation module 110 into the control flow dam 140.
  • If either signal stops being sent, that is, when the control flow dam 140 stops sending the first permission signal to the current calculation module 110, or the current calculation module 110 stops sending the first valid signal to the control flow dam 140, the transmission stops immediately.
  • the calculation result has been written into the control flow dam 140 from the current calculation module 110, and the control flow dam 140 stores the calculation result.
  • When the control flow dam 140 receives the second permission signal sent by the downstream transfer module 120, it means that the downstream transfer module 120 is ready to receive the data to be written from the control flow dam 140; after the control flow dam 140 receives the second permission signal, the downstream transfer module 120 can read the calculation result.
  • When the control flow dam 140 provides the second valid signal to the downstream transfer module 120, it means that the control flow dam 140 can write the calculation result into the downstream transfer module 120; after the downstream transfer module 120 receives the second valid signal sent by the control flow dam 140, the calculation result can be written into the downstream transfer module 120.
  • When both signals are present, the calculation result starts to be written from the control flow dam 140 into the downstream transfer module 120.
  • If either signal stops being sent, the transmission stops immediately.
  • This completes the transmission of the calculation result from the current calculation module 110 to the downstream transfer module 120.
  • It should be noted that the calculation result here does not refer to a complete sequential calculation result; it can be any piece of data in the actual communication.
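Putting the two handshakes together, the transfer behaves like a small FIFO with backpressure. The sketch below (an assumed one-slot capacity, purely illustrative) shows a word moving from the calculation module through the dam to the downstream module, and stalling whenever either side withdraws its signal.

```python
# Illustrative one-slot control flow dam: a word advances only while the
# valid signal and the matching permission signal are both present.

class ControlFlowDam:
    def __init__(self):
        self.slot = None                 # storage inside the dam

    def first_permission(self):
        return self.slot is None         # NOT full

    def second_valid(self):
        return self.slot is not None     # NOT empty

    def write(self, data, first_valid):
        if first_valid and self.first_permission():
            self.slot = data             # module -> dam transfer happens
            return True
        return False                     # either signal absent: stall

    def read(self, second_permission):
        if second_permission and self.second_valid():
            data, self.slot = self.slot, None
            return data                  # dam -> downstream transfer
        return None                      # either signal absent: stall

dam = ControlFlowDam()
assert dam.write("result0", first_valid=True)      # upstream handshake
assert not dam.write("result1", first_valid=True)  # dam full: stalled
assert dam.read(second_permission=False) is None   # downstream not ready
assert dam.read(second_permission=True) == "result0"  # downstream handshake
```

The stalled second write and the refused read are the "transmission stops immediately" cases described above.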
  • the artificial intelligence chip includes a data stream network for processing data to be processed based on an AI algorithm
  • the data stream network includes at least one calculation module
  • each calculation module is configured to calculate the data to be processed based on one of the calculation nodes corresponding to the AI algorithm, and to output the calculation result;
  • the next stream conversion module corresponding to the current calculation module is set to receive the calculation result output by the calculation module and process the calculation result;
  • the data to be processed flows in the data flow network according to the preset data flow direction, which avoids the situation in which the AI chip must obtain data through an instruction set, consuming the AI chip's resources and resulting in relatively low resource utilization; the resource utilization rate of the AI chip is thereby improved.
  • FIG. 5 shows an artificial intelligence chip-based data processing method provided by an embodiment of the application, which is applicable to scenarios where data to be processed is handled based on a preset data flow direction and an AI algorithm; the method may be implemented by the artificial intelligence chip described above.
  • the data processing method based on the artificial intelligence chip includes:
  • At least one calculation module in the data flow network calculates the data to be processed based on one of the calculation nodes corresponding to the AI algorithm and outputs the calculation result, and the data flow network is configured to process the data to be processed based on the AI algorithm;
  • the data stream network refers to the network composed of various modules in the artificial intelligence chip for processing the data to be processed based on the AI algorithm.
  • the data to be processed refers to the data that needs to be processed by the AI algorithm.
  • the data to be processed may be image data to be processed, text data to be processed, and other data that can be processed based on an AI algorithm, and there is no specific limitation here.
  • the AI algorithm refers to the algorithm corresponding to the artificial intelligence model, such as the algorithm corresponding to the CNN model, and there is no specific limitation here.
  • The calculation node is a node used for calculation in the AI algorithm. It should be noted that an AI algorithm is essentially a set of mathematical models and therefore has coefficients; when computing through the AI algorithm, the calculation module must compute with both the algorithm's coefficients and the data to be processed.
  • the CNN model includes a convolutional layer, a pooling layer, and a fully connected layer.
  • Since the CNN algorithm computes first in the convolutional layer, then in the pooling layer, and finally in the fully connected layer, a calculation node may be a node computed in the convolutional layer, the pooling layer, or the fully connected layer, or one of the nodes computed within the convolutional layer, such as the calculation module of the first convolutional sublayer or the second convolutional sublayer; no specific restriction is imposed here.
  • The downstream transfer module corresponding to the current calculation module receives the calculation result output by the calculation module and processes the calculation result, wherein the data to be processed flows in the data flow network according to a preset data flow direction.
  • the current calculation module is one of at least one calculation module, and this embodiment does not limit which calculation module the current calculation module is.
  • the next stream module refers to the next module connected to the current computing module.
  • the next stream transfer module may be the next calculation module corresponding to the current calculation module, or the next storage module corresponding to the current calculation module, which can be set as required, and there is no specific limitation here.
  • The number of calculation modules can be determined according to the AI algorithm corresponding to the specific artificial intelligence model, and the data flow direction can likewise be determined according to the calculation process of the AI algorithm; that is, how the data to be processed flows between the calculation module and the downstream transfer module is determined, and this embodiment imposes no specific limitation on it.
  • The data to be processed flows in the data flow network according to the preset data flow direction. Neither the calculation module nor the downstream transfer module needs to fetch the data; each only needs to wait for the data to arrive according to the data flow direction and can then process the acquired data, which reduces instruction overhead and improves the resource utilization rate of the chip.
  • the data processing method based on the artificial intelligence chip further includes:
  • The processing module in the data flow network processes the data to be processed to obtain the parameters carried by the data to be processed. The calculation module calculating the data to be processed based on one of the calculation nodes corresponding to the AI algorithm includes: determining, among the calculation nodes corresponding to the AI algorithm, the calculation node corresponding to the calculation module; and the calculation module calculating the parameters based on that calculation node.
  • The processing module can be directly connected to the off-chip storage outside the chip, and is configured to process the data to be processed after receiving it from the off-chip storage, so as to obtain the parameters required for data flow network calculation; the calculation module in the data flow network then calculates the data to be processed based on the parameters.
  • The target calculation module corresponding to a calculation bottleneck in the data flow network can be split into at least two target calculation sub-modules for serial calculation, or into at least two target calculation sub-modules for parallel calculation, which maximizes the resource utilization of the chip.
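Splitting a bottleneck stage into parallel sub-modules can be pictured as follows (an illustrative sketch; the patent does not prescribe any particular partitioning scheme, and the operation shown is invented): the stage's input is divided between sub-modules that compute concurrently, and their partial results are recombined.

```python
from concurrent.futures import ThreadPoolExecutor

# Illustrative: a bottleneck calculation module split into two target
# calculation sub-modules that work on halves of the input in parallel.

def bottleneck_op(chunk):
    return [x * x for x in chunk]        # the stage's computation node

def split_bottleneck(data, sub_modules=2):
    mid = len(data) // sub_modules
    chunks = [data[:mid], data[mid:]]    # divide work between sub-modules
    with ThreadPoolExecutor(max_workers=sub_modules) as pool:
        parts = pool.map(bottleneck_op, chunks)
    return [y for part in parts for y in part]

print(split_bottleneck([1, 2, 3, 4]))  # [1, 4, 9, 16]
```

Because the results are recombined in chunk order, the split is transparent to the downstream module.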
  • A control flow dam is provided between the current calculation module and the downstream transfer module, and the artificial intelligence chip-based data processing method further includes: the control flow dam controlling the flow of the calculation result from the current calculation module to the downstream transfer module.
  • Input data rate: F_in = (number of valid input data) / (unit time T_d).
  • the control flow dam should be able to store max(F_in) − min(F_out) data.
  • the control flow dam combines the internal state of the calculation module and the downstream flow module. It is purely the hardware decision whether to stream data from the current computing module. Therefore, the control flow dam can be understood as a barrier to regulate the data flow. Based on algorithm requirements, the control flow dam is further extended to support predetermined static flow control.
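The sizing rule can be checked numerically (the rate figures below are assumptions for illustration): if at most 4 valid words arrive per unit time while at least 1 word drains per unit time, the dam must be able to hold the difference, 3 words.

```python
# Illustrative check of the dam sizing rule: capacity per unit time must
# cover the worst-case gap between input and output rates.

def required_capacity(input_rates, output_rates):
    """Words the dam must hold per unit time: max(F_in) - min(F_out)."""
    return max(input_rates) - min(output_rates)

f_in = [2, 4, 3]    # valid input words per unit time T_d (assumed)
f_out = [1, 2, 2]   # valid output words per unit time (assumed)
print(required_capacity(f_in, f_out))  # 3
```

Sizing against the worst case rather than the average is what lets the dam absorb bursts without ever dropping data.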
  • The control flow dam includes a write terminal, a read terminal, a full terminal, and an empty terminal, and also includes a first AND gate, a second AND gate, a first NOT gate, and a second NOT gate.
  • The first AND gate is connected to the write terminal to form an uplink valid terminal;
  • the second AND gate is connected to the read terminal to form a downlink permission terminal;
  • the first NOT gate is connected to the full terminal to form an uplink permission terminal;
  • the second NOT gate is connected to the empty terminal to form a downlink valid terminal.
  • The artificial intelligence chip-based data processing method further includes: the uplink valid terminal receives the first valid signal sent by the current calculation module;
  • the downlink permission terminal receives the second permission signal sent by the downstream transfer module;
  • the uplink permission terminal sends a first permission signal to the current calculation module and the first AND gate to trigger the transmission of the current calculation module's data to the control flow dam;
  • the downlink valid terminal sends a second valid signal to the downstream transfer module and the second AND gate to trigger the transmission of the current calculation module's data stored in the control flow dam to the downstream transfer module.
  • the first valid signal and the first permission signal are used to control the data flow of the current calculation module to the control flow dam, and the second valid signal and the second permission signal are used to control the data flow of the control flow dam to the downstream flow transfer module.
  • the data in the current calculation module flows into the control flow dam and is saved by the control flow dam. When the conditions are met, the control flow dam transfers its saved data to the downstream flow transfer module.
  • The current calculation module provides the first valid signal to the control flow dam to write the target data in the data to be processed into the control flow dam; the current calculation module is configured to process the target data according to the processing method indicated by the calculation node to obtain the calculation result.
  • the control flow dam is configured to receive the second permission signal sent by the downstream transfer module;
  • the control flow dam is configured to provide the second valid signal to the downstream transfer module to write the calculation result into the downstream transfer module.
  • The current calculation module receiving the first permission signal sent by the control flow dam means that the control flow dam is ready to receive the data to be written from the current calculation module; after the current calculation module receives the first permission signal, the calculation result can be read out of the current calculation module. The current calculation module providing the first valid signal to the control flow dam means that the current calculation module can write the calculation result into the control flow dam; after the control flow dam receives the first valid signal sent by the current calculation module, the calculation result can be written into the control flow dam.
  • the calculation result starts to be written into the control flow dam from the current calculation module.
  • If either signal stops being sent, the transmission stops immediately.
  • the calculation result has been written into the control flow dam from the current calculation module, and the calculation result is stored in the control flow dam.
  • control flow dam When the control flow dam receives the second permission signal sent by the next flow transfer module, it means that the next flow transfer module is ready to receive the data that needs to be written in the control flow dam, and the control flow dam receives the second permission signal sent by the next flow transfer module. After the second permission signal, the next stream transfer module can read the calculation result. When the control flow dam provides the second effective signal to the next flow transfer module, it means that the control flow dam can write the calculation result into the next flow transfer module. After the next flow transfer module receives the second effective signal sent by the control flow dam, the next The first-rate conversion module can write calculation results.
  • the calculation result starts to be written into the next flow transfer from the control flow dam. Module.
  • the transmission of the communication will stop immediately . This completes the transmission of the calculation results from the current calculation module to the next stream conversion module.
  • the calculation result does not refer to the sequential calculation result, and the calculation result can be any piece of data in actual communication.
  • Each of the at least one calculation module in the data flow network performs calculation on the to-be-processed data based on one of the operation nodes corresponding to the AI algorithm, and outputs a calculation result.
  • The data flow network is used to process the to-be-processed data based on the AI algorithm; the next transfer module corresponding to the current calculation module receives the calculation result output by the calculation module and processes the calculation result, where the to-be-processed data flows through the data flow network according to the preset data flow direction. This improves the resource utilization of the AI chip.
  • Fig. 4A is a schematic structural diagram of a control flow dam in an artificial intelligence chip provided by an embodiment of the present application.
  • The control flow dam includes: an upstream permission end composed of a first NOT gate 41 and a full end, an upstream valid end composed of a first AND gate 42 and a write end, a downstream permission end composed of a second AND gate 43 and a read end, and a downstream valid end composed of a second NOT gate 44 and an empty end.
  • The control flow dam also includes a storage device configured to store data.
  • The upstream permission end sends the first permission signal to the current calculation module and the first AND gate 42; the upstream valid end receives the first valid signal sent by the current calculation module.
  • In the first AND gate 42, event A1 is "the current calculation module sends the first valid signal to the upstream valid end", event B1 is "the upstream permission end sends the first permission signal to the current calculation module", and event C1 is "the calculation result of the current calculation module is written into the control flow dam".
  • The downstream valid end sends the second valid signal to the next transfer module and the second AND gate; the downstream permission end receives the second permission signal sent by the next transfer module.
  • In the second AND gate 43, event A2 is "the downstream permission end receives the second permission signal sent by the next transfer module", event B2 is "the downstream valid end sends the second valid signal to the next transfer module", and event C2 is "the next transfer module reads the calculation result of the current calculation module".
  • When the current calculation module 110 sends the first valid signal to the control flow dam 140 and the control flow dam 140 sends the first permission signal to the current calculation module 110, the calculation result of the current calculation module 110 can flow into the storage in the control flow dam 140.
  • When the next transfer module 120 sends the second permission signal to the control flow dam 140 and the control flow dam 140 sends the second valid signal to the next transfer module 120, the next transfer module 120 reads the calculation result stored in the control flow dam 140.
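The valid/permission handshake described in the points above can be sketched as a small software model. This is an illustrative sketch only — the class, method, and variable names are our own, not from the patent: a transfer across either interface of the dam happens only while the sender's valid signal and the receiver's permission signal are both asserted, and stops as soon as either side deasserts.

```python
from collections import deque

class Dam:
    """Toy model of the control flow dam: a bounded store with
    valid/permission handshakes on its upstream and downstream sides."""
    def __init__(self, depth):
        self.depth = depth
        self.store = deque()

    # Upstream side: the dam grants the first permission signal while not full.
    def upstream_permission(self):
        return len(self.store) < self.depth

    # Downstream side: the dam asserts the second valid signal while not empty.
    def downstream_valid(self):
        return len(self.store) > 0

def step(dam, producer, consumer_ready, results):
    """One cycle: a transfer occurs only when valid AND permission both hold."""
    # Current calculation module -> dam (first valid / first permission).
    if producer and dam.upstream_permission():
        dam.store.append(producer.pop(0))
    # Dam -> next transfer module (second valid / second permission).
    if dam.downstream_valid() and consumer_ready:
        results.append(dam.store.popleft())

dam = Dam(depth=2)
pending = [1, 2, 3]          # calculation results waiting in the module
received = []
# The consumer stalls on one cycle: the dam absorbs data, then drains it.
ready_pattern = [True, False, True, True, True]
for ready in ready_pattern:
    step(dam, pending, ready, received)
print(received)   # results arrive in order once the consumer is ready again
```

Stopping either signal (a full dam withholding permission, or a stalled consumer withholding readiness) simply pauses the transfer for that cycle; no data is lost, mirroring the "transmission stops immediately" behavior described above.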

Abstract

Embodiments of the present application provide an artificial intelligence chip and a data processing method based on the artificial intelligence chip. The artificial intelligence chip includes a data flow network configured to process to-be-processed data based on an AI algorithm. The data flow network includes: at least one calculation module, where each calculation module is configured to perform calculation on the to-be-processed data based on one of at least one operation node corresponding to the AI algorithm and output a calculation result; and a next transfer module corresponding to each calculation module, configured to be connected with each calculation module, receive the calculation result output by each calculation module, and process the calculation result; where the to-be-processed data flows through the data flow network according to a preset data flow direction.

Description

Artificial intelligence chip and data processing method based on artificial intelligence chip
This application claims priority to Chinese Patent Application No. 202010575487.1, filed with the China National Intellectual Property Administration on June 22, 2020, the entire contents of which are incorporated herein by reference.
Technical Field
Embodiments of the present application relate to the field of artificial intelligence technologies, for example, to an artificial intelligence chip and a data processing method based on the artificial intelligence chip.
Background
With the rapid development of artificial intelligence, many AI chips for computing artificial intelligence learning models have appeared on the market.
At present, a commonly used artificial intelligence (AI) chip acquires data by means of an instruction set and processes the data according to the operation rules of an AI algorithm.
However, acquiring data by means of an instruction set consumes resources of the AI chip, resulting in low resource utilization of the AI chip.
Summary
The following is an overview of the subject matter described in detail herein. This overview is not intended to limit the scope of the claims.
Embodiments of the present application provide an artificial intelligence chip and a data processing method based on the artificial intelligence chip, so as to improve the resource utilization of the AI chip.
In a first aspect, an embodiment of the present application provides an artificial intelligence chip, including a data flow network configured to process to-be-processed data based on an AI algorithm, where the data flow network includes:
at least one calculation module, where each calculation module is configured to perform calculation on the to-be-processed data based on one of at least one operation node corresponding to the AI algorithm and output a calculation result;
a next transfer module corresponding to each calculation module, configured to be connected with each calculation module, receive the calculation result output by each calculation module, and process the calculation result;
where the to-be-processed data flows through the data flow network according to a preset data flow direction.
In a second aspect, an embodiment of the present application provides a data processing method based on an artificial intelligence chip, the method including:
performing, by each calculation module of at least one calculation module in a data flow network, calculation on to-be-processed data based on one of at least one operation node corresponding to an AI algorithm, and outputting a calculation result, where the data flow network is configured to process the to-be-processed data based on the AI algorithm;
receiving, by a next transfer module, the calculation result output by each calculation module, and processing the calculation result, where the next transfer module is configured to be connected with each calculation module;
where the to-be-processed data flows through the data flow network according to a preset data flow direction.
Brief Description of the Drawings
Fig. 1 is a schematic structural diagram of an artificial intelligence chip provided in Embodiment 1 of the present application;
Fig. 2 is a schematic structural diagram of another artificial intelligence chip provided in an embodiment of the present application;
Fig. 3 is a schematic structural diagram of an artificial intelligence chip running a CNN model provided in an embodiment of the present application;
Fig. 4 is a schematic structural diagram of another artificial intelligence chip provided in an embodiment of the present application;
Fig. 4A is a schematic structural diagram of a control flow dam in an artificial intelligence chip provided in an embodiment of the present application;
Fig. 5 is a flowchart of a data processing method based on an artificial intelligence chip provided in an embodiment of the present application.
Detailed Description
The present application is described in further detail below with reference to the drawings and embodiments. It is to be understood that the example embodiments described herein are merely intended to explain the present application and not to limit it. It should also be noted that, for ease of description, only the parts related to the present application, rather than the entire structure, are shown in the drawings.
Before the exemplary embodiments are discussed in more detail, it should be mentioned that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart describes the steps as sequential processing, many of the steps may be implemented in parallel, concurrently, or simultaneously. In addition, the order of the steps may be rearranged. The processing may be terminated when its operations are completed, but may also have additional steps not included in the drawings. The processing may correspond to a method, a function, a procedure, a subroutine, a sub-computer program, and the like.
In addition, the terms "first", "second", and the like may be used herein to describe various directions, actions, steps, elements, and the like, but these directions, actions, steps, or elements are not limited by these terms. These terms are only used to distinguish a first direction, action, step, or element from another direction, action, step, or element. For example, without departing from the scope of the present application, a first valid signal may be referred to as a second valid signal, and similarly, a second valid signal may be referred to as a first valid signal. Both the first valid signal and the second valid signal are valid signals, but they are not the same valid signal. The terms "first", "second", and the like are not to be understood as indicating or implying relative importance or implicitly indicating the number of the technical features indicated. Thus, a feature defined with "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the present application, "a plurality of" means at least two, for example, two or three, unless otherwise expressly and specifically defined.
Fig. 1 is a schematic structural diagram of an artificial intelligence chip provided in an embodiment of the present application. As shown in Fig. 1, an embodiment of the present application provides an artificial intelligence chip 10, which includes a data flow network configured to process to-be-processed data based on an AI algorithm. The data flow network includes at least one calculation module 110 and a next transfer module 120. The artificial intelligence chip of this embodiment is suitable for processing to-be-processed data based on a preset data flow direction and an AI algorithm. Specifically:
the calculation module 110 is configured to perform calculation on the to-be-processed data based on one of the operation nodes corresponding to the AI algorithm and output a calculation result;
the next transfer module 120 corresponding to the current calculation module 110 is configured to receive the calculation result output by the calculation module 110 and process the calculation result;
where the to-be-processed data flows through the data flow network according to a preset data flow direction.
In this embodiment, the to-be-processed data refers to data that needs to be processed by the AI algorithm. For example, the to-be-processed data may be image data to be processed, text data to be processed, or other data that can be processed based on an AI algorithm, which is not specifically limited here. The AI algorithm refers to the algorithm corresponding to an artificial intelligence model, such as the algorithm corresponding to a convolutional neural network (CNN) model, which is not specifically limited here. An operation node is a node used for calculation in the AI algorithm. It should be noted that an AI algorithm is essentially a set of mathematical models and therefore has coefficients; when calculation is performed by the AI algorithm, the coefficients corresponding to the AI algorithm and the to-be-processed data need to be passed to the calculation module 110 for calculation. The current calculation module 110 is one of the at least one calculation module 110; this embodiment does not limit which calculation module 110 the current calculation module 110 specifically is. The data flow direction is represented in the data flow network according to the operation order of the AI algorithm and indicates the flow direction of the to-be-processed data.
As an example, a CNN model includes a convolutional layer, a pooling layer, and a fully connected layer, and the CNN algorithm performs calculation first at the convolutional layer, then at the pooling layer, and finally at the fully connected layer. An operation node may be a node that performs calculation at the convolutional layer, the pooling layer, or the fully connected layer, or may be one of the nodes that performs calculation within the convolutional layer, for example, the calculation module 110 of a first convolutional sublayer or a second convolutional sublayer, which is not specifically limited here.
The next transfer module 120 refers to the next module connected to the current calculation module 110. For example, the next transfer module 120 may be the next calculation module corresponding to the current calculation module 110, or the next storage module corresponding to the current calculation module 110, and may be set as needed, which is not specifically limited here. It should be noted that the number of calculation modules 110 may be determined according to the AI algorithm corresponding to a specific artificial intelligence model, and the data flow direction may also be determined according to the operation process of the AI algorithm, that is, the flow of the to-be-processed data between the calculation module 110 and the next transfer module 120 is determined, which is not specifically limited in this embodiment.
It can be understood that the to-be-processed data flows through the data flow network according to the preset data flow direction. Neither the calculation module 110 nor the next transfer module 120 needs to fetch data; they only need to wait for the data to arrive according to the data flow direction and then process the received to-be-processed data. This reduces instruction overhead and improves the resource utilization of the chip.
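The idea that modules never fetch data but simply react when data arrives along the preset route can be sketched in software. This is a hedged illustration — the class, the `on_data` interface, and the stand-in operations are our own, not the patent's: each module computes as soon as data arrives and pushes its result to the statically wired next module, so no fetch instructions are ever issued.

```python
class Module:
    """A node in the data flow network: computes when data arrives
    and pushes its result to the statically wired next module."""
    def __init__(self, op, name):
        self.op = op          # the operation node this module implements
        self.next = None      # preset data flow direction (wired once)
        self.name = name

    def on_data(self, x):
        y = self.op(x)        # compute as soon as the data arrives
        if self.next is not None:
            self.next.on_data(y)   # result flows onward automatically
        else:
            self.result = y        # last module keeps the final output

# Wire three operation nodes: the route itself replaces fetch instructions.
conv = Module(lambda x: x * 2, "conv")    # stand-in for a convolution node
pool = Module(lambda x: x - 1, "pool")    # stand-in for a pooling node
fc   = Module(lambda x: x + 10, "fc")     # stand-in for a fully connected node
conv.next, pool.next = pool, fc

conv.on_data(5)       # data enters the network and flows to the end
print(fc.result)      # (5*2 - 1) + 10 = 19
```

The wiring (`conv.next = pool`, etc.) is fixed before any data flows, which is the software analogue of the preset data flow direction described above.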
Referring to Fig. 2, Fig. 2 is a schematic structural diagram of another artificial intelligence chip provided in this embodiment. In one embodiment, for example, the data flow network further includes a processing module 130, where:
the processing module 130 is configured to process the to-be-processed data to obtain parameters carried by the to-be-processed data;
the calculation module 110 is configured to perform calculation on the to-be-processed data based on the parameters.
The processing module 130 may be directly connected to an off-chip storage 200 outside the chip, and is configured to, after receiving the to-be-processed data sent by the off-chip storage 200, process the to-be-processed data to obtain the parameters required for calculation in the data flow network, so that the calculation module 110 in the data flow network performs calculation on the to-be-processed data based on the parameters.
Referring to Fig. 3, Fig. 3 is a schematic structural diagram of an artificial intelligence chip running a CNN model provided in this embodiment. As shown in Fig. 3, the to-be-processed data in this embodiment is image data to be processed, and the artificial intelligence chip includes a calculation module A 111, a calculation module B 112, and a calculation module C 113, where the calculation module A 111 is configured to perform calculation at the convolutional layer, the calculation module B 112 is configured to perform calculation at the pooling layer, and the calculation module C 113 is configured to perform calculation at the fully connected layer. The preset data flow direction is then the calculation module A 111, the calculation module B 112, and the calculation module C 113 in sequence. It can be understood that the image data to be processed flows through the calculation module A 111, the calculation module B 112, and the calculation module C 113 according to the preset data flow direction: when the image data reaches the calculation module A 111, the convolutional layer calculation is performed; after the calculation is completed, the data reaches the calculation module B 112 for the pooling layer calculation; and finally the data reaches the calculation module C 113 for the fully connected layer calculation, and the final calculation result is output. The final calculation result may be saved in an off-chip storage outside the artificial intelligence chip, which is not specifically limited here. Referring to Fig. 1 and Fig. 3 together, when the calculation module A 111 serves as the current calculation module 110, the corresponding next transfer module 120 is the calculation module B 112; when the calculation module C 113 serves as the current calculation module 110, the corresponding next transfer module 120 is the final storage module that stores the final calculation result (the storage module is not shown in Fig. 1 or Fig. 3).
For example, when an intermediate storage module is disposed between two adjacent calculation modules 110, for example, between the calculation module A 111 and the calculation module B 112, the next transfer module 120 of the calculation module A 111 is that intermediate storage module, which is not specifically limited here. It can be understood that when the previous calculation module 110 has finished its calculation but the next calculation module 110 has not, the calculation result of the previous calculation module 110 is first sent to the intermediate storage module to wait; the previous calculation module 110 is then free and can continue to receive new data for calculation, and when the next calculation module 110 finishes its calculation, the intermediate storage module sends the calculation result of the previous calculation module 110 to the next calculation module for calculation, further improving the resource utilization of the chip.
It can be understood that the previous calculation module 110 and the next calculation module 110 in this embodiment merely denote two calculation modules 110 between which data interaction exists, and are not limited to specific calculation modules 110.
It should be noted that, in order to maximize the resource utilization of the chip, the data flow direction needs to ensure that there is exactly no idle time between the calculation module 110 and the next transfer module 120, which is not specifically limited in this embodiment.
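The intermediate storage module described above can be sketched as a bounded buffer that decouples two calculation modules, so the upstream module keeps producing while the downstream one is still busy. This is an illustrative sketch only; the buffer depth, timing, and names are our assumptions, not the patent's:

```python
from collections import deque

buffer = deque()   # intermediate storage module between two calculation modules
MAX_DEPTH = 4      # how many waiting results the intermediate storage holds

upstream_results = [f"r{i}" for i in range(6)]  # results the fast upstream produces
consumed = []
busy = 0  # remaining cycles the downstream module is busy computing

for cycle in range(12):
    # Upstream: stays busy producing as long as the buffer has room.
    if upstream_results and len(buffer) < MAX_DEPTH:
        buffer.append(upstream_results.pop(0))
    # Downstream: takes two cycles per item, so it lags the producer.
    if busy > 0:
        busy -= 1
    elif buffer:
        consumed.append(buffer.popleft())
        busy = 1

print(consumed)  # all six results arrive in order despite the slower consumer
```

Without the buffer, the upstream module would have to stall whenever the downstream one is busy; with it, the producer runs back-to-back, which is the utilization gain the paragraph describes.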
Referring to Fig. 4, Fig. 4 is a schematic structural diagram of another artificial intelligence chip provided in this embodiment. In this embodiment, a control flow dam 140 is disposed between the current calculation module 110 and the next transfer module 120, where the control flow dam 140 is configured to control the flow of the calculation result from the current calculation module 110 to the next transfer module 120.
For example, in order to implement automatic flow control between the calculation module 110 and the next transfer module 120 through the control flow dam 140, the basic idea is as follows:
A) Input data rate (F_in) = number of valid input data / unit time (T_d)
B) Output data rate (F_out) = number of valid output data / unit time (T_d)
C) Over the entire run, if F_in = F_out, then
in order to completely avoid back pressure, the data dam should be able to store max(F_in) - min(F_out) data. The control flow dam 140 combines the internal states of the calculation module 110 and the next transfer module 120. Whether data flows out of the current calculation module 110 is determined purely by hardware. Therefore, the control flow dam 140 can be understood as a barrier that regulates the data flow. Based on algorithm requirements, the control flow dam 140 is further extended to support predetermined static flow control.
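The sizing rule above — the dam must absorb the worst-case gap between the input and output rates to avoid back pressure — can be checked with a small simulation. This is our own illustrative reading of the max(F_in) - min(F_out) rule, not code from the patent; the rate traces are invented:

```python
def required_depth(f_in, f_out):
    """Worst-case occupancy of the dam for per-interval rate traces.
    With equal totals (F_in == F_out over the whole run), storing
    max(F_in) - min(F_out) items is enough to avoid back pressure."""
    occupancy = 0
    worst = 0
    for fi, fo in zip(f_in, f_out):
        occupancy += fi - fo          # net items left in the dam this interval
        occupancy = max(occupancy, 0) # the dam cannot go below empty
        worst = max(worst, occupancy)
    return worst

# A bursty producer feeding a steady consumer; totals match over the run.
f_in  = [4, 0, 4, 0]   # valid input data per unit time T_d
f_out = [2, 2, 2, 2]   # valid output data per unit time T_d
depth = required_depth(f_in, f_out)
print(depth)            # 2, which equals max(f_in) - min(f_out) = 4 - 2
assert depth <= max(f_in) - min(f_out)
```

The simulation confirms that for this trace the worst-case buildup never exceeds the max(F_in) - min(F_out) bound, so a dam of that depth never exerts back pressure on the producer.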
For example, the control flow dam 140 includes a write end, a read end, a full end, and an empty end, and further includes:
a first AND gate, connected with the write end to form an upstream valid end, where the upstream valid end is configured to receive the first valid signal sent by the current calculation module 110;
a second AND gate, connected with the read end to form a downstream permission end, where the downstream permission end is configured to receive the second permission signal sent by the next transfer module 120;
a first NOT gate, connected with the full end to form an upstream permission end, where the upstream permission end is configured to send the first permission signal to the current calculation module 110 and the first AND gate;
a second NOT gate, connected with the empty end to form a downstream valid end, where the downstream valid end is configured to send the second valid signal to the next transfer module 120 and the second AND gate.
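The four gate combinations above reduce to two boolean equations, which can be sketched as follows (an illustrative model; the Python signal names are ours): a write into the dam is enabled when the upstream valid signal is asserted and the dam is not full, and a read out of the dam is enabled when the downstream permission signal is asserted and the dam is not empty.

```python
def dam_gates(first_valid, full, second_permission, empty):
    """Combinational logic of the control flow dam:
    first NOT gate + full end   -> first permission (upstream permission end)
    first AND gate + write end  -> write enable     (upstream valid end)
    second NOT gate + empty end -> second valid     (downstream valid end)
    second AND gate + read end  -> read enable      (downstream permission end)
    """
    first_permission = not full                        # first NOT gate
    write_enable = first_valid and first_permission    # first AND gate
    second_valid = not empty                           # second NOT gate
    read_enable = second_permission and second_valid   # second AND gate
    return first_permission, write_enable, second_valid, read_enable

# A full dam withholds the first permission signal, so no write happens,
# while its non-empty store still lets the downstream side read.
print(dam_gates(first_valid=True, full=True, second_permission=True, empty=False))
# -> (False, False, True, True)
```

The NOT gates derive the flow-control signals directly from the full/empty state of the store, and the AND gates ensure a transfer fires only when both sides of a handshake agree, which is exactly the pure-hardware decision described above.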
For example, the current calculation module 110 is configured to receive the first permission signal sent by the control flow dam 140;
the current calculation module 110 provides the first valid signal to the control flow dam 140 so as to write the target data in the to-be-processed data into the control flow dam 140, and the current calculation module 110 is configured to process the target data according to the processing mode indicated by the operation node to obtain the calculation result, where the target data is the data in the to-be-processed data that is applicable to the current calculation module 110 for calculation;
the control flow dam 140 is configured to receive the second permission signal sent by the next transfer module 120;
the control flow dam 140 provides the second valid signal to the next transfer module 120 so as to write the calculation result into the next transfer module 120.
In this embodiment, when the current calculation module 110 receives the first permission signal sent by the control flow dam 140, it means that the control flow dam 140 is ready to receive the data to be written from the current calculation module 110; after the current calculation module 110 receives the first permission signal sent by the control flow dam 140, the calculation result can be read from the current calculation module 110. When the current calculation module 110 provides the first valid signal to the control flow dam 140, it means that the current calculation module 110 can write the calculation result into the control flow dam 140; after the control flow dam 140 receives the first valid signal sent by the current calculation module 110, the calculation result can be written into the control flow dam 140.
When the current calculation module 110 receives the first permission signal sent by the control flow dam 140 and, at the same time, the control flow dam 140 receives the first valid signal sent by the current calculation module 110, the calculation result starts to be written from the current calculation module 110 into the control flow dam 140. When either signal stops being sent, that is, when the control flow dam 140 stops sending the first permission signal to the current calculation module 110 or the current calculation module 110 stops sending the first valid signal to the control flow dam 140, the transmission stops immediately. At this point, the calculation result has been written from the current calculation module 110 into the control flow dam 140, and the calculation result is stored in the control flow dam 140. When the control flow dam 140 receives the second permission signal sent by the next transfer module 120, it means that the next transfer module 120 is ready to receive the data to be written from the control flow dam 140; after the control flow dam 140 receives the second permission signal sent by the next transfer module 120, the calculation result can be read by the next transfer module 120. When the control flow dam 140 provides the second valid signal to the next transfer module 120, it means that the control flow dam 140 can write the calculation result into the next transfer module 120; after the next transfer module 120 receives the second valid signal sent by the control flow dam 140, the calculation result can be written into the next transfer module 120.
When the control flow dam 140 receives the second permission signal sent by the next transfer module 120 and, at the same time, the next transfer module 120 receives the second valid signal sent by the control flow dam 140, the calculation result starts to be written from the control flow dam 140 into the next transfer module 120. When either signal stops being sent, that is, when the next transfer module 120 stops sending the second permission signal to the control flow dam 140 or the control flow dam 140 stops sending the second valid signal to the next transfer module 120, the transmission stops immediately. This completes the transmission of the calculation result from the current calculation module 110 to the next transfer module 120. It should also be noted that the calculation result does not refer to a sequential calculation result; the calculation result can be any piece of data in the actual communication.
In the technical solutions of the embodiments of the present application, the artificial intelligence chip includes a data flow network configured to process to-be-processed data based on an AI algorithm. The data flow network includes at least one calculation module, where the calculation module is configured to perform calculation on the to-be-processed data based on one of the operation nodes corresponding to the AI algorithm and output a calculation result; the next transfer module corresponding to the current calculation module is configured to receive the calculation result output by the calculation module and process the calculation result; and the to-be-processed data flows through the data flow network according to a preset data flow direction. This avoids the situation where the AI chip acquires data by means of an instruction set, which consumes the resources of the AI chip and results in low resource utilization, thereby improving the resource utilization of the AI chip.
Fig. 5 shows a data processing method based on an artificial intelligence chip provided in an embodiment of the present application, which is applicable to the scenario of processing to-be-processed data based on a preset data flow direction and an AI algorithm; the method can be implemented by the artificial intelligence chip provided in this embodiment.
As shown in Fig. 5, the data processing method based on an artificial intelligence chip provided in the embodiment of the present application includes:
S610: At least one calculation module in a data flow network performs calculation on to-be-processed data based on one of the operation nodes corresponding to an AI algorithm and outputs a calculation result, where the data flow network is configured to process the to-be-processed data based on the AI algorithm.
The data flow network refers to the network composed of the modules in the artificial intelligence chip for processing the to-be-processed data based on the AI algorithm. The to-be-processed data refers to data that needs to be processed by the AI algorithm. For example, the to-be-processed data may be image data to be processed, text data to be processed, or other data that can be processed based on an AI algorithm, which is not specifically limited here. The AI algorithm refers to the algorithm corresponding to an artificial intelligence model, such as the algorithm corresponding to a CNN model, which is not specifically limited here. An operation node is a node used for calculation in the AI algorithm. It should be noted that an AI algorithm is essentially a set of mathematical models and therefore has coefficients; when calculation is performed by the AI algorithm, the coefficients corresponding to the AI algorithm and the to-be-processed data need to be passed to the calculation module for calculation.
As an example, a CNN model includes a convolutional layer, a pooling layer, and a fully connected layer, and the CNN algorithm performs calculation first at the convolutional layer, then at the pooling layer, and finally at the fully connected layer. An operation node may be a node that performs calculation at the convolutional layer, the pooling layer, or the fully connected layer, or may be one of the nodes that performs calculation within the convolutional layer, for example, the calculation module of a first convolutional sublayer or a second convolutional sublayer, which is not specifically limited here.
S620: The next transfer module corresponding to the current calculation module receives the calculation result output by the calculation module and processes the calculation result, where the to-be-processed data flows through the data flow network according to a preset data flow direction.
The current calculation module is one of the at least one calculation module; this embodiment does not limit which calculation module the current calculation module specifically is. The next transfer module refers to the next module connected to the current calculation module. For example, the next transfer module may be the next calculation module corresponding to the current calculation module, or the next storage module corresponding to the current calculation module, and may be set as needed, which is not specifically limited here. It should be noted that the number of calculation modules may be determined according to the AI algorithm corresponding to a specific artificial intelligence model, and the data flow direction may also be determined according to the operation process of the AI algorithm, that is, the flow of the to-be-processed data between the calculation module and the next transfer module is determined, which is not specifically limited in this embodiment.
It can be understood that the to-be-processed data flows through the data flow network according to the preset data flow direction. Neither the calculation module nor the next transfer module needs to fetch data; they only need to wait for the data to arrive according to the data flow direction and then process the received to-be-processed data. This reduces instruction overhead and improves the resource utilization of the chip.
In an example implementation, the data processing method based on an artificial intelligence chip further includes:
processing, by the processing module in the data flow network, the to-be-processed data to obtain the parameters carried by the to-be-processed data; and the calculation module performing calculation on the to-be-processed data based on one of the operation nodes corresponding to the AI algorithm includes: determining one of the operation nodes of the AI algorithm corresponding to the calculation module; and the calculation module performing calculation on the parameters based on the operation node.
The processing module may be directly connected to an off-chip storage outside the chip, and is configured to, after receiving the to-be-processed data sent by the off-chip storage, process the to-be-processed data to obtain the parameters required for calculation in the data flow network, so that the calculation module in the data flow network performs calculation on the to-be-processed data based on the parameters.
For example, the target calculation module corresponding to a computation bottleneck in the data flow network may be configured as at least two target calculation sub-modules computing in series, or as at least two target calculation sub-modules computing in parallel, so as to maximize the resource utilization of the chip.
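The two ways of splitting a bottleneck calculation module can be sketched as follows. This is illustrative only; the partitioning scheme and function names are our assumptions: a parallel split shards the data across two sub-modules and merges their partial results, while a serial split expresses the same work as two chained lighter stages.

```python
def bottleneck(xs):
    """The original heavy operation node (squaring as a stand-in)."""
    return [x * x for x in xs]

def parallel_split(xs):
    """Two target calculation sub-modules each take half of the data,
    and their partial results are merged in order."""
    mid = len(xs) // 2
    part_a = bottleneck(xs[:mid])      # sub-module 1
    part_b = bottleneck(xs[mid:])      # sub-module 2
    return part_a + part_b

def serial_split(xs):
    """The same work expressed as two chained lighter stages."""
    stage1 = list(xs)                           # sub-module 1: pass-through stage
    return [x * y for x, y in zip(stage1, xs)]  # sub-module 2 completes x * x

data = [1, 2, 3, 4]
# Both splits produce exactly what the unsplit bottleneck produces.
assert parallel_split(data) == bottleneck(data) == serial_split(data)
print(parallel_split(data))  # [1, 4, 9, 16]
```

Either split halves the work seen by any single module: the parallel form doubles throughput when the sub-modules run concurrently, and the serial form pipelines the stages so each can accept new data sooner.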
In an example implementation, a control flow dam is disposed between the current calculation module and the next transfer module, and the data processing method based on an artificial intelligence chip further includes: the control flow dam controlling the flow of the calculation result from the current calculation module to the next transfer module.
For example, in order to implement automatic flow control between the calculation module and the next transfer module through the control flow dam, the basic idea is as follows:
A) Input data rate (F_in) = number of valid input data / unit time (T_d)
B) Output data rate (F_out) = number of valid output data / unit time (T_d)
C) Over the entire run, if F_in = F_out, then
in order to completely avoid back pressure, the data dam should be able to store max(F_in) - min(F_out) data. The control flow dam combines the internal states of the calculation module and the next transfer module. Whether data flows out of the current calculation module is determined purely by hardware. Therefore, the control flow dam can be understood as a barrier that regulates the data flow. Based on algorithm requirements, the control flow dam is further extended to support predetermined static flow control.
In an example implementation, the control flow dam includes a write end, a read end, a full end, and an empty end, and further includes a first AND gate, a second AND gate, a first NOT gate, and a second NOT gate. The first AND gate is connected with the write end to form an upstream valid end, the second AND gate is connected with the read end to form a downstream permission end, the first NOT gate is connected with the full end to form an upstream permission end, and the second NOT gate is connected with the empty end to form a downstream valid end. The data processing method based on an artificial intelligence chip further includes: the upstream valid end receiving the first valid signal sent by the current calculation module; the downstream permission end receiving the second permission signal sent by the next transfer module; the upstream permission end sending the first permission signal to the current calculation module and the first AND gate, so as to trigger the transmission of data from the current calculation module to the control flow dam; and the downstream valid end sending the second valid signal to the next transfer module and the second AND gate, so as to trigger the transmission of the data of the current calculation module stored in the control flow dam to the next transfer module. The first valid signal and the first permission signal are used to control the data flow from the current calculation module to the control flow dam, and the second valid signal and the second permission signal are used to control the data flow from the control flow dam to the next transfer module. The data in the current calculation module flows into the control flow dam and is stored by the control flow dam; when the conditions are met, the control flow dam transmits the stored data to the next transfer module.
For example, the current calculation module provides the first valid signal to the control flow dam so as to write the target data in the to-be-processed data into the control flow dam, and the current calculation module is configured to process the target data according to the processing mode indicated by the operation node to obtain the calculation result.
The control flow dam is configured to receive the second permission signal sent by the next transfer module;
the control flow dam is configured to provide the second valid signal to the next transfer module so as to write the calculation result into the next transfer module.
In this implementation, when the current calculation module receives the first permission signal sent by the control flow dam, it means that the control flow dam is ready to receive the data to be written from the current calculation module; after the current calculation module receives the first permission signal sent by the control flow dam, the calculation result can be read from the current calculation module. When the current calculation module provides the first valid signal to the control flow dam, it means that the current calculation module can write the calculation result into the control flow dam; after the control flow dam receives the first valid signal sent by the current calculation module, the calculation result can be written into the control flow dam.
When the current calculation module receives the first permission signal sent by the control flow dam and, at the same time, the control flow dam receives the first valid signal sent by the current calculation module, the calculation result starts to be written from the current calculation module into the control flow dam. When either signal stops being sent, that is, when the control flow dam stops sending the first permission signal to the current calculation module or the current calculation module stops sending the first valid signal to the control flow dam, the transmission stops immediately. At this point, the calculation result has been written from the current calculation module into the control flow dam, and the calculation result is stored in the control flow dam. When the control flow dam receives the second permission signal sent by the next transfer module, it means that the next transfer module is ready to receive the data to be written from the control flow dam; after the control flow dam receives the second permission signal sent by the next transfer module, the calculation result can be read by the next transfer module. When the control flow dam provides the second valid signal to the next transfer module, it means that the control flow dam can write the calculation result into the next transfer module; after the next transfer module receives the second valid signal sent by the control flow dam, the calculation result can be written into the next transfer module.
When the control flow dam receives the second permission signal sent by the next transfer module and, at the same time, the next transfer module receives the second valid signal sent by the control flow dam, the calculation result starts to be written from the control flow dam into the next transfer module. When either signal stops being sent, that is, when the next transfer module stops sending the second permission signal to the control flow dam or the control flow dam stops sending the second valid signal to the next transfer module, the transmission stops immediately. This completes the transmission of the calculation result from the current calculation module to the next transfer module. It should also be noted that the calculation result does not refer to a sequential calculation result; the calculation result can be any piece of data in the actual communication.
In the technical solutions of the embodiments of the present application, at least one calculation module in a data flow network performs calculation on to-be-processed data based on one of the operation nodes corresponding to an AI algorithm and outputs a calculation result, where the data flow network is configured to process the to-be-processed data based on the AI algorithm; the next transfer module corresponding to the current calculation module receives the calculation result output by the calculation module and processes the calculation result; and the to-be-processed data flows through the data flow network according to a preset data flow direction, thereby improving the resource utilization of the AI chip.
Note that the above are only example embodiments of the present application and the technical principles applied. Those skilled in the art will understand that the present application is not limited to the specific embodiments described herein, and that various obvious changes, readjustments, and substitutions can be made without departing from the scope of protection of the present application. Therefore, although the present application has been described in some detail through the above embodiments, the present application is not limited to the above embodiments and may include more other equivalent embodiments without departing from the concept of the present application; the scope of the present application is determined by the scope of the appended claims.
Fig. 4A is a schematic structural diagram of a control flow dam in an artificial intelligence chip provided in an embodiment of the present application. The control flow dam includes: an upstream permission end composed of a first NOT gate 41 and a full end, an upstream valid end composed of a first AND gate 42 and a write end, a downstream permission end composed of a second AND gate 43 and a read end, and a downstream valid end composed of a second NOT gate 44 and an empty end. The control flow dam further includes a storage configured to store data. The upstream permission end sends the first permission signal to the current calculation module and the first AND gate 42; the upstream valid end receives the first valid signal sent by the current calculation module. In the first AND gate 42, event A1 is "the current calculation module sends the first valid signal to the upstream valid end", event B1 is "the upstream permission end sends the first permission signal to the current calculation module", and event C1 is "the calculation result of the current calculation module is written into the control flow dam". The downstream valid end sends the second valid signal to the next transfer module and the second AND gate; the downstream permission end receives the second permission signal sent by the next transfer module. In the second AND gate 43, event A2 is "the downstream permission end receives the second permission signal sent by the next transfer module", event B2 is "the downstream valid end sends the second valid signal to the next transfer module", and event C2 is "the next transfer module reads the calculation result of the current calculation module". When the current calculation module 110 sends the first valid signal to the control flow dam 140 and the control flow dam 140 sends the first permission signal to the current calculation module 110, the calculation result of the current calculation module 110 can flow into the storage in the control flow dam 140 to be stored; when the next transfer module 120 sends the second permission signal to the control flow dam 140 and the control flow dam 140 sends the second valid signal to the next transfer module 120, the next transfer module 120 reads the calculation result stored in the control flow dam 140.

Claims (10)

  1. An artificial intelligence (AI) chip, comprising a data flow network configured to process to-be-processed data based on an AI algorithm, wherein the data flow network comprises:
    at least one calculation module, wherein each calculation module is configured to perform calculation on the to-be-processed data based on one of at least one operation node corresponding to the AI algorithm and output a calculation result;
    a next transfer module corresponding to each calculation module, configured to be connected with each calculation module, receive the calculation result output by each calculation module, and process the calculation result;
    wherein the to-be-processed data flows through the data flow network according to a preset data flow direction.
  2. The artificial intelligence chip of claim 1, wherein the data flow network further comprises:
    a processing module, configured to process the to-be-processed data to obtain parameters carried by the to-be-processed data;
    wherein the calculation module is configured to perform calculation on the to-be-processed data based on the parameters.
  3. The artificial intelligence chip of claim 1, wherein a control flow dam is disposed between each calculation module and the next transfer module, and the control flow dam is configured to control the flow of the calculation result from each calculation module to the next transfer module.
  4. The artificial intelligence chip of claim 3, wherein the control flow dam comprises a write end, a read end, a full end, and an empty end, and further comprises:
    a first AND gate, connected with the write end to form an upstream valid end, wherein the upstream valid end is configured to receive a first valid signal sent by each calculation module;
    a second AND gate, connected with the read end to form a downstream permission end, wherein the downstream permission end is configured to receive a second permission signal sent by the next transfer module;
    a first NOT gate, connected with the full end to form an upstream permission end, wherein the upstream permission end is configured to send a first permission signal to each calculation module and the first AND gate;
    a second NOT gate, connected with the empty end to form a downstream valid end, wherein the downstream valid end is configured to send a second valid signal to the next transfer module and the second AND gate.
  5. The artificial intelligence chip of claim 4, wherein each calculation module is configured to receive the first permission signal sent by the control flow dam;
    each calculation module provides the first valid signal to the control flow dam so as to write target data in the to-be-processed data into the control flow dam, and each calculation module is configured to process the target data according to the processing mode indicated by the operation node to obtain the calculation result;
    the control flow dam is configured to receive the second permission signal sent by the next transfer module; and
    the control flow dam is configured to provide the second valid signal to the next transfer module so as to write the calculation result into the next transfer module.
  6. A data processing method based on an artificial intelligence chip, comprising:
    performing, by each calculation module of at least one calculation module in a data flow network, calculation on to-be-processed data based on one of at least one operation node corresponding to an AI algorithm, and outputting a calculation result, wherein the data flow network is configured to process the to-be-processed data based on the AI algorithm;
    receiving, by a next transfer module corresponding to each calculation module, the calculation result output by each calculation module, and processing the calculation result, wherein the next transfer module is configured to be connected with each calculation module;
    wherein the to-be-processed data flows through the data flow network according to a preset data flow direction.
  7. The method of claim 6, further comprising:
    processing, by a processing module in the data flow network, the to-be-processed data to obtain parameters carried by the to-be-processed data;
    wherein performing, by each calculation module, calculation on the to-be-processed data based on one of the at least one operation node corresponding to the AI algorithm comprises:
    determining one of the at least one operation node of the AI algorithm corresponding to each calculation module; and
    performing, by each calculation module, calculation on the parameters based on the determined operation node.
  8. The method of claim 6, wherein a control flow dam is disposed between each calculation module and the next transfer module, and the method further comprises:
    controlling, by the control flow dam, the flow of the calculation result from each calculation module to the next transfer module.
  9. The method of claim 8, wherein the control flow dam comprises a write end, a read end, a full end, and an empty end, and further comprises a first AND gate, a second AND gate, a first NOT gate, and a second NOT gate, wherein the first AND gate is connected with the write end to form an upstream valid end, the second AND gate is connected with the read end to form a downstream permission end, the first NOT gate is connected with the full end to form an upstream permission end, and the second NOT gate is connected with the empty end to form a downstream valid end, and the method further comprises:
    receiving, by the upstream valid end, a first valid signal sent by each calculation module;
    receiving, by the downstream permission end, a second permission signal sent by the next transfer module;
    sending, by the upstream permission end, a first permission signal to each calculation module and the first AND gate; and
    sending, by the downstream valid end, a second valid signal to the next transfer module and the second AND gate.
  10. The method of claim 9, wherein each calculation module provides the first valid signal to the control flow dam so as to write target data in the to-be-processed data into the control flow dam, and each calculation module is configured to process the target data according to the processing mode indicated by the operation node to obtain the calculation result;
    the control flow dam receives the second permission signal sent by the next transfer module; and
    the control flow dam provides the second valid signal to the next transfer module so as to write the calculation result into the next transfer module.
PCT/CN2021/101414 2020-06-22 2021-06-22 Artificial intelligence chip and data processing method based on artificial intelligence chip WO2021259231A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US18/069,216 US20230126978A1 (en) 2020-06-22 2022-12-20 Artificial intelligence chip and artificial intelligence chip-based data processing method

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010575487.1 2020-06-22
CN202010575487.1A CN111857989B (zh) 2020-06-22 2020-06-22 人工智能芯片和基于人工智能芯片的数据处理方法

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/069,216 Continuation-In-Part US20230126978A1 (en) 2020-06-22 2022-12-20 Artificial intelligence chip and artificial intelligence chip-based data processing method

Publications (1)

Publication Number Publication Date
WO2021259231A1 true WO2021259231A1 (zh) 2021-12-30

Family

ID=72988037

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/101414 WO2021259231A1 (zh) 2020-06-22 2021-06-22 人工智能芯片和基于人工智能芯片的数据处理方法

Country Status (3)

Country Link
US (1) US20230126978A1 (zh)
CN (1) CN111857989B (zh)
WO (1) WO2021259231A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111857989B (zh) * 2020-06-22 2024-02-27 深圳鲲云信息科技有限公司 人工智能芯片和基于人工智能芯片的数据处理方法

Citations (7)

Publication number Priority date Publication date Assignee Title
CN107704922A (zh) * 2017-04-19 2018-02-16 北京深鉴科技有限公司 Artificial neural network processing apparatus
CN108256640A (zh) * 2016-12-28 2018-07-06 上海磁宇信息科技有限公司 Convolutional neural network implementation method
CN109272112A (zh) * 2018-07-03 2019-01-25 北京中科睿芯科技有限公司 Data reuse instruction mapping method, system and apparatus for neural networks
US20190228340A1 (en) * 2017-08-19 2019-07-25 Wave Computing, Inc. Data flow graph computation for machine learning
CN110851779A (zh) * 2019-10-16 2020-02-28 北京航空航天大学 Systolic array architecture for sparse matrix operations
CN111752887A (zh) * 2020-06-22 2020-10-09 深圳鲲云信息科技有限公司 Artificial intelligence chip and data processing method based on artificial intelligence chip
CN111857989A (zh) * 2020-06-22 2020-10-30 深圳鲲云信息科技有限公司 Artificial intelligence chip and data processing method based on artificial intelligence chip

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
CN109564638B (zh) * 2018-01-15 2023-05-26 深圳鲲云信息科技有限公司 Artificial intelligence processor and processing method applied thereto
CN110046704B (zh) * 2019-04-09 2022-11-08 深圳鲲云信息科技有限公司 Data flow-based deep network acceleration method, apparatus, device, and storage medium

Patent Citations (7)

Publication number Priority date Publication date Assignee Title
CN108256640A (zh) * 2016-12-28 2018-07-06 上海磁宇信息科技有限公司 Convolutional neural network implementation method
CN107704922A (zh) * 2017-04-19 2018-02-16 北京深鉴科技有限公司 Artificial neural network processing apparatus
US20190228340A1 (en) * 2017-08-19 2019-07-25 Wave Computing, Inc. Data flow graph computation for machine learning
CN109272112A (zh) * 2018-07-03 2019-01-25 北京中科睿芯科技有限公司 Data reuse instruction mapping method, system and apparatus for neural networks
CN110851779A (zh) * 2019-10-16 2020-02-28 北京航空航天大学 Systolic array architecture for sparse matrix operations
CN111752887A (zh) * 2020-06-22 2020-10-09 深圳鲲云信息科技有限公司 Artificial intelligence chip and data processing method based on artificial intelligence chip
CN111857989A (zh) * 2020-06-22 2020-10-30 深圳鲲云信息科技有限公司 Artificial intelligence chip and data processing method based on artificial intelligence chip

Also Published As

Publication number Publication date
CN111857989B (zh) 2024-02-27
CN111857989A (zh) 2020-10-30
US20230126978A1 (en) 2023-04-27

Similar Documents

Publication Publication Date Title
KR102519467B1 Data preprocessing method, apparatus, computer device, and storage medium
WO2021259104A1 Artificial intelligence chip and data processing method based on artificial intelligence chip
WO2020187041A1 Neural network mapping method based on many-core processor, and computing device
US10592298B2 Method for distributing load in a multi-core system
WO2017185387A1 Apparatus and method for performing forward operation of fully connected layer neural network
CN112004239A Computation offloading method and system based on cloud-edge collaboration
CN109542830B Data processing system and data processing method
JP7389231B2 Synchronization network
WO2021259231A1 Artificial intelligence chip and data processing method based on artificial intelligence chip
CN111752879B Acceleration system and method based on convolutional neural network, and storage medium
CN113051199A Data transmission method and apparatus
WO2021259232A1 Data processing method and apparatus for AI chip, and computer device
JP2014235746A Multi-core device and job scheduling method for multi-core device
Huang et al. Toward decentralized and collaborative deep learning inference for intelligent iot devices
US20230306236A1 Device and method for executing lstm neural network operation
CN111813721B Neural network data processing method, apparatus, device, and storage medium
TW202127840A Initializing on-chip operations
CN116954866A Task scheduling method and system in edge cloud based on deep reinforcement learning
CN115150892B VM-PM repair strategy method in MEC wireless system with bursty traffic
CN114301911B Task management method and system based on edge-edge collaboration
CN113556242B Method and device for inter-node communication based on multiple processing nodes
CN112862079B Pipelined convolution computing architecture design method and residual network acceleration system
CN111860821B Control method and system for data transmission of data flow architecture neural network chip
CN115668222A Data processing method and apparatus for neural network
WO2021077284A1 Neural network operation system and method

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21828528

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21828528

Country of ref document: EP

Kind code of ref document: A1