CN116881195A - Chip system facing detection calculation and chip method facing detection calculation - Google Patents
- Publication number
- CN116881195A (application number CN202311130687.6A)
- Authority
- CN
- China
- Prior art keywords
- computing
- computing units
- units
- unit
- active state
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/76—Architectures of general purpose stored program computers
- G06F15/78—Architectures of general purpose stored program computers comprising a single central processing unit
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- Computer Hardware Design (AREA)
- Mathematical Physics (AREA)
- Algebra (AREA)
- Computational Mathematics (AREA)
- Mathematical Analysis (AREA)
- Mathematical Optimization (AREA)
- Pure & Applied Mathematics (AREA)
- Databases & Information Systems (AREA)
- Software Systems (AREA)
- Hardware Redundancy (AREA)
Abstract
The application provides a detection-computation-oriented chip system and a detection-computation-oriented chip method. The system comprises: a first computing unit array including a plurality of first computing units bearing a first number, for performing a first computing operation during a first time period; a second computing unit array including a plurality of second computing units bearing a second number, for performing a second computing operation during a second time period, the second time period following the first time period, so that the second computing operation is performed after the first computing operation is completed; and a control unit for determining how many first computing units the first computing operation requires and how many second computing units the second computing operation requires, and for numbering the first and second computing units. After performing the first computing operation, the first computing units go on to perform the second computing operation, which reduces data movement and improves computing efficiency.
Description
Technical Field
The application relates to the field of artificial intelligence chips, and in particular to a detection-computation-oriented chip system and a detection-computation-oriented chip method.
Background
In the prior art, artificial intelligence chip schemes fall mainly into two types. One type is accelerators designed specifically for artificial neural networks, such as NVIDIA's graphics processing units (GPUs) and Google's tensor processing units (TPUs). The other type is general-purpose processors, such as a computer's central processing unit (CPU) or a field-programmable gate array (FPGA), on which artificial intelligence computation is implemented in software.
In existing schemes, the transfer of computation data inside the chip follows the mapping rules of the original neural network. This sequential, rule-bound mode of computation often introduces considerable redundant computation and unnecessary latency, so a new approach to arranging and transferring data is needed.
Disclosure of Invention
The embodiments of the application aim to provide a detection-computation-oriented chip system and a detection-computation-oriented chip method that improve the computing efficiency of artificial intelligence chips.
In a first aspect, an embodiment of the present application provides a detection-computation-oriented chip system, including: a first computing unit array including a plurality of first computing units bearing a first number, for performing a first computing operation during a first time period; a second computing unit array including a plurality of second computing units bearing a second number, for performing a second computing operation during a second time period, the second time period following the first time period, so that the second computing operation is performed after the first computing operation is completed; and a control unit for determining how many first computing units the first computing operation requires and how many second computing units the second computing operation requires, and for numbering the first and second computing units. After the first computing unit array has performed the first computing operation, the control unit changes the first number to the second number and sets, in the control unit, an identifier indicating the substitute operation; the output data of the first computing units serves as their own input data, and the first computing units in an active state go on to perform the second computing operation in the second time period.
In one possible implementation, the detection-computation-oriented chip system further includes an active computing unit tracking unit, which determines whether each first computing unit is in an active state from the first indication signal of that unit, and whether each second computing unit is in an active state from the second indication signal of that unit. After the first computing unit array has performed the first computing operation, the active computing unit tracking unit counts the first computing units in an active state. If the number of second computing units required for the second computing operation equals the number of first computing units in an active state, no adjustment is made. If the number required is lower, the control unit temporarily seals the surplus first computing units. If the number required is higher, the control unit brings idle first computing units into the active state until the two numbers match. In each of the three cases, the control unit then changes the first number to the second number, the output data of the first computing units serves as their own input data, and the first computing units in an active state go on to perform the second computing operation in the second time period.
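The three cases can be sketched in software as follows. This is only an illustrative model under stated assumptions, not the patented hardware logic: the names `State`, `Unit` and `reconcile` are hypothetical, and raising an error when too few idle units exist is an assumption, since the text does not say what happens in that situation.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    ACTIVE = "active"
    SEALED = "sealed"  # temporarily sealed by the control unit
    IDLE = "idle"


@dataclass
class Unit:
    number: str   # virtual number: "first" or "second" (hypothetical labels)
    state: State


def reconcile(units, required):
    """Match the active unit count to `required`, then renumber active units."""
    active = [u for u in units if u.state is State.ACTIVE]
    if len(active) > required:
        # Fewer units required than active: seal the surplus.
        for u in active[required:]:
            u.state = State.SEALED
    elif len(active) < required:
        # More units required than active: wake idle units until counts match.
        idle = [u for u in units if u.state is State.IDLE]
        shortfall = required - len(active)
        if len(idle) < shortfall:
            # Assumption: the source does not describe this situation.
            raise ValueError("not enough idle units")
        for u in idle[:shortfall]:
            u.state = State.ACTIVE
    # In every case the active units take the second number; data stays put.
    now_active = [u for u in units if u.state is State.ACTIVE]
    for u in now_active:
        u.number = "second"
    return now_active
```

For example, with four active and two idle units, `reconcile(units, 2)` seals two units and renumbers the remaining two.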
In this implementation, the first and second computing unit arrays contain the first and second computing units respectively, which perform the first and second computing operations respectively; the second computing operation takes place in the second time period, after the first computing operation. The first computing units carry a first number and the second computing units carry a second number. The control unit determines how many first computing units the first computing operation requires and how many second computing units the second computing operation requires, and numbers the first and second computing units. The active computing unit tracking unit determines whether the first and second computing units are in an active state from the first indication signals of the first computing units and the second indication signals of the second computing units. One feasible form of indication describes the working state of all first computing units as runs of consecutive occupied or unoccupied units. For example, the first indication signal A4U4 indicates that the first four first computing units are occupied and active and the next four are unoccupied; A2U4A2 indicates that the first two first computing units are occupied and active, the middle four are unoccupied, and the last two are active.
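The run-length indication signal described above can be decoded with a few lines of code. The sketch below assumes the signal is a plain string of `A<count>` and `U<count>` groups, as in the A4U4 and A2U4A2 examples; the function name `decode_signal` is hypothetical.

```python
import re


def decode_signal(signal):
    """Expand a run-length signal such as 'A2U4A2' into per-unit booleans.

    True means occupied and active, False means unoccupied.
    """
    if not re.fullmatch(r"(?:[AU]\d+)+", signal):
        raise ValueError(f"malformed indication signal: {signal!r}")
    states = []
    for flag, count in re.findall(r"([AU])(\d+)", signal):
        states.extend([flag == "A"] * int(count))
    return states
```

Summing the decoded list gives the number of active first computing units, which is the count the tracking unit reports to the control unit.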
Once the first computing operation is complete, the active computing unit tracking unit counts the first computing units in an active state. If the number of second computing units required for the second computing operation equals the number of first computing units in an active state, no adjustment is made. If the number required is lower, the control unit temporarily seals the surplus first computing units. If the number required is higher, the control unit brings idle first computing units into the active state until the two numbers match. In each case, the control unit then changes the first number to the second number, the output data of the first computing units serves as their own input data, and the first computing units in an active state go on to perform the second computing operation in the second time period.
The computing units used may sit at different locations from one layer to the next, which can cause meaningless data movement, since the unnecessary computation data itself has already been removed. In the conventional data transfer mode the path is long: first computing unit → bus → DRAM or SRAM → bus → second computing unit, or first computing unit → bus → second computing unit. This scheme instead adopts out-of-order operation: the output data computed by a first computing unit is used as the input data of that same unit, so the transfer path is first computing unit → first computing unit, that is, the data stays within the first computing unit. By changing the virtual number of the first computing unit, the system treats it as a second computing unit, and no data flows to the second computing unit. The first computing unit can also receive the required computation parameters from the control unit to complete the computation. These operations repeat until the computation is complete, which reduces data movement and improves computing efficiency.
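A small software model may help illustrate why renumbering avoids data movement. It is a sketch under assumptions: `ComputeUnit`, `renumber`, `PATHS` and `BUS_TRAVERSALS` are hypothetical names, and a Python attribute stands in for the local storage of a unit. The bus counts are read directly off the transfer paths listed above.

```python
# Transfer paths named in the text, written as lists of stages.
PATHS = {
    "via memory": ["first unit", "bus", "DRAM/SRAM", "bus", "second unit"],
    "direct": ["first unit", "bus", "second unit"],
    "in place": ["first unit", "first unit"],
}
# Number of bus traversals on each path.
BUS_TRAVERSALS = {name: path.count("bus") for name, path in PATHS.items()}


class ComputeUnit:
    def __init__(self, physical_id, virtual_number):
        self.physical_id = physical_id
        self.virtual_number = virtual_number
        self.buffer = None  # data held locally by the unit

    def run(self, op, params):
        # The unit's own output buffer is the input of its next operation.
        self.buffer = op(self.buffer, params)


def renumber(unit, new_number):
    """Change only the virtual number; the buffer is neither moved nor copied."""
    unit.virtual_number = new_number


unit = ComputeUnit(physical_id=0, virtual_number="first")
unit.buffer = [1, 2, 3]
unit.run(lambda data, w: [v * w for v in data], 2)  # first computing operation
renumber(unit, "second")                            # now treated as a second unit
unit.run(lambda data, b: [v + b for v in data], 1)  # second computing operation
```

In this model only the parameters (`2` and `1`) are supplied from outside, mirroring the control unit passing computation parameters while the data itself stays in the unit.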
In one possible implementation, after the first computing unit array has performed the first computing operation, the active computing unit tracking unit counts the first computing units in an active state. If the number of second computing units required for the second computing operation is higher than the number of first computing units in an active state, the control unit unseals temporarily sealed first computing units so that the number of first computing units in an active state matches the number of second computing units required. The control unit then changes the first number to the second number, the output data of the first computing units serves as their own input data, and the first computing units in an active state go on to perform the second computing operation in the second time period.
In this implementation, when more second computing units are required than there are first computing units in an active state, the control unit unseals temporarily sealed first computing units and brings them into the active state, giving priority to first computing units that can be called directly. If the original data held in such a unit is usable, the unit is used as it is; if the original data is not usable, the difference between that data and the new data is transferred to the unit, which then performs the computation. This continues until the number of first computing units in an active state matches the number of second computing units required for the second computing operation, after which the control unit changes the first number to the second number, the output data of the first computing units serves as their own input data, and the first computing units in an active state go on to perform the second computing operation in the second time period. Calling directly usable first computing units first shortens the call time and further improves computing efficiency.
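The unsealing priority can be sketched as a simple ordering. The following is an assumption-laden illustration, not the patented control logic: `SealedUnit`, `data_usable` and `pick_units_to_unseal` are hypothetical names, and the two action labels merely tag which of the two branches in the text would apply.

```python
from dataclasses import dataclass


@dataclass
class SealedUnit:
    unit_id: int
    data_usable: bool  # True if the data left in the unit can be reused as is


def pick_units_to_unseal(sealed, shortfall):
    """Return (unit_id, action) pairs, preferring units that need no transfer."""
    if shortfall > len(sealed):
        # Assumption: the source does not describe this situation.
        raise ValueError("not enough sealed units to cover the shortfall")
    # sorted() is stable, so units with usable data come first in original order.
    ordered = sorted(sealed, key=lambda u: not u.data_usable)
    return [
        (u.unit_id, "reuse" if u.data_usable else "transfer_delta")
        for u in ordered[:shortfall]
    ]
```

Units tagged `transfer_delta` are the ones that would receive only the difference between their old data and the new data before computing.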
In one possible implementation, the system further includes an active computing unit pre-judging unit, which compares the output data of the first computing operation of the first computing units with the types of to-be-processed data in a database to obtain the number of second computing units required to process that output data.
In this implementation, the active computing unit pre-judging unit has a preset database that stores the number of computing units required by each type of to-be-processed data; the database may also be located outside the detection-computation-oriented chip system. The pre-judging unit reads the output data of the first computing operation of the first computing units and compares it with the types of to-be-processed data in the database. It thereby obtains in advance the number of second computing units required to process that output data, which speeds up regulation of the first computing units and improves computing efficiency.
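As a minimal sketch of the pre-judging lookup, the database can be modeled as a mapping from a kind of to-be-processed data to a unit count. Everything concrete here is an assumption for illustration: the keys, the counts, and the names `UNIT_DATABASE` and `prejudge_required_units` do not come from the source, and the "comparison" is reduced to an exact key match.

```python
# Hypothetical database: kind of to-be-processed data -> second units required.
UNIT_DATABASE = {
    "conv_feature_map": 8,
    "pooled_feature_map": 4,
    "class_scores": 1,
}


def prejudge_required_units(output_kind, database=UNIT_DATABASE):
    """Return the preset unit count for the kind of data the first operation produced."""
    if output_kind not in database:
        raise LookupError(f"no database entry for {output_kind!r}")
    return database[output_kind]
```

Because the lookup happens as soon as the first operation's output is known, the control unit can start sealing or waking units before the second operation is deployed.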
In one possible implementation, when a plurality of first indication signals are continuously in the active state, they indicate that the corresponding plurality of first computing units are continuously in the active state; when a plurality of first indication signals are continuously in the inactive state, they indicate that the corresponding plurality of first computing units are continuously in the inactive state.
In this implementation, one possible form of indication assigns a 1-bit indication to each first computing unit and each second computing unit. Another possible form describes the working state of all first computing units as runs of consecutive occupied or unoccupied units. For example, the first indication signal A4U4 indicates that the first four first computing units are occupied and active and the next four are unoccupied; A2U4A2 indicates that the first two first computing units are occupied and active, the middle four are unoccupied, and the last two are active. The pre-judgment data is obtained by logical analysis of existing data, which yields the computing units that will need to be occupied. For example, when a CNN is deployed on the computing architecture, the computing units to be called are deployed flexibly and called in coordination: the pre-judging unit analyzes the pending computing operation and obtains the required first computing unit overhead before the computing operation is deployed.
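The two forms of indication carry the same information, and one can be converted into the other. The sketch below turns a per-unit 1-bit list into the run-length form; the function name `encode_signal` and the use of a Python list for the bits are assumptions made for illustration.

```python
from itertools import groupby


def encode_signal(bits):
    """Compress per-unit 1-bit flags (1 = active, 0 = unoccupied) into run-length form."""
    return "".join(
        f"{'A' if active else 'U'}{sum(1 for _ in run)}"
        for active, run in groupby(bits)
    )
```

For eight units with bits `[1, 1, 0, 0, 0, 0, 1, 1]` this yields `A2U4A2`, which is shorter than the bit list whenever active and unoccupied units come in long runs.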
In one possible implementation, the first and second computing unit arrays are used to perform training tasks or inference tasks.
In this implementation, the first computing unit array and the second computing unit array can be used to perform training tasks, to perform inference tasks, or to perform training and inference tasks in an interleaved manner, which makes their use more flexible.
In a second aspect, an embodiment of the present application provides a detection-computation-oriented chip method, including:
arranging, in the first computing unit array, a plurality of first computing units bearing a first number, for performing a first computing operation in a first time period, and arranging, in the second computing unit array, a plurality of second computing units bearing a second number, for performing a second computing operation in a second time period, the second computing operation being performed after the first computing operation is completed; using the control unit to determine how many first computing units the first computing operation requires and how many second computing units the second computing operation requires, and to number the first and second computing units; providing an active computing unit tracking unit, which determines whether each first computing unit is in an active state from the first indication signal of that unit, and whether each second computing unit is in an active state from the second indication signal of that unit; and, after the first computing unit array has performed the first computing operation, counting, by the active computing unit tracking unit, the first computing units in an active state. If the number of second computing units required for the second computing operation equals the number of first computing units in an active state, no adjustment is made. If the number required is lower, the control unit temporarily seals the surplus first computing units. If the number required is higher, the control unit brings idle first computing units into the active state until the two numbers match. In each case, the control unit then changes the first number to the second number, the output data of the first computing units serves as their own input data, and the first computing units in an active state go on to perform the second computing operation in the second time period.
Drawings
To illustrate the technical solutions of the embodiments of the present application more clearly, the drawings needed in the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present application and should not be regarded as limiting its scope; a person skilled in the art can obtain other related drawings from them without inventive effort.
FIG. 1 is a block diagram of a chip system for detection calculation according to an embodiment of the present application;
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the accompanying drawings in the embodiments of the present application.
An embodiment of the application provides a detection-computation-oriented chip system and a detection-computation-oriented chip method. Referring to FIG. 1, the system includes: a first computing unit array including a plurality of first computing units bearing a first number, for performing a first computing operation during a first time period; a second computing unit array including a plurality of second computing units bearing a second number, for performing a second computing operation during a second time period, the second time period following the first time period, so that the second computing operation is performed after the first computing operation is completed; and a control unit for determining how many first computing units the first computing operation requires and how many second computing units the second computing operation requires, and for numbering the first and second computing units. After the first computing unit array has performed the first computing operation, the control unit changes the first number to the second number and sets, in the control unit, an identifier indicating the substitute operation; the output data of the first computing units serves as their own input data, and the first computing units in an active state go on to perform the second computing operation in the second time period.
In one possible implementation, the detection-computation-oriented chip system further includes an active computing unit tracking unit, which determines whether each first computing unit is in an active state from the first indication signal of that unit, and whether each second computing unit is in an active state from the second indication signal of that unit. After the first computing unit array has performed the first computing operation, the active computing unit tracking unit counts the first computing units in an active state. If the number of second computing units required for the second computing operation equals the number of first computing units in an active state, no adjustment is made. If the number required is lower, the control unit temporarily seals the surplus first computing units. If the number required is higher, the control unit brings idle first computing units into the active state until the two numbers match. In each of the three cases, the control unit then changes the first number to the second number, the output data of the first computing units serves as their own input data, and the first computing units in an active state go on to perform the second computing operation in the second time period.
In the implementation process, the first computing unit array and the second computing unit array include first computing units and second computing units respectively, and are used for performing a first computing operation and a second computing operation respectively; the second computing operation is performed in the second time period, after the first computing operation. The first computing units are provided with a first number, and the second computing units are provided with a second number. The control unit is used for judging the number of first computing units required for performing the first computing operation and the number of second computing units required for performing the second computing operation, and for numbering the first computing units and the second computing units. The active computing unit tracking unit judges whether the first computing units and the second computing units are in an active state according to the first indication signals of the first computing units and the second indication signals of the second computing units. One feasible indication mode is to indicate the working states of all the first computing units by stating how many consecutive first computing units are occupied or not occupied. For example, the first indication signal A4U4 indicates that the first four first computing units are occupied and activated and the next four first computing units are not occupied; A2U4A2 indicates that the first two first computing units are occupied and activated, the middle four first computing units are not occupied, and the last two first computing units are activated.
After the execution of the first computing operation is completed, the active computing unit tracking unit detects the number of the first computing units in an active state, if the number of the second computing units required for executing the second computing operation is consistent with the number of the first computing units in the active state, the control unit modifies the first number into the second number, the output data of the first computing unit is used as the input data of the first computing unit, and the first computing unit in the active state continues to execute the second computing operation in a second time period. If the number of the second computing units required for executing the second computing operation is lower than the number of the first computing units in an active state, the control unit temporarily seals the redundant first computing units, the control unit modifies the first number into the second number, the output data of the first computing units are used as the input data of the first computing units, and the first computing units in the active state continue to execute the second computing operation in a second time period; if the number of the second computing units required for executing the second computing operation is higher than the number of the first computing units in the active state, the control unit calls the idle first computing units into the active state until the number of the second computing units required for executing the second computing operation is consistent with the number of the first computing units in the active state, the control unit modifies the first number into the second number, the output data of the first computing units serve as input data of the first computing units, and the first computing units in the active state continue to execute the second computing operation in a second time period. 
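The three cases above (equal, fewer, or more units required) can be sketched in Python as follows. This is only an illustrative model under stated assumptions: the `Unit` class, the `State` names, the `first-N`/`second-N` number strings and the function `prepare_second_operation` are hypothetical and do not come from the patent text.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    ACTIVE = "active"
    IDLE = "idle"
    SEALED = "sealed"


@dataclass
class Unit:
    physical_id: int
    number: str   # virtual number (hypothetical format, e.g. "first-0")
    state: State


def prepare_second_operation(units, required):
    """Adjust the active set to `required` units, then renumber it in place."""
    active = [u for u in units if u.state is State.ACTIVE]
    if required < len(active):
        # Fewer units needed: temporarily seal the redundant ones.
        for u in active[required:]:
            u.state = State.SEALED
        active = active[:required]
    elif required > len(active):
        # More units needed: call idle units into the active state.
        idle = [u for u in units if u.state is State.IDLE]
        shortfall = required - len(active)
        if shortfall > len(idle):
            raise ValueError("not enough idle first computing units")
        for u in idle[:shortfall]:
            u.state = State.ACTIVE
            active.append(u)
    # In all three cases: modify the first number into the second number.
    for i, u in enumerate(active):
        u.number = f"second-{i}"
    return active


# Eight units, the first four active and the rest idle.
units = [Unit(i, f"first-{i}", State.ACTIVE if i < 4 else State.IDLE)
         for i in range(8)]
chosen = prepare_second_operation(units, required=2)
```

The sketch only draws on idle units in the "higher" case; unsealing previously sealed units is described separately in the text.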
The locations of the computing units used may differ between layers, which can cause meaningless movement of data, since the unnecessary computation data itself has already been removed. In the traditional data transmission mode, the path is first computing unit, bus, dynamic random access memory (DRAM) or static random access memory (SRAM), bus, second computing unit, or alternatively first computing unit, bus, second computing unit; in both cases the transmission path is relatively long. In this scheme, out-of-order operation is adopted: the output data computed by the first computing unit is used as the input data of the same first computing unit, so the data transmission path is first computing unit to first computing unit, i.e. the data stays within the first computing unit. By changing the virtual number of the first computing unit, the system regards the first computing unit as the second computing unit, and no data flows to the second computing unit. Meanwhile, the first computing unit can work with the control unit, which delivers the required computation parameters, to complete the computation. The above operation is repeated until the computation is completed, which reduces data movement and improves computing efficiency.
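To make the contrast between the two transmission modes concrete, the following Python sketch counts path segments ("hops") for a chain of operations. It is a toy model under assumptions: the hop counting, the function names `run_traditional` and `run_renumbered`, and the example operations are all hypothetical and are not measurements or structures from the patent.

```python
# Traditional path named in the text; each arrow between entries is one hop.
TRADITIONAL_PATH = ["first unit", "bus", "DRAM/SRAM", "bus", "second unit"]


def run_traditional(x, layers):
    """Each operation runs on a different unit; data moves between them."""
    data, hops = x, 0
    for i, (op, param) in enumerate(layers):
        if i > 0:
            hops += len(TRADITIONAL_PATH) - 1   # move output to the next unit
        data = op(data, param)
    return data, hops


def run_renumbered(x, layers):
    """One unit keeps its output locally and is only renumbered."""
    unit = {"number": None, "local": x}
    hops = 0                                    # no inter-unit data movement
    for i, (op, param) in enumerate(layers):
        unit["number"] = i                      # change the virtual number only
        # `param` stands in for the parameters delivered by the control unit.
        unit["local"] = op(unit["local"], param)
    return unit["local"], hops


layers = [(lambda d, p: d * p, 2), (lambda d, p: d + p, 3)]
print(run_traditional(1, layers), run_renumbered(1, layers))
```

Both functions compute the same result; only the amount of modeled data movement differs, which is the effect the paragraph above attributes to renumbering.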
In one possible implementation, after the first computing unit array performs the first computing operation, the active computing unit tracking unit detects the number of first computing units in an active state. If the number of second computing units required for performing the second computing operation is higher than the number of first computing units in the active state, the control unit decapsulates the temporarily sealed first computing units and brings them into the active state until the number of first computing units in the active state is consistent with the number of second computing units required for the second computing operation. The control unit then modifies the first number into the second number, the output data of the first computing units serves as the input data of the first computing units, and the first computing units in the active state continue to perform the second computing operation in the second time period.
In the implementation process, after the first computing unit array performs the first computing operation, the active computing unit tracking unit detects the number of first computing units in an active state. If the number of second computing units required for performing the second computing operation is higher than the number of first computing units in the active state, the control unit decapsulates the temporarily sealed first computing units and brings them into the active state, preferentially calling the first computing units that can be called directly. If the original data of a first computing unit is available, that unit is used directly; if the original data is unavailable, the difference between the original data and the new data is transferred to the first computing unit, which then performs the computation. This continues until the number of first computing units in the active state is consistent with the number of second computing units required for performing the second computing operation. The control unit then modifies the first number into the second number, the output data of the first computing units is used as the input data of the first computing units, and the first computing units in the active state continue to perform the second computing operation in the second time period. Preferentially calling the first computing units that can be used directly reduces the calling time and further improves computing efficiency.
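The priority rule described above can be sketched as follows. This is a hedged illustration only: `SealedUnit`, `unseal`, the `data_usable` flag and the element-wise numeric reading of "difference value" are assumptions made for the example, not details given in the patent.

```python
from dataclasses import dataclass


@dataclass
class SealedUnit:
    unit_id: int
    data: list          # data left in the unit when it was sealed
    data_usable: bool   # whether that original data can be reused directly


def unseal(sealed, shortfall, new_data):
    """Pick `shortfall` sealed units, preferring directly usable ones.

    Returns (unit_id, delta) pairs; delta is None when nothing is transferred.
    """
    # Stable sort: units with usable data come first, original order otherwise.
    ordered = sorted(sealed, key=lambda u: not u.data_usable)
    plan = []
    for u in ordered[:shortfall]:
        if u.data_usable:
            plan.append((u.unit_id, None))           # use the unit as it is
        else:
            # Assumed interpretation: send only the element-wise difference.
            delta = [n - o for n, o in zip(new_data, u.data)]
            u.data = [o + d for o, d in zip(u.data, delta)]
            plan.append((u.unit_id, delta))
    return plan


pool = [SealedUnit(0, [1, 2], False), SealedUnit(1, [5, 6], True)]
plan = unseal(pool, shortfall=2, new_data=[5, 6])
```

Sorting usable units to the front models the "preferentially calls" behavior; a real controller might rank candidates differently.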
In one possible implementation, the method further includes: the active computing unit pre-judging unit is used for comparing the first computing operation output data of the first computing unit with various types of data to be processed in the database and obtaining the number of second computing units required for processing the first computing operation output data.
In the implementation process, the system further includes an active computing unit pre-judging unit, in which a database storing the number of computing units required by various types of data to be processed is preset; the database can also be arranged outside the detection-computation-oriented chip system. The active computing unit pre-judging unit reads the first computing operation output data of the first computing unit and compares it with the various types of data to be processed in the database to obtain the number of second computing units required for processing the first computing operation output data. Pre-judging the required number of second computing units in advance speeds up the regulation of the first computing units and improves computing efficiency.
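A minimal sketch of such a lookup is shown below. The patent does not say how output data is matched against the database, so the sparsity-based `classify` rule, the type names and the unit counts in `REQUIRED_UNITS` are purely hypothetical placeholders.

```python
# Hypothetical database: data type -> number of second computing units needed.
REQUIRED_UNITS = {"sparse": 2, "dense": 8}


def classify(output):
    """Assumed matching rule: call the data sparse if under half is non-zero."""
    nonzero = sum(1 for v in output if v != 0)
    return "sparse" if nonzero * 2 < len(output) else "dense"


def prejudge(output, database=REQUIRED_UNITS):
    """Return the number of second computing units required for `output`."""
    return database[classify(output)]


needed = prejudge([0, 0, 0, 1])
```

Passing `database` as a parameter reflects the statement that the database may sit inside the pre-judging unit or outside the chip system.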
In one possible implementation, when a plurality of first indication signals are in a continuously active state, the plurality of first computing units are indicated as being in a continuously active state, and when a plurality of first indication signals are in a continuously inactive state, the plurality of first computing units are indicated as being in a continuously inactive state.
In the implementation process, one possible indication mode is to set a 1-bit indication for each of the first computing units and the second computing units. Another possible indication mode is to indicate the working states of all the first computing units by stating how many consecutive first computing units are occupied or not occupied. For example, the first indication signal A4U4 indicates that the first four first computing units are occupied and activated and the next four first computing units are not occupied; A2U4A2 indicates that the first two first computing units are occupied and activated, the middle four first computing units are not occupied, and the last two first computing units are activated. The pre-judging data is obtained by performing logic analysis on the existing data to derive information about the computing units that need to be occupied. For example, when a convolutional neural network (CNN) is deployed in the computing architecture, the computing units that need to be called are flexibly deployed and called in a coordinated manner: the pre-judging unit analyzes the computing operation to be processed and obtains the required first computing unit cost in advance, before the computing operation is deployed.
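The two indication modes are interchangeable: a run-length signal such as A4U4 expands to one bit per unit, and a per-unit bit list collapses back into runs. The Python sketch below illustrates this; the function names `decode` and `encode` are hypothetical, and the parser assumes well-formed signals made only of `A<count>` and `U<count>` groups.

```python
import re
from itertools import groupby


def decode(signal):
    """Expand a run-length signal into one boolean per unit (True = active)."""
    flags = []
    for kind, count in re.findall(r"([AU])(\d+)", signal):
        flags.extend([kind == "A"] * int(count))
    return flags


def encode(flags):
    """Collapse per-unit booleans back into the run-length form."""
    return "".join(
        f"{'A' if active else 'U'}{len(list(run))}"
        for active, run in groupby(flags)
    )


flags = decode("A2U4A2")
active_count = sum(flags)   # number of first computing units in the active state
```

The run-length form is shorter when active units are contiguous, while the 1-bit form has a fixed size regardless of the pattern.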
In one possible implementation, the first and second arrays of computing units are used to perform training tasks or reasoning tasks.
In the implementation process, the first computing unit array and the second computing unit array can be used for executing training tasks or for executing reasoning tasks, and can also be used for executing the training tasks and the reasoning tasks in a crossed manner, so that the use is more flexible.
In a second aspect, an embodiment of the present application provides a chip method for detection computation, including:
arranging a plurality of first computing units with a first number in the first computing unit array for performing a first computing operation in a first time period, and arranging a plurality of second computing units with a second number in the second computing unit array for performing a second computing operation in a second time period, the second computing operation being performed in the second time period after the first computing operation is completed; using the control unit to judge the number of first computing units required for performing the first computing operation and the number of second computing units required for performing the second computing operation, and to number the first computing units and the second computing units; setting an active computing unit tracking unit, which judges whether each first computing unit is in an active state according to the first indication signal of that first computing unit, and whether each second computing unit is in an active state according to the second indication signal of that second computing unit; after the first computing unit array performs the first computing operation, detecting, by the active computing unit tracking unit, the number of first computing units in an active state, and comparing it with the number of second computing units required for performing the second computing operation: if the two numbers are consistent, no adjustment is needed; if the required number is lower, the control unit temporarily seals the redundant first computing units; if the required number is higher, the control unit calls idle first computing units into the active state until the two numbers are consistent. In each case the control unit then modifies the first number into the second number, the output data of the first computing units is used as the input data of the first computing units, and the first computing units in the active state continue to perform the second computing operation in the second time period.
In the implementation process, the first computing unit array and the second computing unit array include first computing units and second computing units respectively, and are used for performing a first computing operation and a second computing operation respectively; the second computing operation is performed in the second time period, after the first computing operation. The first computing units are provided with a first number, and the second computing units are provided with a second number. The control unit is used for judging the number of first computing units required for performing the first computing operation and the number of second computing units required for performing the second computing operation, and for numbering the first computing units and the second computing units. The active computing unit tracking unit judges whether the first computing units and the second computing units are in an active state according to the first indication signals of the first computing units and the second indication signals of the second computing units. One feasible indication mode is to indicate the working states of all the first computing units by stating how many consecutive first computing units are occupied or not occupied. For example, the first indication signal A4U4 indicates that the first four first computing units are occupied and activated and the next four first computing units are not occupied; A2U4A2 indicates that the first two first computing units are occupied and activated, the middle four first computing units are not occupied, and the last two first computing units are activated.
After the execution of the first computing operation is completed, the active computing unit tracking unit detects the number of the first computing units in an active state, if the number of the second computing units required for executing the second computing operation is consistent with the number of the first computing units in the active state, the control unit modifies the first number into the second number, the output data of the first computing unit is used as the input data of the first computing unit, and the first computing unit in the active state continues to execute the second computing operation in a second time period. If the number of the second computing units required for executing the second computing operation is lower than the number of the first computing units in an active state, the control unit temporarily seals the redundant first computing units, the control unit modifies the first number into the second number, the output data of the first computing units are used as the input data of the first computing units, and the first computing units in the active state continue to execute the second computing operation in a second time period; if the number of the second computing units required for executing the second computing operation is higher than the number of the first computing units in the active state, the control unit calls the idle first computing units into the active state until the number of the second computing units required for executing the second computing operation is consistent with the number of the first computing units in the active state, the control unit modifies the first number into the second number, the output data of the first computing units serve as input data of the first computing units, and the first computing units in the active state continue to execute the second computing operation in a second time period. 
The locations of the computing units used may differ between layers, which can cause meaningless movement of data, since the unnecessary computation data itself has already been removed. In the traditional data transmission mode, the path is first computing unit, bus, DRAM or SRAM, bus, second computing unit, or alternatively first computing unit, bus, second computing unit; in both cases the transmission path is relatively long. In this scheme, out-of-order operation is adopted: the output data computed by the first computing unit is used as the input data of the same first computing unit, so the data transmission path is first computing unit to first computing unit, i.e. the data stays within the first computing unit. By changing the virtual number of the first computing unit, the system regards the first computing unit as the second computing unit, and no data flows to the second computing unit. Meanwhile, the first computing unit can work with the control unit, which delivers the required computation parameters, to complete the computation. The above operation is repeated until the computation is completed, which reduces data movement and improves computing efficiency.
Claims (7)
1. A detection-computation-oriented chip system, comprising:
a first computing unit array including a plurality of first computing units of a first number for performing a first computing operation during a first period of time;
a second computing unit array including a plurality of second numbered second computing units for performing a second computing operation for a second period of time, the second period of time after the first period of time, the second computing operation being performed after the first computing operation is completed;
a control unit for judging the number of first computing units required for performing the first computing operation and the number of second computing units required for performing the second computing operation, and numbering the first computing units and the second computing units;
after the first computing unit array performs the first computing operation, the control unit modifies the first number into the second number, and sets a display substitute operation identifier in the control unit, the output data of the first computing unit is used as the input data of the first computing unit, and the first computing unit in an active state continues to perform the second computing operation in a second time period.
2. The detection-computation-oriented chip system of claim 1, further comprising:
an active computing unit tracking unit, which judges whether the first computing unit is in an active state according to the first indication signal of the first computing unit, and judges whether the second computing unit is in an active state according to the second indication signal of the second computing unit;
after the first computing unit array performs the first computing operation, the active computing unit tracking unit detects the number of the first computing units in an active state, if the number of the second computing units required for performing the second computing operation is consistent with the number of the first computing units in the active state, the control unit modifies the first number into the second number, the output data of the first computing units is used as the input data of the first computing units, and the first computing units in the active state continue to perform the second computing operation in a second time period;
if the number of second computing units required for executing the second computing operation is lower than the number of first computing units in an active state, the control unit temporarily seals the redundant first computing units, the control unit modifies the first number into the second number, the output data of the first computing units are used as the input data of the first computing units, and the first computing units in the active state continue to execute the second computing operation in a second time period;
And if the number of the second computing units required for executing the second computing operation is higher than the number of the first computing units in the active state, the control unit calls the idle first computing units into the active state until the number of the second computing units required for executing the second computing operation is consistent with the number of the first computing units in the active state, the control unit modifies the first number into the second number, the output data of the first computing units serves as the input data of the first computing units, and the first computing units in the active state continue to execute the second computing operation in a second time period.
3. The detection-computation-oriented chip system of claim 2, wherein after the first computing unit array performs the first computing operation, the active computing unit tracking unit detects the number of the first computing units in an active state, and if the number of the second computing units required for performing the second computing operation is higher than the number of the first computing units in the active state, the control unit decapsulates the temporarily sealed first computing units and brings them into the active state until the number of the first computing units in the active state is consistent with the number of the second computing units required for the second computing operation, the control unit modifies the first number into the second number, the output data of the first computing units is used as the input data of the first computing units, and the first computing units in the active state continue to perform the second computing operation in the second time period.
4. The detection-computation-oriented chip system of claim 3, further comprising:
the active computing unit pre-judging unit is used for comparing the first computing operation output data of the first computing unit with various data to be processed in the database to obtain the number of second computing units required for processing the first computing operation output data.
5. The detection-computation-oriented chip system of claim 4, wherein when the plurality of first indication signals are in a continuously active state, the plurality of first computation units are indicated as being in a continuously active state, and wherein when the plurality of first indication signals are in a continuously inactive state, the plurality of first computation units are indicated as being in a continuously inactive state.
6. The detection-computation-oriented chip system of claim 1, wherein the first and second arrays of computing units are used to perform training tasks or reasoning tasks.
7. A detection computation oriented chip method, comprising:
a plurality of first numbered first computing units are arranged in the first computing unit array and used for executing first computing operation in a first time period, a plurality of second numbered second computing units are arranged in the second computing unit array and used for executing second computing operation in a second time period, and the second computing operation is executed after the first computing operation is completed in the second time period; the control unit is used for judging the number of first computing units required for executing the first computing operation and the number of second computing units required for executing the second computing operation, and numbering the first computing units and the second computing units; setting an active computing unit tracking unit, judging whether the first computing unit is in an active state or not according to a first indication signal of the first computing unit, and judging whether the second computing unit is in an active state or not according to a second indication signal of the second computing unit;
After the first computing unit array performs the first computing operation, the active computing unit tracking unit detects the number of the first computing units in an active state, if the number of the second computing units required for performing the second computing operation is consistent with the number of the first computing units in the active state, the control unit modifies the first number into the second number, the output data of the first computing units is used as the input data of the first computing units, and the first computing units in the active state continue to perform the second computing operation in a second time period;
if the number of second computing units required for executing the second computing operation is lower than the number of first computing units in an active state, the control unit temporarily seals the redundant first computing units, the control unit modifies the first number into the second number, the output data of the first computing units are used as the input data of the first computing units, and the first computing units in the active state continue to execute the second computing operation in a second time period;
and if the number of the second computing units required for executing the second computing operation is higher than the number of the first computing units in the active state, the control unit calls the idle first computing units into the active state until the number of the second computing units required for executing the second computing operation is consistent with the number of the first computing units in the active state, the control unit modifies the first number into the second number, the output data of the first computing units serves as the input data of the first computing units, and the first computing units in the active state continue to execute the second computing operation in a second time period.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311130687.6A CN116881195B (en) | 2023-09-04 | 2023-09-04 | Chip system facing detection calculation and chip method facing detection calculation |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116881195A (en) | 2023-10-13 |
CN116881195B (en) | 2023-11-17 |
Family
ID=88271774
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311130687.6A Active CN116881195B (en) | 2023-09-04 | 2023-09-04 | Chip system facing detection calculation and chip method facing detection calculation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116881195B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140013044A1 (en) * | 2012-07-04 | 2014-01-09 | Hon Hai Precision Industry Co., Ltd. | Computer system having function of detecting working state of memory bank |
CN113517009A (en) * | 2021-06-10 | 2021-10-19 | 上海新氦类脑智能科技有限公司 | Storage and calculation integrated intelligent chip, control method and controller |
CN113792010A (en) * | 2021-09-22 | 2021-12-14 | 清华大学 | Storage and calculation integrated chip and data processing method |
CN116362312A (en) * | 2021-12-23 | 2023-06-30 | 哲库科技(上海)有限公司 | Neural network acceleration device, method, equipment and computer storage medium |
- 2023-09-04: CN application CN202311130687.6A, patent CN116881195B (en), status Active
Also Published As
Publication number | Publication date |
---|---|
CN116881195B (en) | 2023-11-17 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||