CN116881195A - Chip system facing detection calculation and chip method facing detection calculation - Google Patents

Chip system facing detection calculation and chip method facing detection calculation

Info

Publication number
CN116881195A
Authority
CN
China
Prior art keywords
computing
computing units
units
unit
active state
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311130687.6A
Other languages
Chinese (zh)
Other versions
CN116881195B (en)
Inventor
李慧清
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Wisemays Technology Co ltd
Original Assignee
Beijing Wisemays Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Wisemays Technology Co ltd
Priority to CN202311130687.6A
Publication of CN116881195A
Application granted
Publication of CN116881195B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/76 Architectures of general purpose stored program computers
    • G06F15/78 Architectures of general purpose stored program computers comprising a single central processing unit
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Mathematical Physics (AREA)
  • Algebra (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Hardware Redundancy (AREA)

Abstract

The application provides a detection-computation-oriented chip system and a corresponding chip method. The system comprises: a first computing unit array including a plurality of first computing units, each assigned a first number, for performing a first computing operation during a first time period; a second computing unit array including a plurality of second computing units, each assigned a second number, for performing a second computing operation during a second time period, where the second time period follows the first time period and the second computing operation is performed after the first computing operation is completed; and a control unit for determining the number of first computing units required for the first computing operation and the number of second computing units required for the second computing operation, and for numbering the first computing units and the second computing units. After performing the first computing operation, the first computing units continue by performing the second computing operation, which reduces data movement and improves computing efficiency.

Description

Chip system facing detection calculation and chip method facing detection calculation
Technical Field
The application relates to the field of artificial intelligence chips, and in particular to a detection-computation-oriented chip system and chip method.
Background
In the prior art, artificial intelligence chip schemes fall mainly into two types. One type is an accelerator designed specifically for artificial neural networks, such as NVIDIA's graphics processing unit (GPU) or Google's tensor processing unit (TPU). The other type is a general-purpose processor, such as a computer's central processing unit (CPU) or a field-programmable gate array (FPGA), on which artificial intelligence computation is implemented in software.
In existing schemes, the transfer of computation data within a chip follows the mapping rules of the original neural network, and this sequential, rule-bound mode of computation often introduces considerable redundant computation and unnecessary latency. A new approach to data adjustment and transmission is therefore needed.
Disclosure of Invention
The embodiments of the application aim to provide a detection-computation-oriented chip system and chip method that improve the computing efficiency of an artificial intelligence chip.
In a first aspect, an embodiment of the present application provides a detection-computation-oriented chip system, including: a first computing unit array including a plurality of first computing units, each assigned a first number, for performing a first computing operation during a first time period; a second computing unit array including a plurality of second computing units, each assigned a second number, for performing a second computing operation during a second time period, where the second time period follows the first time period and the second computing operation is performed after the first computing operation is completed; and a control unit for determining the number of first computing units required for the first computing operation and the number of second computing units required for the second computing operation, and for numbering the first computing units and the second computing units. After the first computing unit array has performed the first computing operation, the control unit changes the first number to the second number and sets, in the control unit, an identifier that marks the substitution operation; the output data of each first computing unit is used as the input data of that same first computing unit, and the first computing units in an active state continue by performing the second computing operation in the second time period.
In one possible implementation, the detection-computation-oriented chip system further includes an active computing unit tracking unit, which determines whether a first computing unit is in an active state from the first indication signal of that first computing unit, and whether a second computing unit is in an active state from the second indication signal of that second computing unit. After the first computing unit array has performed the first computing operation, the active computing unit tracking unit detects the number of first computing units in the active state and compares it with the number of second computing units required for the second computing operation. If the two numbers are equal, no adjustment is needed. If the required number is lower, the control unit temporarily seals the surplus first computing units. If the required number is higher, the control unit brings idle first computing units into the active state until the two numbers are equal. In each case the control unit then changes the first number to the second number, the output data of each first computing unit is used as the input data of that same first computing unit, and the first computing units in the active state continue by performing the second computing operation in the second time period.
In this implementation, the first computing unit array and the second computing unit array include the first computing units and the second computing units respectively, which perform the first computing operation and the second computing operation respectively. The second computing operation is performed in the second time period, after the first computing operation. Each first computing unit carries a first number and each second computing unit carries a second number. The control unit determines the number of first computing units required for the first computing operation and the number of second computing units required for the second computing operation, and numbers the first computing units and the second computing units. The active computing unit tracking unit determines whether the first and second computing units are in the active state from the first indication signals of the first computing units and the second indication signals of the second computing units. One feasible form of indication describes the working state of all first computing units by stating how many consecutive first computing units are occupied or unoccupied. For example, the first indication signal A4U4 indicates that the first four first computing units are occupied and active and the next four are unoccupied; A2U4A2 indicates that the first two first computing units are occupied and active, the middle four are unoccupied, and the last two are active.
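As an illustration only, the run-length indication described above could be expanded into per-unit flags as sketched below. The text gives the signal only as strings such as A4U4; the function names `decode_indication` and `count_active` and the choice of a Python string as the carrier are assumptions made for this sketch, not part of the application.

```python
import re

def decode_indication(signal: str) -> list[bool]:
    """Expand a run-length indication signal into one flag per computing unit.

    'A<n>' stands for n consecutive occupied/active units and 'U<n>' for n
    consecutive unoccupied units. The string form is an assumption; the text
    does not specify the on-chip encoding.
    """
    if not re.fullmatch(r"(?:[AU]\d+)+", signal):
        raise ValueError(f"malformed indication signal: {signal!r}")
    flags: list[bool] = []
    for state, count in re.findall(r"([AU])(\d+)", signal):
        flags.extend([state == "A"] * int(count))
    return flags

def count_active(signal: str) -> int:
    """Number of active units described by the signal."""
    return sum(decode_indication(signal))
```

With this sketch, A2U4A2 expands to eight flags, of which the first two and the last two are set.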
After the first computing operation has been performed, the active computing unit tracking unit detects the number of first computing units in the active state and compares it with the number of second computing units required for the second computing operation. If the two numbers are equal, no adjustment is needed. If the required number is lower, the control unit temporarily seals the surplus first computing units. If the required number is higher, the control unit brings idle first computing units into the active state until the two numbers are equal. In each case the control unit then changes the first number to the second number, the output data of each first computing unit is used as the input data of that same first computing unit, and the first computing units in the active state continue by performing the second computing operation in the second time period.
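The three cases above can be sketched as a small piece of control logic. This is a minimal model under stated assumptions: the `Unit` and `State` types, the function name `prepare_second_operation`, and the rule that the highest-indexed surplus units are the ones sealed are all hypothetical, since the text does not say which surplus units are chosen.

```python
from dataclasses import dataclass
from enum import Enum

class State(Enum):
    ACTIVE = "active"
    IDLE = "idle"
    SEALED = "sealed"   # temporarily sealed by the control unit

@dataclass
class Unit:
    physical_id: int
    number: str = "first"      # number (label) assigned by the control unit
    state: State = State.IDLE

def prepare_second_operation(units: list[Unit], required: int) -> list[Unit]:
    """Match the active first units to the count the second operation needs,
    then renumber them so they are treated as second computing units."""
    active = [u for u in units if u.state is State.ACTIVE]
    if len(active) > required:
        # Fewer units needed: seal the surplus (selection rule is assumed).
        for u in active[required:]:
            u.state = State.SEALED
    elif len(active) < required:
        # More units needed: bring idle units into the active state.
        idle = [u for u in units if u.state is State.IDLE]
        shortfall = required - len(active)
        if len(idle) < shortfall:
            raise RuntimeError("not enough idle units to meet the requirement")
        for u in idle[:shortfall]:
            u.state = State.ACTIVE
    selected = [u for u in units if u.state is State.ACTIVE]
    for u in selected:
        u.number = "second"    # data stays where it is; only the label changes
    return selected
```

The equal case falls through both branches, so only the renumbering step runs.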
The computing units used may sit at different locations in different layers, which can lead to data being moved needlessly even though the unnecessary computation data has already been removed. In the traditional data transmission mode the path is long: first computing unit -> bus -> DRAM or SRAM -> bus -> second computing unit, or first computing unit -> bus -> second computing unit. This scheme instead uses out-of-order operation: the output data computed by a first computing unit is used as the input data of that same first computing unit, so the data transmission path is first computing unit -> first computing unit and the data stays inside the first computing unit. By changing the virtual number of the first computing unit, the system treats it as a second computing unit, and no data flows to the second computing unit. The first computing unit can also work with the control unit, which supplies the computation parameters required to complete the computation. These operations are repeated until the computation is finished, which reduces data movement and improves computing efficiency.
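The difference between the two data paths can be made concrete with a toy model. The sketch below is not the chip's implementation: the `Bus` counter, the `ComputeUnit` class, and both `run_*` functions are hypothetical names, and the traditional path is assumed to cost two bus transfers (unit to memory, memory to unit).

```python
class Bus:
    """Counts transfers so the two paths can be compared."""
    def __init__(self):
        self.transfers = 0

class ComputeUnit:
    def __init__(self, physical_id):
        self.physical_id = physical_id
        self.virtual_number = "first"
        self.local_data = None

def run_traditional(first_unit, second_unit, x, first_op, second_op, bus):
    # first unit -> bus -> DRAM/SRAM -> bus -> second unit
    first_unit.local_data = first_op(x)
    bus.transfers += 2
    second_unit.virtual_number = "second"
    second_unit.local_data = second_op(first_unit.local_data)
    return second_unit.local_data

def run_in_place(unit, x, first_op, second_op, bus):
    # first unit -> first unit: only the virtual number changes
    unit.local_data = first_op(x)
    unit.virtual_number = "second"
    unit.local_data = second_op(unit.local_data)
    return unit.local_data
```

Both functions produce the same result; in this model only the traditional path touches the bus.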
In one possible implementation, after the first computing unit array has performed the first computing operation, the active computing unit tracking unit detects the number of first computing units in the active state. If the number of second computing units required for the second computing operation is higher than that number, the control unit unseals temporarily sealed first computing units so that the number of first computing units in the active state equals the number of second computing units required for the second computing operation. The control unit then changes the first number to the second number, the output data of each first computing unit is used as the input data of that same first computing unit, and the first computing units in the active state continue by performing the second computing operation in the second time period.
In this implementation, when the number of second computing units required for the second computing operation is higher than the number of first computing units in the active state, the control unit unseals temporarily sealed first computing units and brings them into the active state, giving priority to first computing units that can be called directly. If the original data held by such a first computing unit is usable, the unit is used directly; if it is not usable, the difference between the original data and the new data is transferred to the unit, which then performs the computation. This continues until the number of first computing units in the active state equals the number of second computing units required for the second computing operation. The control unit then changes the first number to the second number, the output data of each first computing unit is used as the input data of that same first computing unit, and the first computing units in the active state continue by performing the second computing operation in the second time period. Giving priority to first computing units that can be used directly shortens the calling time and further improves computing efficiency.
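The unsealing priority described above can be sketched as an ordering problem. The `SealedUnit` type, the `data_usable` flag, and the action labels `"reuse"` and `"send_delta"` are assumptions introduced for illustration; the text says only that directly usable units are preferred and that otherwise a difference is transferred.

```python
from dataclasses import dataclass

@dataclass
class SealedUnit:
    physical_id: int
    data_usable: bool   # True if the unit's original data can be used as-is

def plan_unsealing(sealed: list[SealedUnit], shortfall: int) -> list[tuple[int, str]]:
    """Pick `shortfall` sealed units, preferring those usable directly.

    Returns (physical_id, action) pairs, where the action is 'reuse' when the
    original data is usable and 'send_delta' when only the difference between
    the original and the new data needs to be transferred.
    """
    if shortfall > len(sealed):
        raise ValueError("not enough sealed units to cover the shortfall")
    # sorted() is stable, so units keep their original order within each group.
    ordered = sorted(sealed, key=lambda u: not u.data_usable)
    return [
        (u.physical_id, "reuse" if u.data_usable else "send_delta")
        for u in ordered[:shortfall]
    ]
```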
In one possible implementation, the chip system further includes an active computing unit pre-judging unit, which compares the first-computing-operation output data of the first computing units with the types of to-be-processed data in a database to obtain the number of second computing units required to process that output data.
In this implementation, the active computing unit pre-judging unit is preset with a database that stores the number of computing units required for each type of to-be-processed data; the database may also be located outside the detection-computation-oriented chip system. The pre-judging unit reads the first-computing-operation output data of the first computing units and compares it with the types of to-be-processed data in the database, and thereby obtains the number of second computing units required to process that output data. Determining this number in advance speeds up the regulation of the first computing units and improves computing efficiency.
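A minimal sketch of such a lookup follows. The text does not say how output data is compared with the database entries, so this example assumes, purely for illustration, that data types are distinguished by element count; the `DATABASE` contents and the function name `prejudge_required_units` are hypothetical.

```python
from bisect import bisect_left

# Hypothetical database: (largest element count in the category, units required),
# sorted by element count. Real categories and counts are not given in the text.
DATABASE = [(256, 2), (1024, 4), (4096, 8)]

def prejudge_required_units(output_data, database=DATABASE) -> int:
    """Return the number of second computing units for the matching category."""
    limits = [limit for limit, _ in database]
    idx = bisect_left(limits, len(output_data))   # first category that fits
    if idx == len(database):
        raise LookupError("output data does not match any category in the database")
    return database[idx][1]
```

Keeping the table sorted lets the lookup run as a binary search, which suits a unit whose purpose is to answer before the second operation is deployed.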
In one possible implementation, when the plurality of first indication signals indicate a continuously active state, the corresponding plurality of first computing units are in a continuously active state, and when the plurality of first indication signals indicate a continuously inactive state, the corresponding plurality of first computing units are in a continuously inactive state.
In this implementation, one possible form of indication is to provide a 1-bit indicator for each first computing unit and each second computing unit. Another possible form describes the working state of all first computing units by stating how many consecutive first computing units are occupied or unoccupied. For example, the first indication signal A4U4 indicates that the first four first computing units are occupied and active and the next four are unoccupied; A2U4A2 indicates that the first two first computing units are occupied and active, the middle four are unoccupied, and the last two are active. The pre-judging data is obtained by logically analyzing existing data to determine which computing units need to be occupied. For example, where a CNN is deployed on the computing architecture, the computing units to be called are deployed flexibly and called in a coordinated manner: the pre-judging unit analyzes the pending computing operation and obtains the required first-computing-unit overhead before the computing operation is deployed.
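The two forms of indication carry the same information, and one can be derived from the other. The sketch below converts per-unit 1-bit indicators into the run-length form; the function name `encode_indication` and the use of Python booleans for the 1-bit indicators are assumptions for illustration.

```python
from itertools import groupby

def encode_indication(flags) -> str:
    """Compress per-unit 1-bit indicators into a run-length signal.

    Each run of equal flags becomes 'A<n>' (active) or 'U<n>' (unoccupied).
    """
    return "".join(
        ("A" if active else "U") + str(len(list(run)))
        for active, run in groupby(flags)
    )
```

For arrays with long runs of units in the same state, the run-length form is shorter than one bit per unit written out individually, which may be why the text offers it as an alternative.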
In one possible implementation, the first computing unit array and the second computing unit array are used to perform training tasks or inference tasks.
In this implementation, the first computing unit array and the second computing unit array can be used to perform training tasks, to perform inference tasks, or to interleave training and inference tasks, which makes their use more flexible.
In a second aspect, an embodiment of the present application provides a detection-computation-oriented chip method, including:
providing, in a first computing unit array, a plurality of first computing units, each assigned a first number, for performing a first computing operation during a first time period, and providing, in a second computing unit array, a plurality of second computing units, each assigned a second number, for performing a second computing operation during a second time period, the second computing operation being performed in the second time period after the first computing operation is completed; providing a control unit for determining the number of first computing units required for the first computing operation and the number of second computing units required for the second computing operation, and for numbering the first computing units and the second computing units; providing an active computing unit tracking unit, which determines whether a first computing unit is in an active state from the first indication signal of that first computing unit, and whether a second computing unit is in an active state from the second indication signal of that second computing unit; and, after the first computing unit array has performed the first computing operation, detecting, by the active computing unit tracking unit, the number of first computing units in the active state and comparing it with the number of second computing units required for the second computing operation. If the two numbers are equal, no adjustment is needed. If the required number is lower, the control unit temporarily seals the surplus first computing units. If the required number is higher, the control unit brings idle first computing units into the active state until the two numbers are equal. In each case the control unit then changes the first number to the second number, the output data of each first computing unit is used as the input data of that same first computing unit, and the first computing units in the active state continue by performing the second computing operation in the second time period.
In this implementation, the first computing unit array and the second computing unit array include the first computing units and the second computing units respectively, which perform the first computing operation and the second computing operation respectively. The second computing operation is performed in the second time period, after the first computing operation. Each first computing unit carries a first number and each second computing unit carries a second number. The control unit determines the number of first computing units required for the first computing operation and the number of second computing units required for the second computing operation, and numbers the first computing units and the second computing units. The active computing unit tracking unit determines whether the first and second computing units are in the active state from the first indication signals of the first computing units and the second indication signals of the second computing units. One feasible form of indication describes the working state of all first computing units by stating how many consecutive first computing units are occupied or unoccupied. For example, the first indication signal A4U4 indicates that the first four first computing units are occupied and active and the next four are unoccupied; A2U4A2 indicates that the first two first computing units are occupied and active, the middle four are unoccupied, and the last two are active.
After the first computing operation has been performed, the active computing unit tracking unit detects the number of first computing units in the active state and compares it with the number of second computing units required for the second computing operation. If the two numbers are equal, no adjustment is needed. If the required number is lower, the control unit temporarily seals the surplus first computing units. If the required number is higher, the control unit brings idle first computing units into the active state until the two numbers are equal. In each case the control unit then changes the first number to the second number, the output data of each first computing unit is used as the input data of that same first computing unit, and the first computing units in the active state continue by performing the second computing operation in the second time period.
The computing units used may sit at different locations in different layers, which can lead to data being moved needlessly even though the unnecessary computation data has already been removed. In the traditional data transmission mode the path is long: first computing unit -> bus -> DRAM or SRAM -> bus -> second computing unit, or first computing unit -> bus -> second computing unit. This scheme instead uses out-of-order operation: the output data computed by a first computing unit is used as the input data of that same first computing unit, so the data transmission path is first computing unit -> first computing unit and the data stays inside the first computing unit. By changing the virtual number of the first computing unit, the system treats it as a second computing unit, and no data flows to the second computing unit. The first computing unit can also work with the control unit, which supplies the computation parameters required to complete the computation. These operations are repeated until the computation is finished, which reduces data movement and improves computing efficiency.
Drawings
To illustrate the technical solutions of the embodiments of the present application more clearly, the drawings needed in the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present application and should not be regarded as limiting its scope; a person skilled in the art can derive other related drawings from them without inventive effort.
FIG. 1 is a block diagram of a detection-computation-oriented chip system according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the accompanying drawings in the embodiments of the present application.
An embodiment of the application provides a detection-computation-oriented chip system and chip method. Referring to FIG. 1, the chip system includes: a first computing unit array including a plurality of first computing units, each assigned a first number, for performing a first computing operation during a first time period; a second computing unit array including a plurality of second computing units, each assigned a second number, for performing a second computing operation during a second time period, where the second time period follows the first time period and the second computing operation is performed after the first computing operation is completed; and a control unit for determining the number of first computing units required for the first computing operation and the number of second computing units required for the second computing operation, and for numbering the first computing units and the second computing units. After the first computing unit array has performed the first computing operation, the control unit changes the first number to the second number and sets, in the control unit, an identifier that marks the substitution operation; the output data of each first computing unit is used as the input data of that same first computing unit, and the first computing units in an active state continue by performing the second computing operation in the second time period.
In one possible implementation, the detection-computation-oriented chip system further includes an active computing unit tracking unit, which determines whether a first computing unit is in an active state from the first indication signal of that first computing unit, and whether a second computing unit is in an active state from the second indication signal of that second computing unit. After the first computing unit array has performed the first computing operation, the active computing unit tracking unit detects the number of first computing units in the active state and compares it with the number of second computing units required for the second computing operation. If the two numbers are equal, no adjustment is needed. If the required number is lower, the control unit temporarily seals the surplus first computing units. If the required number is higher, the control unit brings idle first computing units into the active state until the two numbers are equal. In each case the control unit then changes the first number to the second number, the output data of each first computing unit is used as the input data of that same first computing unit, and the first computing units in the active state continue by performing the second computing operation in the second time period.
In the implementation process, the first computing unit array and the second computing unit array include first computing units and second computing units, respectively, and are used to perform the first computing operation and the second computing operation, respectively. The second computing operation is performed in the second time period, after the first computing operation. Each first computing unit is provided with a first number, and each second computing unit is provided with a second number. The control unit judges the number of first computing units required for performing the first computing operation and the number of second computing units required for performing the second computing operation, and numbers the first computing units and the second computing units. The active computing unit tracking unit judges whether the first computing units and the second computing units are in an active state according to the first indication signals of the first computing units and the second indication signals of the second computing units. One feasible indication mode indicates the working states of all first computing units by stating how many consecutive first computing units are occupied or unoccupied. For example, the first indication signal A4U4 indicates that the first four first computing units are occupied and active and the next four first computing units are unoccupied; A2U4A2 indicates that the first two first computing units are occupied and active, the middle four first computing units are unoccupied, and the last two first computing units are active.
After the execution of the first computing operation is completed, the active computing unit tracking unit detects the number of the first computing units in an active state, if the number of the second computing units required for executing the second computing operation is consistent with the number of the first computing units in the active state, the control unit modifies the first number into the second number, the output data of the first computing unit is used as the input data of the first computing unit, and the first computing unit in the active state continues to execute the second computing operation in a second time period. If the number of the second computing units required for executing the second computing operation is lower than the number of the first computing units in an active state, the control unit temporarily seals the redundant first computing units, the control unit modifies the first number into the second number, the output data of the first computing units are used as the input data of the first computing units, and the first computing units in the active state continue to execute the second computing operation in a second time period; if the number of the second computing units required for executing the second computing operation is higher than the number of the first computing units in the active state, the control unit calls the idle first computing units into the active state until the number of the second computing units required for executing the second computing operation is consistent with the number of the first computing units in the active state, the control unit modifies the first number into the second number, the output data of the first computing units serve as input data of the first computing units, and the first computing units in the active state continue to execute the second computing operation in a second time period. 
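The three cases above (equal, lower, higher) can be sketched as a small reconciliation routine. This is a minimal illustration, not the patented control logic: the names `State`, `reconcile`, and `renumber`, the choice of which surplus units to seal, and the labels used as second numbers are all assumptions made for the example.

```python
from enum import Enum


class State(Enum):
    ACTIVE = "active"
    SEALED = "sealed"  # temporarily sealed because it was surplus
    IDLE = "idle"      # not yet called for this workload


def reconcile(states, required):
    """Return a new state list with exactly `required` ACTIVE units."""
    states = list(states)
    active = [i for i, s in enumerate(states) if s is State.ACTIVE]
    if len(active) > required:
        # Fewer units are needed: seal the surplus units at the end of
        # the active list (which ones to seal is an arbitrary choice here).
        for i in active[required:]:
            states[i] = State.SEALED
    elif len(active) < required:
        # More units are needed: call idle units into the active state.
        idle = [i for i, s in enumerate(states) if s is State.IDLE]
        shortfall = required - len(active)
        if len(idle) < shortfall:
            raise ValueError("not enough idle units to satisfy the request")
        for i in idle[:shortfall]:
            states[i] = State.ACTIVE
    return states


def renumber(states, second_numbers):
    """Relabel each ACTIVE physical unit with a second number; no data moves."""
    active = [i for i, s in enumerate(states) if s is State.ACTIVE]
    return dict(zip(active, second_numbers))
```

For example, with four active and four idle units, `reconcile(states, 2)` seals two of the active units and `reconcile(states, 6)` activates two idle ones; `renumber` then only changes labels, which mirrors the idea that the first computing units continue with the second computing operation in place.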
The locations of the computing units used may differ between layers, which may result in meaningless data movement, since the unnecessary computing data has itself already been removed. In the traditional data transmission mode, the path is first computing unit - bus - dynamic random access memory (DRAM) or static random access memory (SRAM) - bus - second computing unit, or first computing unit - bus - second computing unit, and the transmission path is long. This scheme adopts out-of-order operation: the output data computed by the first computing unit is used as the input data of the same first computing unit, so the data transmission mode is first computing unit - first computing unit, and the data is transmitted only within the first computing unit. By changing the virtual number of the first computing unit, the system treats the first computing unit as the second computing unit, and no data flows to the second computing unit. Meanwhile, the control unit can transmit the required computing parameters to the first computing unit to complete the computation. These operations are repeated until the computation is complete, which reduces data movement and improves computing efficiency.
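The difference between the two data paths can be made concrete with a toy model. The sketch below is an assumption-laden illustration: the `Unit` class, the two `run_*` functions, and the way bus transfers are counted are all hypothetical and only serve to show that renumbering keeps the intermediate result inside the same unit.

```python
class Unit:
    """A toy computing unit with a virtual number and an output buffer."""

    def __init__(self, number):
        self.number = number  # virtual number seen by the control unit
        self.output = None


def run_traditional(first, second, op1, op2, x):
    """first unit - bus - DRAM/SRAM - bus - second unit (two bus transfers)."""
    transfers = 0
    first.output = op1(x)
    memory = first.output      # first unit -> bus -> memory
    transfers += 1
    second_input = memory      # memory -> bus -> second unit
    transfers += 1
    second.output = op2(second_input)
    return second.output, transfers


def run_renumbered(first, second_number, op1, op2, x):
    """The first unit is relabeled and consumes its own output (no transfer)."""
    first.output = op1(x)
    first.number = second_number          # only the virtual number changes
    first.output = op2(first.output)      # output reused as input in place
    return first.output, 0
```

Both functions compute the same result; in this model the renumbered path reports zero bus transfers because the data never leaves the unit.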
In one possible implementation, after the first computing unit array performs the first computing operation, the active computing unit tracking unit detects the number of first computing units in the active state. If the number of second computing units required for performing the second computing operation is higher than the number of first computing units in the active state, the control unit unseals the temporarily sealed first computing units and places them in the active state until the number of first computing units in the active state is consistent with the number of second computing units required for the second computing operation. The control unit then modifies the first number into the second number, the output data of the first computing units serves as the input data of the first computing units, and the first computing units in the active state continue to perform the second computing operation in the second time period.
In the implementation process, after the first computing unit array performs the first computing operation, the active computing unit tracking unit detects the number of first computing units in the active state. If the number of second computing units required for performing the second computing operation is higher than the number of first computing units in the active state, the control unit unseals the temporarily sealed first computing units and places them in the active state, preferentially calling the first computing units that can be called directly. If the original data of a first computing unit is available, that unit is used directly; if the original data is unavailable, the difference between the original data and the new data is transferred to the first computing unit, which then performs the computation. This continues until the number of first computing units in the active state is consistent with the number of second computing units required for performing the second computing operation. The control unit then modifies the first number into the second number, the output data of the first computing units serves as the input data of the first computing units, and the first computing units in the active state continue to perform the second computing operation in the second time period. Preferentially calling the first computing units that can be used directly reduces the calling time and further improves computing efficiency.
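The priority rule described above can be sketched as a stable sort over the sealed units. This is only an illustrative sketch: `SealedUnit`, its `data_reusable` flag, and the action labels `"use_as_is"` and `"send_delta"` are hypothetical names, and how a real control unit decides whether data is reusable is not specified in the text.

```python
from dataclasses import dataclass


@dataclass
class SealedUnit:
    index: int
    data_reusable: bool  # True if the data left in the unit can be used as-is


def pick_units_to_unseal(sealed, shortfall):
    """Choose `shortfall` sealed units, preferring directly usable ones."""
    if shortfall > len(sealed):
        raise ValueError("not enough sealed units to cover the shortfall")
    # sorted() is stable, so units keep their original order within each group;
    # reusable units sort first because `not True` is False.
    ranked = sorted(sealed, key=lambda u: not u.data_reusable)
    return [
        (u.index, "use_as_is" if u.data_reusable else "send_delta")
        for u in ranked[:shortfall]
    ]
```

Units that need only a delta transfer are chosen last, which reflects the stated goal of reducing calling time.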
In one possible implementation, the chip system further includes an active computing unit pre-judging unit, which compares the first computing operation output data of the first computing unit with the various types of data to be processed in a database to obtain the number of second computing units required for processing the first computing operation output data.
In the implementation process, the chip system further includes the active computing unit pre-judging unit, in which a database is preset for storing the number of computing units required by each type of data to be processed; the database may alternatively be arranged outside the detection-computation-oriented chip system. The active computing unit pre-judging unit reads the first computing operation output data of the first computing unit and compares it with the various types of data to be processed in the database to obtain the number of second computing units required for processing that output data. Pre-judging the required number of second computing units in advance speeds up the regulation of the first computing units and improves computing efficiency.
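A minimal sketch of such a lookup follows. The text does not say how output data is matched against the database, so the classification rule here (fraction of zero values) is purely an assumption, as are the type names, the unit counts, and the function names.

```python
# Hypothetical database: data type -> number of second computing units needed.
REQUIRED_UNITS_DB = {
    "sparse_feature_map": 2,
    "dense_feature_map": 8,
}


def classify(output):
    """Toy matching rule: label the output by its fraction of zero values."""
    zeros = sum(1 for v in output if v == 0)
    return "sparse_feature_map" if zeros * 2 > len(output) else "dense_feature_map"


def prejudge_required_units(output, db=REQUIRED_UNITS_DB):
    """Look up how many second computing units the output data would need."""
    return db[classify(output)]
```

The returned count could then be handed to the control unit before the second computing operation starts, so that sealing or activating units does not wait on the operation itself.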
In one possible implementation, when a plurality of first indication signals are in a continuously active state, the corresponding plurality of first computing units are indicated to be in a continuously active state; when a plurality of first indication signals are in a continuously inactive state, the corresponding plurality of first computing units are indicated to be in a continuously inactive state.
In the implementation process, one possible indication mode is to set a 1-bit indication for each first computing unit and each second computing unit. Another possible indication mode indicates the working states of all first computing units by stating how many consecutive first computing units are occupied or unoccupied. For example, the first indication signal A4U4 indicates that the first four first computing units are occupied and active and the next four first computing units are unoccupied; A2U4A2 indicates that the first two first computing units are occupied and active, the middle four first computing units are unoccupied, and the last two first computing units are active. The pre-judging data is obtained by logically analyzing the existing data to determine which computing units need to be occupied. For example, when a convolutional neural network (CNN) is deployed on the computing units of the computing architecture, the computing units to be called can be flexibly deployed and called in a coordinated manner: the pre-judging unit analyzes the computing operation to be processed and obtains the required first computing unit overhead before the computing operation is deployed.
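The two indication modes are related by run-length encoding: the per-unit 1-bit flags can be compressed into signals such as A4U4. The sketch below assumes that `A` marks a run of active units and `U` a run of unoccupied units, as in the examples above; the function names are hypothetical.

```python
import re
from itertools import groupby


def encode_runs(bits):
    """Encode per-unit 1-bit flags (True = active) as a string like 'A4U4'."""
    return "".join(
        f"{'A' if active else 'U'}{len(list(group))}"
        for active, group in groupby(bits)
    )


def decode_runs(signal):
    """Expand a run-length signal such as 'A2U4A2' into per-unit booleans."""
    if not re.fullmatch(r"(?:[AU][1-9]\d*)*", signal):
        raise ValueError(f"malformed indication signal: {signal!r}")
    bits = []
    for state, count in re.findall(r"([AU])(\d+)", signal):
        bits.extend([state == "A"] * int(count))
    return bits
```

Counting the active units is then `sum(decode_runs(signal))`, which is the quantity the tracking unit compares against the number of units required for the next operation.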
In one possible implementation, the first computing unit array and the second computing unit array are used to perform training tasks or inference tasks.
In the implementation process, the first computing unit array and the second computing unit array can be used to perform training tasks, to perform inference tasks, or to perform training tasks and inference tasks in an interleaved manner, which makes their use more flexible.
In a second aspect, an embodiment of the present application provides a detection-computation-oriented chip method, including:
a plurality of first computing units with first numbers are arranged in the first computing unit array and used for performing a first computing operation in a first time period, and a plurality of second computing units with second numbers are arranged in the second computing unit array and used for performing a second computing operation in a second time period, the second time period being after the first time period and the second computing operation being performed after the first computing operation is completed; the control unit is used for judging the number of first computing units required for performing the first computing operation and the number of second computing units required for performing the second computing operation, and for numbering the first computing units and the second computing units; an active computing unit tracking unit is set, which judges whether the first computing unit is in an active state according to a first indication signal of the first computing unit, and judges whether the second computing unit is in an active state according to a second indication signal of the second computing unit; after the first computing unit array performs the first computing operation, the active computing unit tracking unit detects the number of first computing units in the active state; if the number of second computing units required for performing the second computing operation is consistent with the number of first computing units in the active state, the control unit modifies the first number into the second number, the output data of the first computing unit is used as the input data of the first computing unit, and the first computing unit in the active state continues to perform the second computing operation in the second time period; if the number of second computing units required for performing the second computing operation is lower than the number of first computing units in the active state, the control unit temporarily seals the redundant first computing units and modifies the first number into the second number, the output data of the first computing units is used as the input data of the first computing units, and the first computing units in the active state continue to perform the second computing operation in the second time period; if the number of second computing units required for performing the second computing operation is higher than the number of first computing units in the active state, the control unit calls idle first computing units into the active state until the number of first computing units in the active state is consistent with the number of second computing units required for performing the second computing operation, the control unit modifies the first number into the second number, the output data of the first computing units serves as the input data of the first computing units, and the first computing units in the active state continue to perform the second computing operation in the second time period.
In the implementation process, the first computing unit array and the second computing unit array include first computing units and second computing units, respectively, and are used to perform the first computing operation and the second computing operation, respectively. The second computing operation is performed in the second time period, after the first computing operation. Each first computing unit is provided with a first number, and each second computing unit is provided with a second number. The control unit judges the number of first computing units required for performing the first computing operation and the number of second computing units required for performing the second computing operation, and numbers the first computing units and the second computing units. The active computing unit tracking unit judges whether the first computing units and the second computing units are in an active state according to the first indication signals of the first computing units and the second indication signals of the second computing units. One feasible indication mode indicates the working states of all first computing units by stating how many consecutive first computing units are occupied or unoccupied. For example, the first indication signal A4U4 indicates that the first four first computing units are occupied and active and the next four first computing units are unoccupied; A2U4A2 indicates that the first two first computing units are occupied and active, the middle four first computing units are unoccupied, and the last two first computing units are active.
After the execution of the first computing operation is completed, the active computing unit tracking unit detects the number of the first computing units in an active state, if the number of the second computing units required for executing the second computing operation is consistent with the number of the first computing units in the active state, the control unit modifies the first number into the second number, the output data of the first computing unit is used as the input data of the first computing unit, and the first computing unit in the active state continues to execute the second computing operation in a second time period. If the number of the second computing units required for executing the second computing operation is lower than the number of the first computing units in an active state, the control unit temporarily seals the redundant first computing units, the control unit modifies the first number into the second number, the output data of the first computing units are used as the input data of the first computing units, and the first computing units in the active state continue to execute the second computing operation in a second time period; if the number of the second computing units required for executing the second computing operation is higher than the number of the first computing units in the active state, the control unit calls the idle first computing units into the active state until the number of the second computing units required for executing the second computing operation is consistent with the number of the first computing units in the active state, the control unit modifies the first number into the second number, the output data of the first computing units serve as input data of the first computing units, and the first computing units in the active state continue to execute the second computing operation in a second time period. 
The locations of the computing units used may differ between layers, which may result in meaningless data movement, since the unnecessary computing data has itself already been removed. In the traditional data transmission mode, the path is first computing unit - bus - DRAM or SRAM - bus - second computing unit, or first computing unit - bus - second computing unit, and the transmission path is long. This scheme adopts out-of-order operation: the output data computed by the first computing unit is used as the input data of the same first computing unit, so the data transmission mode is first computing unit - first computing unit, and the data is transmitted only within the first computing unit. By changing the virtual number of the first computing unit, the system treats the first computing unit as the second computing unit, and no data flows to the second computing unit. Meanwhile, the control unit can transmit the required computing parameters to the first computing unit to complete the computation. These operations are repeated until the computation is complete, which reduces data movement and improves computing efficiency.

Claims (7)

1. A detection-computation-oriented chip system, comprising:
a first computing unit array including a plurality of first computing units of a first number for performing a first computing operation during a first period of time;
a second computing unit array including a plurality of second numbered second computing units for performing a second computing operation for a second period of time, the second period of time after the first period of time, the second computing operation being performed after the first computing operation is completed;
a control unit for judging the number of first computing units required for performing the first computing operation and the number of second computing units required for performing the second computing operation, and numbering the first computing units and the second computing units;
after the first computing unit array performs the first computing operation, the control unit modifies the first number into the second number, and sets a display substitute operation identifier in the control unit, the output data of the first computing unit is used as the input data of the first computing unit, and the first computing unit in an active state continues to perform the second computing operation in a second time period.
2. The detection-computation-oriented chip system of claim 1, further comprising:
an active computing unit tracking unit, which judges whether the first computing unit is in an active state according to the first indication signal of the first computing unit, and judges whether the second computing unit is in an active state according to the second indication signal of the second computing unit;
after the first computing unit array performs the first computing operation, the active computing unit tracking unit detects the number of the first computing units in an active state, if the number of the second computing units required for performing the second computing operation is consistent with the number of the first computing units in the active state, the control unit modifies the first number into the second number, the output data of the first computing units is used as the input data of the first computing units, and the first computing units in the active state continue to perform the second computing operation in a second time period;
if the number of second computing units required for executing the second computing operation is lower than the number of first computing units in an active state, the control unit temporarily seals the redundant first computing units, the control unit modifies the first number into the second number, the output data of the first computing units are used as the input data of the first computing units, and the first computing units in the active state continue to execute the second computing operation in a second time period;
and if the number of the second computing units required for executing the second computing operation is higher than the number of the first computing units in the active state, the control unit calls the idle first computing units into the active state until the number of the second computing units required for executing the second computing operation is consistent with the number of the first computing units in the active state, the control unit modifies the first number into the second number, the output data of the first computing units serves as the input data of the first computing units, and the first computing units in the active state continue to execute the second computing operation in a second time period.
3. The detection-computation-oriented chip system of claim 2, wherein after the first computing unit array performs the first computing operation, the active computing unit tracking unit detects the number of the first computing units in an active state, and if the number of the second computing units required for performing the second computing operation is higher than the number of the first computing units in the active state, the control unit unseals the temporarily sealed first computing units and places them in the active state until the number of the first computing units in the active state is consistent with the number of the second computing units required for the second computing operation, the control unit modifies the first number into the second number, the output data of the first computing units is used as the input data of the first computing units, and the first computing units in the active state continue to perform the second computing operation in a second time period.
4. The detection-computation-oriented chip system of claim 3, further comprising:
the active computing unit pre-judging unit is used for comparing the first computing operation output data of the first computing unit with various data to be processed in the database to obtain the number of second computing units required for processing the first computing operation output data.
5. The detection-computation-oriented chip system of claim 4, wherein when the plurality of first indication signals are in a continuously active state, the plurality of first computation units are indicated as being in a continuously active state, and wherein when the plurality of first indication signals are in a continuously inactive state, the plurality of first computation units are indicated as being in a continuously inactive state.
6. The detection-computation-oriented chip system of claim 1, wherein the first computing unit array and the second computing unit array are used to perform training tasks or inference tasks.
7. A detection computation oriented chip method, comprising:
a plurality of first numbered first computing units are arranged in the first computing unit array and used for executing first computing operation in a first time period, a plurality of second numbered second computing units are arranged in the second computing unit array and used for executing second computing operation in a second time period, and the second computing operation is executed after the first computing operation is completed in the second time period; the control unit is used for judging the number of first computing units required for executing the first computing operation and the number of second computing units required for executing the second computing operation, and numbering the first computing units and the second computing units; setting an active computing unit tracking unit, judging whether the first computing unit is in an active state or not according to a first indication signal of the first computing unit, and judging whether the second computing unit is in an active state or not according to a second indication signal of the second computing unit;
after the first computing unit array performs the first computing operation, the active computing unit tracking unit detects the number of the first computing units in an active state, if the number of the second computing units required for performing the second computing operation is consistent with the number of the first computing units in the active state, the control unit modifies the first number into the second number, the output data of the first computing units is used as the input data of the first computing units, and the first computing units in the active state continue to perform the second computing operation in a second time period;
if the number of second computing units required for executing the second computing operation is lower than the number of first computing units in an active state, the control unit temporarily seals the redundant first computing units, the control unit modifies the first number into the second number, the output data of the first computing units are used as the input data of the first computing units, and the first computing units in the active state continue to execute the second computing operation in a second time period;
and if the number of the second computing units required for executing the second computing operation is higher than the number of the first computing units in the active state, the control unit calls the idle first computing units into the active state until the number of the second computing units required for executing the second computing operation is consistent with the number of the first computing units in the active state, the control unit modifies the first number into the second number, the output data of the first computing units serves as the input data of the first computing units, and the first computing units in the active state continue to execute the second computing operation in a second time period.
CN202311130687.6A 2023-09-04 2023-09-04 Chip system facing detection calculation and chip method facing detection calculation Active CN116881195B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311130687.6A CN116881195B (en) 2023-09-04 2023-09-04 Chip system facing detection calculation and chip method facing detection calculation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311130687.6A CN116881195B (en) 2023-09-04 2023-09-04 Chip system facing detection calculation and chip method facing detection calculation

Publications (2)

Publication Number Publication Date
CN116881195A true CN116881195A (en) 2023-10-13
CN116881195B CN116881195B (en) 2023-11-17

Family

ID=88271774

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311130687.6A Active CN116881195B (en) 2023-09-04 2023-09-04 Chip system facing detection calculation and chip method facing detection calculation

Country Status (1)

Country Link
CN (1) CN116881195B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140013044A1 (en) * 2012-07-04 2014-01-09 Hon Hai Precision Industry Co., Ltd. Computer system having function of detecting working state of memory bank
CN113517009A (en) * 2021-06-10 2021-10-19 上海新氦类脑智能科技有限公司 Storage and calculation integrated intelligent chip, control method and controller
CN113792010A (en) * 2021-09-22 2021-12-14 清华大学 Storage and calculation integrated chip and data processing method
CN116362312A (en) * 2021-12-23 2023-06-30 哲库科技(上海)有限公司 Neural network acceleration device, method, equipment and computer storage medium

Also Published As

Publication number Publication date
CN116881195B (en) 2023-11-17

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant