CN112947892A - Calculation method and chip of two-norm regular term - Google Patents

Publication number
CN112947892A
Authority
CN
China
Prior art keywords
norm
module
square index
regular term
index value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110519691.6A
Other languages
Chinese (zh)
Other versions
CN112947892B (en)
Inventor
侯东伯
朱剑丘
沈大框
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Suiyuan Intelligent Technology Co ltd
Original Assignee
Beijing Suiyuan Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Suiyuan Intelligent Technology Co ltd filed Critical Beijing Suiyuan Intelligent Technology Co ltd
Priority to CN202110519691.6A priority Critical patent/CN112947892B/en
Publication of CN112947892A publication Critical patent/CN112947892A/en
Application granted granted Critical
Publication of CN112947892B publication Critical patent/CN112947892B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/544 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices for evaluating functions by calculation
    • G06F7/556 Logarithmic or exponential functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/14 Handling requests for interconnection or transfer
    • G06F13/20 Handling requests for interconnection or transfer for access to input/output bus
    • G06F13/28 Handling requests for interconnection or transfer for access to input/output bus using burst mode transfer, e.g. direct memory access DMA, cycle steal

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Pure & Applied Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Mathematical Optimization (AREA)
  • Debugging And Monitoring (AREA)

Abstract

An embodiment of the invention discloses a method and a chip for calculating a two-norm regular term. The method is applied to a calculation module in a chip and comprises the following steps: when it is detected that a DMA module of the chip accesses memory data, reading the write data bus of the DMA module to obtain target data; calculating a square index value for each element of the target data and detecting whether the square index value meets a first detection condition and, if not, generating a state indication signal and correcting the square index value; calculating a two-norm regular term of the target data from the square index values, correcting a two-norm regular term that does not meet a second detection condition, and generating a state indication signal; and sending the two-norm regular term and each state indication signal to a result analysis module in the chip for analysis. With this technical scheme, the two-norm regular term of memory data is calculated automatically by the hardware resources of the chip while the DMA module of the chip accesses the memory data, which reduces the impact on the performance of the network model.

Description

Calculation method and chip of two-norm regular term
Technical Field
The embodiment of the invention relates to the technical field of computer chips, in particular to a method and a chip for calculating a two-norm regular term.
Background
At present, a deep learning network model often needs to use a two-norm regularization function to reduce the tensors of the computation nodes, weights and gradients in the model to constant values, in order to judge the correctness of the calculation results. In the prior art, a new computation node is generally added to the script that builds the network in order to compute the two-norm regular term of the data, but this approach has the following defects: (1) Because a large number of additional computing operations are added, the performance of the whole computing network is seriously reduced. (2) The topology of the native model computation graph is changed, which may make the problem being debugged impossible to reproduce. (3) Additional storage space is required, so some very large models that are already at the memory limit cannot be debugged.
Disclosure of Invention
Embodiments of the invention provide a method and a chip for calculating a two-norm regular term, which use the hardware resources of the chip to automatically calculate the two-norm regular term of memory data while a Direct Memory Access (DMA) module of the chip accesses that data, thereby reducing the performance impact on the network model.
In a first aspect, an embodiment of the present invention provides a method for calculating a two-norm regularization term, which is applied to a calculation module in a chip, and includes:
when it is detected that a Direct Memory Access (DMA) module in the chip accesses memory data, reading the write data bus of the DMA module to obtain target data to be calculated;
calculating a square index value of each element in the target data according to a square index calculation formula corresponding to the two-norm regularization function;
if a target square index value that does not meet a first detection condition exists among the square index values of the elements, generating a state indication signal corresponding to the target square index value, and correcting the target square index value;
calculating a two-norm regular term of the target data according to the corrected square index value of each element, and, when the two-norm regular term does not meet a second detection condition, generating a corresponding state indication signal and correcting the two-norm regular term;
and sending the corrected two-norm regular term and each state indication signal to a result analysis module in the chip for analysis.
Optionally, before detecting that the DMA module in the chip accesses the memory data, the method further includes:
optimizing the two-norm regularization function to
[formula image 001]
where exp is the exponent stored in the float representation of each element of the data to be calculated, base is the sliding-window base value,
[formula image 002]
is the square index calculation formula for each element, and L2 is the two-norm regular term of the data to be calculated.
Optionally, calculating a square index value of each element in the target data according to a square index calculation formula corresponding to the two-norm regularization function, including:
calculating the exponent of each element in the target data through a hardware shift circuit and/or decoding circuit, according to the exponent calculation rule matching the data type of each element;
and calculating the square index value of each element through a hardware shift circuit, adder and subtracter, according to the square index calculation formula and the exponent of each element.
Optionally, if a target square index value that does not meet the first detection condition exists in the square index values of the elements, generating a state indication signal corresponding to the target square index value, and correcting the target square index value, including:
aiming at the square index value of each element, if the square index value is greater than an overflow threshold value and the overflow threshold value is a positive number, generating a DMA error interrupt signal to indicate a DMA module to stop working;
if the square index value is smaller than the preset lower limit value of the effective range, generating an underflow signal corresponding to the square index value, and correcting the square index value into the lower limit value of the effective range;
and if the square index value is larger than the preset upper limit value of the effective range, generating an overflow signal corresponding to the square index value, and correcting the square index value into the upper limit value of the effective range.
Optionally, the calculating a second-norm regular term of the target data according to the corrected square index value of each element, and generating a corresponding state indication signal and correcting the second-norm regular term when the second-norm regular term does not meet a second detection condition, includes:
calculating and accumulating two-norm regularized values corresponding to square index values of all elements through a shift circuit and an adder of hardware according to the optimized two-norm regularization function to obtain two-norm regularization terms of target data;
and judging whether the two-norm regular term is greater than or equal to the two-norm upper limit value, if so, generating an overflow signal corresponding to the two-norm regular term, and correcting the two-norm regular term into the two-norm upper limit value minus 1.
Optionally, in the process of calculating and accumulating the two-norm regularization values corresponding to the square index values of the elements to obtain the two-norm regularization term of the target data, the method further includes:
counting the accumulation times of the two-norm regularization values of each element through a hardware counter;
and, for a square index value that generated an underflow signal, the hardware counter does not count when the two-norm regularization value corresponding to that square index value is accumulated.
Optionally, the corrected two-norm regular term and each state indication signal are sent to a result analysis module in the chip for analysis, where the analysis includes:
and sending the two-norm regular term, the accumulation times of the two-norm regularization values, underflow signals corresponding to the square index values, overflow signals corresponding to the square index values and overflow signals corresponding to the two-norm regular term of the target data to a result analysis module in the chip for analysis.
Optionally, before detecting that the DMA module in the chip accesses the memory data, the method further includes:
responding to the parameter configuration operation of a register in the DMA module to the calculation module, and acquiring the configuration parameters of the calculation module;
the register performs the parameter configuration operation on the calculation module according to the memory access command packet received by the DMA module; the configuration parameters include: an enable for calculating the two-norm regular term, an enable for sending the two-norm regular term to the result analysis module, the data type of the data to be calculated, the sliding-window base value, and the overflow threshold.
Optionally, after the obtaining the configuration parameters of the computing module in response to the parameter configuration operation of the computing module by the register in the DMA module, the method further includes:
executing reset zero clearing operation on the related data of the two-norm regular term obtained by the last calculation of the calculation module;
the related data of the two-norm regular term includes: the two-norm regular term of the target data, the accumulation count of the two-norm regularization values, the underflow signals corresponding to the square index values, the overflow signals corresponding to the square index values, and the overflow signal corresponding to the two-norm regular term.
In a second aspect, an embodiment of the present invention further provides a chip for calculating a two-norm regular term, where the chip includes: a DMA module, a calculation module and a result analysis module;
the DMA module is used for configuring a register in the DMA module according to a received memory access command packet, instructing the register to perform parameter configuration on the calculation module, and accessing the specified memory data according to the memory access command packet;
the calculation module is used for performing the calculation method of the two-norm regular term of any one of claims 1 to 9;
and the result analysis module is used for receiving the two-norm regular term and each state indication signal sent by the calculation module and performing anomaly analysis on the two-norm regular term and each state indication signal.
The technical scheme of the embodiment of the invention is applied to a calculation module in a chip. When it is detected that the DMA module of the chip accesses memory data, the write data bus of the DMA module is read to obtain the target data to be calculated. The square index value of each element of the target data is calculated and checked against a first detection condition; if the condition is not met, a corresponding state indication signal is generated and the square index value is corrected. A two-norm regular term of the target data is calculated from the corrected square index values; a two-norm regular term that does not meet a second detection condition is corrected and a corresponding state indication signal is generated. The two-norm regular term and the state indication signals are sent to the result analysis module in the chip for analysis. This solves the prior-art problem that adding a computation node to the program to calculate the two-norm regular term degrades network performance: the two-norm regular term of the memory data is calculated automatically by the hardware resources of the chip while the DMA module accesses the memory data, which reduces the performance impact on the network model.
Drawings
FIG. 1a is a flowchart of a method for calculating a two-norm regularization term according to a first embodiment of the present invention;
FIG. 1b is a schematic diagram of the FP32 floating-point format under the IEEE 754 standard in the first embodiment of the present invention;
FIG. 2 is a flowchart of a method for calculating a two-norm regularization term according to a second embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a chip for calculating a two-norm regular term according to a third embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
Example one
Fig. 1a is a flowchart of a method for calculating a two-norm regularization term in an embodiment of the present invention, which is applicable to a case where hardware resources are used to automatically calculate a two-norm regularization term of memory data, and the method can be executed by a calculation module in a chip. As shown in fig. 1a, the method is applied to a computing module in a chip, and includes:
Step 110, when it is detected that the DMA module in the chip accesses memory data, reading the write data bus of the DMA module to obtain target data to be calculated.
The DMA module in the chip copies data from one address space to another and provides high-speed data transfer between a peripheral and memory or between memory and memory. In order to calculate the two-norm regular term of memory data with hardware resources while the DMA module of the chip is transferring that data, a calculation module is added to the chip. The calculation module calculates the two-norm regular term of the memory data without involving other modules of the chip, so data transfer between the DMA module and the memory is not affected and the performance of the chip is not affected.
In this embodiment, when the calculation module detects that the DMA module is transferring data between memories, for example when the DMA module writes the output data of a deep learning model into memory for storage, the calculation module may read the write data bus of the DMA module and, by downsampling, take one valid element of the write data bus as the input data of the calculation module, that is, the target data to be calculated.
The purpose of downsampling the write data bus is to reduce the amount of computation. For ease of hardware implementation, the least significant element of the write data bus can be taken as the input data of the calculation module.
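The downsampling step can be pictured with a minimal Python sketch. It assumes that each bus beat is modelled as one wide integer and that "least significant element" means the lowest-order element_bits bits of that beat; the function names and the bus modelling are hypothetical and are not taken from the patent.

```python
def least_significant_element(beat: int, element_bits: int) -> int:
    """Keep only the lowest-order element of one write-data-bus beat.

    Assumption: a beat is modelled as a non-negative integer whose
    low element_bits bits hold the least significant element.
    """
    return beat & ((1 << element_bits) - 1)


def downsample(beats, element_bits: int):
    """Return one sampled element per bus beat (the calculation module's input)."""
    return [least_significant_element(beat, element_bits) for beat in beats]
```

With 16-bit elements, a beat of 0x1111222233334444 would contribute only 0x4444, so the amount of data fed to the calculation grows with the number of beats rather than the number of elements.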
Optionally, before detecting that the DMA module in the chip accesses the memory data, the method may further include: optimizing the two-norm regularization function to
[formula image 001]
where exp is the exponent stored in the float representation of each element of the data to be calculated, base is the sliding-window base value,
[formula image 002]
is the square index calculation formula for each element, and L2 is the two-norm regular term of the data to be calculated.
In this embodiment, in order to reduce the hardware implementation overhead of calculating the two-norm regular term, the traditional two-norm regularization function
[formula image 003]
is analyzed and derived to obtain the optimized two-norm regularization function
[formula image 001]
The function optimization derivation process is as follows:
In the traditional two-norm regularization function
[formula image 003]
[formula image 004] is any vector element of the input vector V. Since 0.5 is a constant coefficient, one can let
[formula image 005]
Suppose that [formula image 004] is an FP32 floating-point number. Finding [formula image 006] then amounts to finding, for a given vector element, [formula image 007]. Taking the representation of an FP32 floating-point number under the IEEE 754 standard shown in FIG. 1b as an example, [formula image 004] can be expressed as
[formula image 008]
The exponent part e of [formula image 004] is 8 bits in total; when e is a positive integer, its numerical range is [formula image 009], and the numerical range of the exponent is [formula image 010]. When calculating the sum of squares of all elements, [formula image 006], only the exponent part of each element is squared and the mantissa part is discarded, so that different data types can be accommodated; and because of the squaring, the sign bit is always 0. Therefore,
[formula image 011]
where [formula image 012]. Therefore,
[formula image 013]
Let [formula image 014]; then L2_Norm is optimized to
[formula image 015]
where exp, the exponent stored in the float representation of each element of the data to be calculated, takes values in [0, 255]. Since a hardware register may provide 64 bits to store the value of the two-norm regular term L2, the representation range of L2 is [formula image 016]. To control the range of L2, a base parameter is added in practice, so that the range of values represented by L2 can be changed by changing the value of base. Thus, L2 is finally optimized to
[formula image 001]
Different data types may correspond to different base values.
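The formulas of this derivation survive only as image references in this text. One reading that is consistent with the surrounding definitions is sketched below; it is an assumption, not the patent's exact notation. The first line is the usual half-sum-of-squares form implied by the constant coefficient 0.5, the second is the standard IEEE 754 FP32 value of a normal number, and the last two lines assume that the square index value is 2(exp - base), which would match the example base values 95, 127 and 159 being 32 apart while the register is 64 bits wide.

```latex
\begin{align*}
L2\_Norm &= \tfrac{1}{2}\sum_i V_i^2 \\
V_i &= (-1)^{s}\cdot 2^{\,e-127}\cdot (1.m) \\
V_i^2 &\approx 2^{\,2(e-127)} \quad\text{(mantissa discarded)} \\
L2 &= \sum_i 2^{\,2(exp_i-\mathit{base})}
\end{align*}
```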
For example, assuming base = 127, the representation range of the two-norm regular term L2 is [formula image 016]; assuming base = 95, the representation range of L2 is [formula image 017]; assuming base = 159, the representation range of L2 is [formula image 018].
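How base slides the representable window can be illustrated with a short Python sketch. It rests on two assumptions that are not confirmed by the text, whose range formulas are only available as images: that the square index value is 2 * (exp - base), and that a 64-bit register holds per-element contributions from 2^0 to 2^63. The function name and constants are hypothetical.

```python
REGISTER_BITS = 64   # assumption: width of the L2 register mentioned in the text
FP32_BIAS = 127      # IEEE 754 single-precision exponent bias


def window_in_true_exponents(base: int):
    """Range of power-of-two exponents of V_i**2 that fall inside the window.

    Assumption: square index = 2 * (exp - base), valid from 0 to
    REGISTER_BITS - 1. The true exponent of V_i**2 is 2 * (exp - FP32_BIAS),
    so the window is shifted by 2 * (base - FP32_BIAS).
    """
    shift = 2 * (base - FP32_BIAS)
    return shift, shift + REGISTER_BITS - 1


for base in (95, 127, 159):
    low, high = window_in_true_exponents(base)
    print(f"base={base}: squared magnitudes from 2^{low} to 2^{high}")
```

Under these assumptions, lowering base by 32 moves the whole 64-bit window down to very small magnitudes, and raising it by 32 moves the window up to very large ones, which is one way to read the three example base values.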
Step 120, calculating a square index value of each element in the target data according to the square index calculation formula corresponding to the two-norm regularization function.
In this embodiment, after the target data to be calculated is obtained, the optimized two-norm regularization function shows that the index value corresponding to [formula image 007] of each element, namely [formula image 002], must be calculated first; the two-norm regularization value of each element can then be calculated.
Optionally, calculating a square index value of each element in the target data according to a square index calculation formula corresponding to the two-norm regularization function may include: calculating the index of each element in the target data through a shift circuit and/or a decoding circuit of hardware according to an index calculation rule matched with the data type of each element; and respectively calculating the square exponent value of each element through a shift circuit, an adder and a subtracter of hardware according to the square exponent calculation formula and the exponent of each element.
In this embodiment, exponent calculation rules corresponding to the various data types may be designed in advance. After the calculation module obtains the target data to be calculated, it calculates the exponent exp of each element through a hardware shift circuit and/or decoding circuit in the calculation module, according to the exponent calculation rule matching the data type of each element of the target data. The exponent exp of each element is then substituted into the formula [formula image 002], and the square index value of each element is calculated through a hardware shift circuit, adder and subtracter.
The exponent calculation rules corresponding to the different data types are as follows:
If the data type is FP32, the 8 bits immediately below the sign bit (the highest bit) are its exponent exp.
If the data type is BF16, the 8 bits immediately below the sign bit (the highest bit) are its exponent exp.
If the data type is FP16, the 5 bits immediately below the sign bit (the highest bit) are its exponent exp.
If the data type is UINT8/16/32, a priority decoding circuit selects the bit position of the most significant 1 in the data, which is its exponent exp. For example, for element 32'hf000_0000, the exponent is 31.
If the data type is INT8/16/32, the complement of the data is first calculated by a shift circuit, and a priority decoding circuit then selects the bit position of the most significant 1 in the data, excluding the sign bit in the highest position, which is its exponent exp. For example, for element 16'b11111100000000000000, the exponent is 14.
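The floating-point and unsigned-integer rules above can be sketched in Python as follows. This is an illustration under stated assumptions, not the hardware implementation: elements are passed as raw bit patterns, the function names are hypothetical, and the value returned for an all-zero unsigned element is a guess because the text does not specify it. Signed integer types are left out because the complement step admits more than one reading.

```python
import struct


def float_exponent(bits: int, dtype: str) -> int:
    """Extract the stored (biased) exponent field from a raw bit pattern."""
    if dtype == "FP32":   # 1 sign bit, 8 exponent bits, 23 mantissa bits
        return (bits >> 23) & 0xFF
    if dtype == "BF16":   # 1 sign bit, 8 exponent bits, 7 mantissa bits
        return (bits >> 7) & 0xFF
    if dtype == "FP16":   # 1 sign bit, 5 exponent bits, 10 mantissa bits
        return (bits >> 10) & 0x1F
    raise ValueError(f"unsupported float type: {dtype}")


def uint_exponent(value: int) -> int:
    """Bit position of the most significant 1, as a priority decoder reports it."""
    if value == 0:
        return 0  # assumption: the text does not say what a zero element yields
    return value.bit_length() - 1


def fp32_bits(x: float) -> int:
    """Raw IEEE 754 single-precision bit pattern of x."""
    return struct.unpack("<I", struct.pack("<f", x))[0]
```

For instance, `uint_exponent(0xF0000000)` gives 31, matching the 32'hf000_0000 example, and `float_exponent(fp32_bits(1.0), "FP32")` gives the FP32 bias, 127.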
Step 130, if a target square index value which does not meet the first detection condition exists in the square index values of the elements, generating a state indication signal corresponding to the target square index value, and correcting the target square index value.
In this embodiment, the square index value is calculated in order to obtain the two-norm regularization value of the element. The two-norm regularization value of an element has an effective value range, and whether it is valid is determined mainly by whether the square index value of the element is valid. Therefore, after the square index values of the elements are calculated, each square index value must be checked for validity. If a target square index value that does not meet the first detection condition exists among the square index values of the elements, the target square index value is corrected into the effective value range, so that out-of-range square index values are not stored at the cost of extra memory and the impact on hardware performance is reduced. At the same time, a state indication signal corresponding to the target square index value is generated; it indicates that the two-norm regularization value has been modified and records the abnormal condition of the two-norm regularization value before modification.
Step 140, calculating a two-norm regular term of the target data according to the corrected square index value of each element, and, when the two-norm regular term does not meet a second detection condition, generating a corresponding state indication signal and correcting the two-norm regular term.
In this embodiment, the square index value of each element may be substituted into the optimized two-norm regularization function [formula image 001]; that is, the cumulative sum of the two-norm regularization values of the elements is calculated as the two-norm regular term of the target data. Because the two-norm regular term has an effective value range, whether it lies within that range can be checked once it has been obtained. If the two-norm regular term exceeds the effective value range, it is corrected into the effective value range, so that an out-of-range two-norm regular term is not stored at the cost of extra memory and the impact on hardware performance is reduced. At the same time, a state indication signal corresponding to the two-norm regular term is generated; it indicates that the two-norm regular term has been modified and records the abnormal condition of the two-norm regular term before modification.
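The accumulation and the second detection condition can be sketched in Python as follows. The sketch combines details stated elsewhere in this document (correction to the upper limit minus 1 on overflow, and a counter that skips elements whose square index value underflowed) with two assumptions: that each element contributes 2 raised to its corrected square index value, and that the two-norm upper limit is 2^64. The names L2Result, accumulate and L2_UPPER are hypothetical.

```python
from dataclasses import dataclass

L2_UPPER = 1 << 64  # assumption: upper limit implied by a 64-bit register


@dataclass
class L2Result:
    l2: int = 0                # accumulated two-norm regular term
    count: int = 0             # number of counted accumulations
    l2_overflow: bool = False  # overflow signal for the two-norm regular term


def accumulate(square_indices, underflow_flags, upper=L2_UPPER) -> L2Result:
    """Sum 2**index over corrected square index values, saturating at upper - 1."""
    result = L2Result()
    for index, underflowed in zip(square_indices, underflow_flags):
        result.l2 += 1 << index       # shift circuit plus adder in hardware
        if not underflowed:
            result.count += 1         # underflowed elements are not counted
        if result.l2 >= upper:
            result.l2_overflow = True
            result.l2 = upper - 1     # corrected to the upper limit minus 1
    return result
```

Keeping the count separate from the sum lets later analysis tell how many elements actually carried information, as opposed to elements that were clamped to the lower limit.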
Step 150, sending the corrected two-norm regular term and each state indication signal to a result analysis module in the chip for analysis.
In this embodiment, the corrected two-norm regular term and each state indication signal generated during the calculation may be sent to the result analysis module in the chip, so that upper-layer software can analyze the calculation result, find abnormal conditions in the calculation process, and raise an alarm.
It should be noted that, because the DMA module may transfer only part of the data of a tensor at a time, after the result analysis module obtains the two-norm regular term of each piece of target data, the upper-layer software needs to combine these results into the two-norm regular term of each tensor, using the correspondence table between the pieces of target data and the tensors, and then perform anomaly analysis on the two-norm regular term of each tensor.
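A minimal Python sketch of this per-tensor summary is given below. It assumes that the summary is a plain sum of the per-transfer two-norm regular terms, which is consistent with the regular term being a sum over elements but is not stated explicitly in the text. The function name and the dictionary layout of the correspondence table are hypothetical.

```python
from collections import defaultdict


def summarize_per_tensor(transfer_results, transfer_to_tensor):
    """Add up per-transfer L2 terms for each tensor.

    transfer_results:   {transfer_id: l2_value} from the result analysis module
    transfer_to_tensor: {transfer_id: tensor_name}, the correspondence table
    """
    totals = defaultdict(int)
    for transfer_id, l2_value in transfer_results.items():
        totals[transfer_to_tensor[transfer_id]] += l2_value
    return dict(totals)
```

For example, two transfers that both map to the same weight tensor would have their regular terms added before any anomaly check is applied to that tensor.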
The technical scheme of the embodiment of the invention is applied to a calculation module in a chip. When it is detected that the DMA module of the chip accesses memory data, the write data bus of the DMA module is read to obtain the target data to be calculated. The square index value of each element of the target data is calculated and checked against a first detection condition; if the condition is not met, a corresponding state indication signal is generated and the square index value is corrected. A two-norm regular term of the target data is calculated from the corrected square index values; a two-norm regular term that does not meet a second detection condition is corrected and a corresponding state indication signal is generated. The two-norm regular term and the state indication signals are sent to the result analysis module in the chip for analysis. This solves the prior-art problem that adding a computation node to the program to calculate the two-norm regular term degrades network performance: the two-norm regular term of the memory data is calculated automatically by the hardware resources of the chip while the DMA module accesses the memory data, which reduces the performance impact on the network model.
Example two
Fig. 2 is a flowchart of a method for calculating a two-norm regularization term in the second embodiment of the present invention, which may be combined with various alternatives in the foregoing embodiments. Specifically, referring to fig. 2, the method may include the steps of:
Step 210, in response to a parameter configuration operation performed on the calculation module by a register in the DMA module, acquiring the configuration parameters of the calculation module.
The register performs the parameter configuration operation on the calculation module according to the memory access command packet received by the DMA module. The configuration parameters include: an enable for calculating the two-norm regular term, an enable for sending the two-norm regular term to the result analysis module, the data type of the data to be calculated, the sliding-window base value, and the overflow threshold.
In this embodiment, in order to configure the calculation parameters separately for each accessed data, before the DMA module carries data each time, whether the hardware calculation function of the calculation module is turned on may be configured by the register. After the DMA module receives the memory access command packet, the registers in the DMA module are configured according to the command packet, and then the relevant parameters of the calculation module are configured through the registers.
In this embodiment, each access of the DMA module to memory data is initiated by one command packet. The command packet contains the specific information of the memory data accessed this time, such as the operation to be performed on the data, the size of the accessed data, and the address of the accessed data. To make the hardware configuration more independent and flexible, the relevant parameters of the calculation module can be configured independently for the data access initiated by any command packet of the DMA module. For example, for the data access initiated by a given command packet, it can be configured independently whether to turn on the calculation of the two-norm regular term and whether to send the calculation result to the result analysis module. In addition, the data type of the data to be calculated, the base value base of the sliding window and the overflow threshold V_MAX can be flexibly configured.
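The per-packet configuration can be pictured as a small record that is rebuilt for every command packet. The Python sketch below is only an illustration: the five fields follow the configuration parameters listed in the text, while the class name, the dictionary keys of the packet and the default values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class L2Config:
    calc_enable: bool    # enable calculation of the two-norm regular term
    report_enable: bool  # enable sending the result to the result analysis module
    dtype: str           # data type of the data to be calculated, e.g. "FP32"
    base: int            # sliding-window base value
    v_max: int           # overflow threshold V_MAX


def config_from_packet(packet: dict) -> L2Config:
    """Build a per-transfer configuration from a (hypothetical) command packet."""
    return L2Config(
        calc_enable=bool(packet.get("l2_calc_en", False)),
        report_enable=bool(packet.get("l2_report_en", False)),
        dtype=packet.get("dtype", "FP32"),
        base=int(packet.get("base", 127)),
        v_max=int(packet.get("v_max", 0)),
    )
```

Making the record immutable mirrors the idea that one command packet fixes the settings for exactly one data access, and the next packet supplies a fresh set.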
Step 220, executing a reset-and-clear operation on the data related to the two-norm regular term obtained by the previous calculation of the calculation module.
The data related to the two-norm regular term include: the two-norm regular term of the target data, the accumulation count of the two-norm regularization values, the underflow signal corresponding to each square index value, the overflow signal corresponding to each square index value, and the overflow signal corresponding to the two-norm regular term.
In this embodiment, after the calculation function of the calculation module is turned on, and so that the previous result does not affect the correctness of the current calculation, the following must be reset and cleared before the current calculation starts: the two-norm regular term produced by the previous calculation, the effective accumulation count of the two-norm regularization values of the elements recorded while that term was calculated, the underflow or overflow signal corresponding to the square index value of each element, and the overflow signal corresponding to the calculated two-norm regular term.
Step 230, when it is detected that the DMA module in the chip accesses the memory data, reading the write data bus of the DMA module to obtain the target data to be calculated.
Step 240, calculating the index and the corresponding square index value of each element in the target data through a hardware circuit in the calculation module.
Step 250, if a target square index value that does not meet the first detection condition exists among the square index values of the elements, generating a state indication signal corresponding to the target square index value, and correcting the target square index value.
Optionally, this step may include: for the square index value of each element, if the square index value is greater than the overflow threshold and the overflow threshold is a positive number, generating a DMA error interrupt signal to instruct the DMA module to stop working; if the square index value is smaller than a preset lower limit of the valid range, generating an underflow signal corresponding to the square index value and correcting the square index value to the lower limit of the valid range; and if the square index value is greater than a preset upper limit of the valid range, generating an overflow signal corresponding to the square index value and correcting the square index value to the upper limit of the valid range.
In this embodiment, whether the square index value of each element is valid may be determined by a hardware comparison circuit. Because the two-norm regular term is stored in 64 bits in hardware, the square index value of each element must lie in the range [0, 63]. Therefore, the lower limit of the valid range may be set to 0 and the upper limit to 63. The overflow threshold is used to flexibly control the upper limit of the valid range and may therefore be set to any value less than 64.
In this embodiment, the square index value is compared with the overflow threshold. If the square index value is greater than the overflow threshold V_MAX and V_MAX is greater than 0, the square index value of a single element already overflows the storage, and a hardware DMA error interrupt signal may be generated so that the hardware DMA module stops working immediately and preserves its current state, making it easier for the user to inspect the data at that moment.
The square index value is compared with the lower limit 0 of the valid range. If the square index value is less than 0, the two-norm regularization value of the element is a fraction, so the accumulated two-norm regular term could also be a fraction; the 64-bit number stored in hardware, however, is integer data, so fractions are inconvenient for hardware storage. Therefore, when the square index value is less than 0, it is automatically set to 0, so that the fraction is treated as 1, and a corresponding underflow signal is generated.
The square index value is compared with the upper limit 63 of the valid range. If the square index value is greater than 63, the two-norm regularization value of this element alone is greater than 2^63, and the hardware storage already overflows without accumulating the values of any other elements. Therefore, when the square index value is greater than or equal to 64, an overflow signal for the square index value is generated and the square index value is automatically corrected to 63.
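The three comparisons of step 250 can be summarized in a short software model. This is a sketch rather than the hardware design: the function name `check_square_exponent` and the exception `DmaErrorInterrupt` are hypothetical stand-ins for the comparison circuit and the interrupt line, and the order of the checks follows the order in which the text lists them.

```python
LOWER_LIMIT = 0   # lower limit of the valid range
UPPER_LIMIT = 63  # upper limit of the valid range (64-bit storage)


class DmaErrorInterrupt(Exception):
    """Hypothetical stand-in for the hardware DMA error interrupt signal."""


def check_square_exponent(sq_exp: int, v_max: int):
    """Return (corrected_value, underflow_signal, overflow_signal)."""
    # The threshold check only applies when V_MAX is a positive number.
    if v_max > 0 and sq_exp > v_max:
        raise DmaErrorInterrupt(f"square index value {sq_exp} exceeds V_MAX {v_max}")
    if sq_exp < LOWER_LIMIT:
        return LOWER_LIMIT, True, False   # clamp to 0, raise underflow signal
    if sq_exp > UPPER_LIMIT:
        return UPPER_LIMIT, False, True   # clamp to 63, raise overflow signal
    return sq_exp, False, False
```

With `v_max` set to 0 the interrupt check is disabled and only the clamping to [0, 63] remains.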
Step 260, calculating the two-norm regular term of the target data according to the corrected square index value of each element, and, when the two-norm regular term does not meet a second detection condition, generating a corresponding state indication signal and correcting the two-norm regular term.
Optionally, this step may include: calculating and accumulating, according to the optimized two-norm regularization function, the two-norm regularization values corresponding to the square index values of the elements through a hardware shift circuit and adder to obtain the two-norm regular term of the target data; and judging whether the two-norm regular term is greater than or equal to the two-norm upper limit, and if so, generating an overflow signal corresponding to the two-norm regular term and correcting the two-norm regular term to the two-norm upper limit minus 1.
In this embodiment, according to the optimized two-norm regularization function (given as a formula image in the original publication and not reproduced in this text), the two-norm regularization value of each element can be calculated from its square index value through a hardware shift circuit, and the two-norm regularization values of all elements are then accumulated through a hardware adder to obtain the two-norm regular term of the target data.
In this embodiment, since the two-norm regular term is stored in 64 bits in hardware, its maximum value is 64 bits of all ones, that is, 2^64 - 1. Therefore, when the accumulated value is greater than or equal to the two-norm upper limit 2^64, an overflow signal is generated to indicate that the accumulated value is too large, and the accumulated value is automatically set to 2^64 - 1.
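The shift-and-add accumulation with saturation can be modelled as follows. This is a minimal sketch under two assumptions that the text does not state explicitly: each element contributes 2 raised to its corrected square index value (consistent with the remarks that a square index of 0 gives 1 and a square index above 63 gives more than 2^63), and the overflow comparison is applied after every addition. The function name `accumulate_l2` is hypothetical.

```python
L2_UPPER_LIMIT = 1 << 64  # two-norm upper limit, 2^64


def accumulate_l2(square_exponents):
    """Accumulate 2**e for corrected square index values e in [0, 63].

    Returns (two_norm_regular_term, overflow_signal).
    """
    total = 0
    overflow = False
    for e in square_exponents:
        total += 1 << e                    # shift circuit: 2**e, then adder
        if total >= L2_UPPER_LIMIT:        # assumed: checked after each addition
            overflow = True
            total = L2_UPPER_LIMIT - 1     # saturate at 2^64 - 1
    return total, overflow
```

For example, two elements with square index 63 sum to exactly 2^64, so the model reports an overflow and returns 2^64 - 1.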
Optionally, in the process of calculating and accumulating the two-norm regularization values corresponding to the square index values of the elements to obtain the two-norm regular term of the target data, the method may further include: counting the number of accumulations of the two-norm regularization values of the elements through a hardware counter; and, for a square index value that generated an underflow signal, not incrementing the hardware counter when the two-norm regularization value corresponding to that square index value is accumulated.
In this embodiment, if the two-norm regular term is prediction data output by a machine learning model during training, the upper-layer software may compare the two-norm regular term with a corresponding reference value when analyzing it. If the two differ greatly, the parameters of the machine learning model need to be adjusted to improve the calculation accuracy. Therefore, to roughly estimate the error between each predicted two-norm regularization value and the real two-norm regularization value, the number of accumulations of the two-norm regularization values of the elements may be counted by a hardware counter.
A larger accumulation count means that the two-norm regular term exceeded its valid range only after more accumulations, that is, the error between each predicted two-norm regularization value and the real two-norm regularization value is small, and the model parameters can be adjusted by a small amount. Conversely, a smaller count indicates that the error between each predicted two-norm regularization value and the real two-norm regularization value is large, and the model parameters need a larger adjustment.
It should be noted that, since a square index value that generated an underflow signal is corrected to 0, the corresponding two-norm regularization value becomes 2^0 = 1, whose contribution to the accumulated value is small; it therefore need not be counted.
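The counting rule can be sketched in the same style. The function name `accumulate_with_count` is hypothetical, and the sketch assumes, as the text suggests, that underflowed elements are still added to the sum (as 2^0 = 1) but are skipped by the counter; saturation is left out to keep the example focused on the counter.

```python
def accumulate_with_count(items):
    """items: iterable of (corrected_square_exponent, underflow_signal).

    Returns (accumulated_value, effective_accumulation_count).
    """
    total = 0
    count = 0
    for sq_exp, underflow in items:
        total += 1 << sq_exp   # underflowed elements contribute 2**0 = 1
        if not underflow:
            count += 1         # the counter skips elements that underflowed
    return total, count
```

With the inputs (3, no underflow), (0, underflow) and (0, no underflow), the sum is 8 + 1 + 1 = 10 while the effective count is 2.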
Step 270, sending the corrected two-norm regular term and each state indication signal to a result analysis module in the chip for analysis.
Optionally, this step may include: sending the two-norm regular term of the target data, the accumulation count of the two-norm regularization values, the underflow signal corresponding to each square index value, the overflow signal corresponding to each square index value, and the overflow signal corresponding to the two-norm regular term to the result analysis module in the chip for analysis.
In this embodiment, the two-norm regular term of the target data, the effective accumulation count of the two-norm regularization values of the elements recorded while that term was calculated, the underflow or overflow signal corresponding to the square index value of each element, and the overflow signal corresponding to the calculated two-norm regular term may be sent to the result analysis module in the chip, so that upper-layer software can analyze the calculation result, detect abnormal conditions in the calculation process, and raise an alarm.
It should be noted that the target data of each calculation may be only part of the data in one tensor. After the upper-layer software aggregates the two-norm regular term of each tensor according to the correspondence table between the target data and the tensors, there are two main ways to use it:
1. Debugging software checks the two-norm regular term of each tensor, finds the parts of each tensor that increase abnormally, and raises an alarm.
2. A correct model training run is found as a reference, and the two-norm regular terms of the corresponding tensors are stored. Then, with the data of each iteration fixed, the two-norm regular term of each new tensor calculated during training is compared with the reference value, and it must stay within a certain error of that value.
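The second usage can be illustrated from the upper-layer software side. The following is a sketch only: the function names, the dictionary-based correspondence table, the use of summation to aggregate per-transfer terms into a per-tensor term, and the relative tolerance are all assumptions, since the text does not specify how the aggregation or the error bound is defined.

```python
def summarize_by_tensor(chunk_results, chunk_to_tensor):
    """Aggregate per-transfer two-norm terms into per-tensor terms.

    chunk_results:   {chunk_id: two_norm_regular_term}
    chunk_to_tensor: {chunk_id: tensor_name}  (the correspondence table)
    Aggregation by summation is an assumption.
    """
    totals = {}
    for chunk_id, l2 in chunk_results.items():
        tensor = chunk_to_tensor[chunk_id]
        totals[tensor] = totals.get(tensor, 0) + l2
    return totals


def out_of_tolerance(totals, reference, rel_tol):
    """Return the tensors whose term deviates from the reference run."""
    return [
        name
        for name, value in totals.items()
        if abs(value - reference[name]) > rel_tol * reference[name]
    ]
```

A tensor returned by `out_of_tolerance` would be a candidate for the alarm mentioned in the first usage.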
The technical solution of this embodiment of the invention is applied to a calculation module in a chip. When it is detected that the DMA module of the chip accesses memory data, the write data bus of the DMA module is read to obtain the target data to be calculated; the square index value of each element in the target data is calculated and checked against a first detection condition, and if it does not meet that condition, a corresponding state indication signal is generated and the square index value is corrected; the two-norm regular term of the target data is calculated according to the corrected square index values of the elements, and a two-norm regular term that does not meet a second detection condition is corrected and a corresponding state indication signal is generated; and the two-norm regular term and the state indication signals are sent to the result analysis module in the chip for analysis. This solves the prior-art problem that adding a calculation node to the program to calculate the two-norm regular term degrades network performance: the two-norm regular term of the memory data is calculated automatically with the hardware resources of the chip while the DMA module accesses the memory data, which reduces the performance impact on the network model.
EXAMPLE III
Fig. 3 is a schematic structural diagram of a chip for calculating a two-norm regular term according to a third embodiment of the present invention. This embodiment is applicable to the case where hardware resources are used to automatically calculate the two-norm regular term of memory data. As shown in Fig. 3, the chip includes: a DMA module 310, a calculation module 320, and a result analysis module 330;
the DMA module 310 is configured to configure a register in the DMA module according to the received memory access command packet, to instruct the register to perform parameter configuration on the calculation module 320, and to access the specified memory data according to the memory access command packet;
the calculation module 320 is configured to execute the method for calculating a two-norm regular term provided in any embodiment of the present invention;
the result analysis module 330 is configured to receive the two-norm regular term and the state indication signals sent by the calculation module 320, and to perform anomaly analysis on the two-norm regular term and the state indication signals.
Optionally, the chip further includes:
a function optimization module, configured to optimize the two-norm regularization function, before it is detected that the DMA module in the chip accesses the memory data, to
[formula image 001: optimized two-norm regularization function]
where exp is the exponent stored in the float type of each element in the data to be calculated, base is the sliding window base value, [formula image 002] is the square index calculation formula of each element, and L2 is the two-norm regular term of the data to be calculated.
Optionally, the calculation module 320 includes:
a square index calculation unit, configured to calculate the index of each element in the target data through a hardware shift circuit and/or decoding circuit according to an index calculation rule matched with the data type of each element;
and to calculate the square index value of each element through a hardware shift circuit, adder and subtracter according to the square index calculation formula and the index of each element.
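As an illustration of an index calculation rule matched to the data type, the stored exponent field of an IEEE 754 value can be isolated with one shift and one mask. The sketch below covers only that extraction for 32-bit and 16-bit floats; the function names are hypothetical, the choice of these two data types is an assumption, and the square index calculation formula itself is not modelled because it appears only as a formula image in the publication.

```python
import struct


def fp32_exponent_field(x: float) -> int:
    """Stored (biased) 8-bit exponent of x encoded as IEEE 754 binary32."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return (bits >> 23) & 0xFF   # drop 23 fraction bits, mask off the sign bit


def fp16_exponent_field(x: float) -> int:
    """Stored (biased) 5-bit exponent of x encoded as IEEE 754 binary16."""
    bits = struct.unpack("<H", struct.pack("<e", x))[0]
    return (bits >> 10) & 0x1F   # drop 10 fraction bits, mask off the sign bit
```

For 1.0 the stored exponent equals the bias of the format, 127 for binary32 and 15 for binary16.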
Optionally, the calculation module 320 includes:
a square index detection unit, configured to, for the square index value of each element, generate a DMA error interrupt signal to instruct the DMA module to stop working if the square index value is greater than the overflow threshold and the overflow threshold is a positive number;
if the square index value is smaller than the preset lower limit of the valid range, to generate an underflow signal corresponding to the square index value and correct the square index value to the lower limit of the valid range;
and if the square index value is greater than the preset upper limit of the valid range, to generate an overflow signal corresponding to the square index value and correct the square index value to the upper limit of the valid range.
Optionally, the calculation module 320 includes:
a two-norm regular term calculation unit, configured to calculate and accumulate, according to the optimized two-norm regularization function, the two-norm regularization values corresponding to the square index values of the elements through a hardware shift circuit and adder to obtain the two-norm regular term of the target data;
and to judge whether the two-norm regular term is greater than or equal to the two-norm upper limit, and if so, to generate an overflow signal corresponding to the two-norm regular term and correct the two-norm regular term to the two-norm upper limit minus 1.
Optionally, the chip further includes: a counting module, configured to count the number of accumulations of the two-norm regularization values of the elements through a hardware counter in the process of calculating and accumulating the two-norm regularization values corresponding to the square index values of the elements to obtain the two-norm regular term of the target data;
for a square index value that generated an underflow signal, the hardware counter is not incremented when the two-norm regularization value corresponding to that square index value is accumulated.
Optionally, the calculation module 320 includes:
a sending unit, configured to send the two-norm regular term of the target data, the accumulation count of the two-norm regularization values, the underflow signal corresponding to each square index value, the overflow signal corresponding to each square index value, and the overflow signal corresponding to the two-norm regular term to the result analysis module in the chip for analysis.
Optionally, the chip further includes:
a parameter configuration module, configured to acquire the configuration parameters of the calculation module, before it is detected that the DMA module in the chip accesses the memory data, in response to a parameter configuration operation performed on the calculation module by a register in the DMA module;
the register performs the parameter configuration operation on the calculation module according to the memory access command packet received by the DMA module; the configuration parameters include: an enable for calculating the two-norm regular term, an enable for sending the two-norm regular term to the result analysis module, the data type of the data to be calculated, the sliding window base value, and the overflow threshold.
Optionally, the chip further includes:
a reset module, configured to execute a reset-and-clear operation on the data related to the two-norm regular term obtained by the previous calculation of the calculation module, after the configuration parameters of the calculation module are acquired in response to the parameter configuration operation performed on the calculation module by the register in the DMA module;
the data related to the two-norm regular term include: the two-norm regular term of the target data, the accumulation count of the two-norm regularization values, the underflow signal corresponding to each square index value, the overflow signal corresponding to each square index value, and the overflow signal corresponding to the two-norm regular term.
The technical solution of this embodiment of the invention is applied to a calculation module in a chip. When it is detected that the DMA module of the chip accesses memory data, the write data bus of the DMA module is read to obtain the target data to be calculated; the square index value of each element in the target data is calculated and checked against a first detection condition, and if it does not meet that condition, a corresponding state indication signal is generated and the square index value is corrected; the two-norm regular term of the target data is calculated according to the corrected square index values of the elements, and a two-norm regular term that does not meet a second detection condition is corrected and a corresponding state indication signal is generated; and the two-norm regular term and the state indication signals are sent to the result analysis module in the chip for analysis. This solves the prior-art problem that adding a calculation node to the program to calculate the two-norm regular term degrades network performance: the two-norm regular term of the memory data is calculated automatically with the hardware resources of the chip while the DMA module accesses the memory data, which reduces the performance impact on the network model.
It should be noted that the foregoing describes only the preferred embodiments of the present invention and the technical principles employed. Those skilled in the art will understand that the present invention is not limited to the particular embodiments described herein, and that various obvious changes, rearrangements and substitutions can be made without departing from the scope of the invention. Therefore, although the present invention has been described in some detail through the above embodiments, it is not limited to them and may include other equivalent embodiments without departing from its spirit; the scope of the present invention is determined by the appended claims.

Claims (10)

1. A method for calculating a two-norm regular term, applied to a calculation module in a chip, characterized by comprising:
when detecting that a Direct Memory Access (DMA) module in the chip accesses memory data, reading a write data bus of the DMA module to obtain target data to be calculated;
calculating a square index value of each element in the target data according to a square index calculation formula corresponding to the two-norm regularization function;
if a target square index value that does not meet a first detection condition exists among the square index values of the elements, generating a state indication signal corresponding to the target square index value, and correcting the target square index value;
calculating a two-norm regular term of the target data according to the corrected square index value of each element, and, when the two-norm regular term does not meet a second detection condition, generating a corresponding state indication signal and correcting the two-norm regular term;
and sending the corrected two-norm regular term and each state indication signal to a result analysis module in the chip for analysis.
2. The method of claim 1, prior to detecting that a Direct Memory Access (DMA) module in the chip accesses memory data, further comprising:
optimizing the two-norm regularization function to
[formula image 001: optimized two-norm regularization function]
where exp is the exponent stored in the float type of each element in the data to be calculated, base is the sliding window base value, [formula image 002] is the square index calculation formula of each element, and L2 is the two-norm regular term of the data to be calculated.
3. The method of claim 2, wherein calculating a square exponent value of each element in the target data according to a square exponent calculation formula corresponding to a two-norm regularization function comprises:
calculating the index of each element in the target data through a shift circuit and/or a decoding circuit of hardware according to an index calculation rule matched with the data type of each element;
and respectively calculating the square index value of each element through a shift circuit, an adder and a subtracter of hardware according to the square index calculation formula and the index of each element.
4. The method of claim 1, wherein if a target square index value that does not meet a first detection condition exists in the square index values of the elements, generating a status indication signal corresponding to the target square index value and correcting the target square index value comprises:
for the square index value of each element, if the square index value is greater than an overflow threshold value and the overflow threshold value is a positive number, generating a DMA error interrupt signal to indicate a DMA module to stop working;
if the square index value is smaller than a preset effective range lower limit value, generating an underflow signal corresponding to the square index value, and correcting the square index value into the effective range lower limit value;
and if the square index value is larger than a preset effective range upper limit value, generating an overflow signal corresponding to the square index value, and correcting the square index value into the effective range upper limit value.
5. The method according to claim 2, wherein calculating a two-norm regularization term of the target data according to the modified square index values of the elements, and generating a corresponding state indication signal and modifying the two-norm regularization term when the two-norm regularization term does not meet a second detection condition includes:
calculating and accumulating two-norm regularized values corresponding to square index values of all elements through a shift circuit and an adder of hardware according to the optimized two-norm regularization function to obtain two-norm regularization terms of the target data;
and judging whether the two-norm regular term is greater than or equal to the two-norm upper limit value, if so, generating an overflow signal corresponding to the two-norm regular term, and correcting the two-norm regular term to the two-norm upper limit value minus 1.
6. The method of claim 5, wherein in calculating and accumulating the two-norm regularization values corresponding to the square index values of the respective elements to obtain the two-norm regularization term of the target data, further comprising:
counting the accumulation times of the two-norm regularization values of each element through a hardware counter;
and, for a square index value that generated an underflow signal, not incrementing the hardware counter when the two-norm regularization value corresponding to that square index value is accumulated.
7. The method of claim 1, wherein sending the modified two-norm regularization term and the status indication signals to a result analysis module in the chip for analysis comprises:
sending the two-norm regular term of the target data, the accumulation count of the two-norm regularization values, the underflow signal corresponding to each square index value, the overflow signal corresponding to each square index value, and the overflow signal corresponding to the two-norm regular term to the result analysis module in the chip for analysis.
8. The method of claim 1, prior to detecting that a DMA module in the chip accesses memory data, further comprising:
in response to a parameter configuration operation performed on the calculation module by a register in the DMA module, acquiring the configuration parameters of the calculation module;
wherein the register performs the parameter configuration operation on the calculation module according to the memory access command packet received by the DMA module; the configuration parameters include: an enable for calculating the two-norm regular term, an enable for sending the two-norm regular term to the result analysis module, the data type of the data to be calculated, the sliding window base value, and the overflow threshold.
9. The method of claim 8, further comprising, after acquiring the configuration parameters of the calculation module in response to the parameter configuration operation performed on the calculation module by the register in the DMA module:
executing a reset-and-clear operation on the data related to the two-norm regular term obtained by the previous calculation of the calculation module;
the data related to the two-norm regular term comprise: the two-norm regular term of the target data, the accumulation count of the two-norm regularization values, the underflow signal corresponding to each square index value, the overflow signal corresponding to each square index value, and the overflow signal corresponding to the two-norm regular term.
10. A chip for calculating a two-norm regular term, the chip comprising: a DMA module, a calculation module and a result analysis module;
the DMA module is configured to configure a register in the DMA module according to the received memory access command packet, to instruct the register to perform parameter configuration on the calculation module, and to access the specified memory data according to the memory access command packet;
the calculation module is configured to execute the method for calculating a two-norm regular term according to any one of claims 1-9;
and the result analysis module is configured to receive the two-norm regular term and each state indication signal sent by the calculation module, and to perform anomaly analysis on the two-norm regular term and each state indication signal.
CN202110519691.6A 2021-05-13 2021-05-13 Calculation method and chip of two-norm regular term Active CN112947892B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110519691.6A CN112947892B (en) 2021-05-13 2021-05-13 Calculation method and chip of two-norm regular term

Publications (2)

Publication Number Publication Date
CN112947892A true CN112947892A (en) 2021-06-11
CN112947892B CN112947892B (en) 2021-08-27

Family

ID=76233770

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110519691.6A Active CN112947892B (en) 2021-05-13 2021-05-13 Calculation method and chip of two-norm regular term

Country Status (1)

Country Link
CN (1) CN112947892B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101676864A (en) * 2008-09-16 2010-03-24 国际商业机器公司 Method and device for acquiring Euclidean norm of vector in processing system
CN105659905B (en) * 2011-12-08 2014-08-13 北京空间飞行器总体设计部 A kind of adaptive approach of Modifying model ill-posed problem regularization parameter
US20200089535A1 (en) * 2018-05-16 2020-03-19 Shanghai Cambricon Information Technology Co., Ltd Data sharing system and data sharing method therefor

Also Published As

Publication number Publication date
CN112947892B (en) 2021-08-27

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant