CN112947892A - Calculation method and chip of two-norm regular term

Info

Publication number: CN112947892A
Application number: CN202110519691.6A
Authority: CN
Inventors: 侯东伯; 朱剑丘; 沈大框
Original assignee: Beijing Suiyuan Intelligent Technology Co ltd
Current assignee: Beijing Suiyuan Intelligent Technology Co ltd
Priority date: 2021-05-13
Filing date: 2021-05-13
Publication date: 2021-06-11
Anticipated expiration: 2041-05-13
Also published as: CN112947892B

Abstract

The embodiment of the invention discloses a method and a chip for calculating a two-norm regular term. The method is applied to a computing module in a chip and comprises the following steps: when detecting that a DMA module of the chip accesses memory data, reading a write data bus of the DMA module to obtain target data; calculating a square index value of each element in the target data and detecting whether the square index value meets a first detection condition, and if not, generating a state indication signal and correcting the square index value; calculating a two-norm regular term of the target data according to the square index values, and, if the two-norm regular term does not meet a second detection condition, correcting the two-norm regular term and generating a state indication signal; and sending the two-norm regular term and each state indication signal to a result analysis module in the chip for analysis. According to the technical scheme of the embodiment of the invention, the two-norm regular term of the memory data is calculated automatically by hardware resources of the chip while the DMA module of the chip accesses the memory data, which reduces the impact on the performance of a network model.

Description

Calculation method and chip of two-norm regular term

Technical Field

The embodiment of the invention relates to the technical field of computer chips, in particular to a method and a chip for calculating a two-norm regular term.

Background

At present, a deep learning network model often needs to use a two-norm regularization function to reduce the tensors of computing nodes, weights and gradients in the model to constant values, in order to judge the correctness of calculation results. In the prior art, a new computing node is generally added to the script that builds the network in order to compute the two-norm regular term of the data, but this method has the following defects: (1) because a large number of additional computing operations are added, the performance of the whole computing network is seriously reduced; (2) the topology of the native model computation graph is changed, which may make the problem under debugging impossible to reproduce; (3) additional storage space is required, so some very large models that are already at the storage limit cannot be debugged.

Disclosure of Invention

The embodiment of the invention provides a method and a chip for calculating a two-norm regular term, which automatically calculate the two-norm regular term of memory data by using hardware resources of the chip while a Direct Memory Access (DMA) module of the chip accesses the memory data, thereby reducing the performance impact on a network model.

In a first aspect, an embodiment of the present invention provides a method for calculating a two-norm regularization term, which is applied to a calculation module in a chip, and includes:

when detecting that a Direct Memory Access (DMA) module in a chip accesses memory data, reading a write data bus of the DMA module to obtain target data to be calculated;

calculating a square index value of each element in the target data according to a square index calculation formula corresponding to the two-norm regularization function;

if a target square index value which does not meet the first detection condition exists in the square index values of the elements, generating a state indication signal corresponding to the target square index value, and correcting the target square index value;

calculating a two-norm regular term of the target data according to the corrected square index value of each element, and, when the two-norm regular term does not meet a second detection condition, generating a corresponding state indication signal and correcting the two-norm regular term;

and sending the corrected two-norm regular term and each state indication signal to a result analysis module in the chip for analysis.

Optionally, before detecting that the DMA module in the chip accesses the memory data, the method further includes:

optimizing the two-norm regularization function to

$$L2 = \sum 2^{2 \times (exp - base)};$$

where exp is the exponent field stored in the float type of each element in the data to be calculated, base is the sliding window base value, $2 \times (exp - base)$ is the square index calculation formula of each element, and L2 is the two-norm regular term of the data to be calculated.

Optionally, calculating a square index value of each element in the target data according to a square index calculation formula corresponding to the two-norm regularization function, including:

calculating the exponent of each element in the target data through a shift circuit and/or a decoding circuit of hardware according to an exponent calculation rule matched with the data type of each element;

and respectively calculating the square index value of each element through a shift circuit, an adder and a subtracter of hardware according to the square index calculation formula and the exponent of each element.

Optionally, if a target square index value that does not meet the first detection condition exists in the square index values of the elements, generating a state indication signal corresponding to the target square index value, and correcting the target square index value, including:

aiming at the square index value of each element, if the square index value is greater than an overflow threshold value and the overflow threshold value is a positive number, generating a DMA error interrupt signal to indicate a DMA module to stop working;

if the square index value is smaller than the preset lower limit value of the effective range, generating an underflow signal corresponding to the square index value, and correcting the square index value into the lower limit value of the effective range;

and if the square index value is larger than the preset upper limit value of the effective range, generating an overflow signal corresponding to the square index value, and correcting the square index value into the upper limit value of the effective range.

Optionally, calculating a two-norm regular term of the target data according to the corrected square index value of each element, and generating a corresponding state indication signal and correcting the two-norm regular term when the two-norm regular term does not meet a second detection condition, includes:

calculating and accumulating two-norm regularized values corresponding to square index values of all elements through a shift circuit and an adder of hardware according to the optimized two-norm regularization function to obtain two-norm regularization terms of target data;

and judging whether the two-norm regular term is greater than or equal to the two-norm upper limit value, if so, generating an overflow signal corresponding to the two-norm regular term, and correcting the two-norm regular term into the two-norm upper limit value minus 1.

Optionally, in the process of calculating and accumulating the two-norm regularization values corresponding to the square index values of the elements to obtain the two-norm regularization term of the target data, the method further includes:

counting the accumulation times of the two-norm regularization values of each element through a hardware counter;

and aiming at the square index value of the generated underflow signal, when the two-norm regularization numerical value corresponding to the square index value is accumulated, the hardware counter does not count.

Optionally, the corrected two-norm regular term and each state indication signal are sent to a result analysis module in the chip for analysis, where the analysis includes:

and sending the two-norm regular term, the accumulation times of the two-norm regularization values, underflow signals corresponding to the square index values, overflow signals corresponding to the square index values and overflow signals corresponding to the two-norm regular term of the target data to a result analysis module in the chip for analysis.

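Optionally, before detecting that the DMA module in the chip accesses the memory data, the method further includes:

responding to the parameter configuration operation of a register in the DMA module to the calculation module, and acquiring the configuration parameters of the calculation module;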

the register performs parameter configuration operation on the calculation module according to the memory access command packet received by the DMA module; the configuration parameters include: enabling the calculation of the two-norm regular term, sending the two-norm regular term to a result analysis module to enable, the data type of the data to be calculated, the basic value of the sliding window and the overflow threshold.

Optionally, after the obtaining the configuration parameters of the computing module in response to the parameter configuration operation of the computing module by the register in the DMA module, the method further includes:

executing reset zero clearing operation on the related data of the two-norm regular term obtained by the last calculation of the calculation module;

the related data of the two-norm regular term includes: the two-norm regular term of the target data, the accumulation times of the two-norm regularization values, the underflow signals corresponding to the square index values, the overflow signals corresponding to the square index values, and the overflow signal corresponding to the two-norm regular term.

In a second aspect, an embodiment of the present invention further provides a chip for calculating a two-norm regular term, where the chip includes: the DMA module, the calculation module and the result analysis module;

the DMA module is used for configuring and indicating a register in the DMA module to perform parameter configuration on the calculation module according to the received memory access command packet and accessing specified memory data according to the memory access command packet;

a calculation module for performing a calculation method of the two-norm regularization term of any one of claims 1 to 9;

and the result analysis module is used for receiving the two-norm regular term and each state indication signal sent by the calculation module and carrying out abnormal analysis on the two-norm regular term and each state indication signal.

The technical scheme of the embodiment of the invention is applied to a computing module in a chip. When it is detected that the DMA module of the chip accesses memory data, the write data bus of the DMA module is read to obtain target data to be calculated; the square index value of each element in the target data is calculated and checked against a first detection condition, and if the condition is not met, a corresponding state indication signal is generated and the square index value is corrected; a two-norm regular term of the target data is calculated according to the corrected square index value of each element, and if it does not meet a second detection condition, the two-norm regular term is corrected and a corresponding state indication signal is generated; the two-norm regular term and the state indication signals are sent to the result analysis module in the chip for analysis. This solves the prior-art problem that network performance is reduced because a computing node is added to the program to calculate the two-norm regular term: the two-norm regular term of the memory data is calculated automatically by hardware resources of the chip while the DMA module of the chip accesses the memory data, and the performance impact on the network model is reduced.

Drawings

FIG. 1a is a flowchart of a method for calculating a two-norm regularization term according to a first embodiment of the present invention;

FIG. 1b is a diagram illustrating the structure of a floating-point FP32 according to the IEEE 754 standard in the first embodiment of the present invention;

FIG. 2 is a flowchart of a method for calculating a two-norm regularization term according to a second embodiment of the present invention;

FIG. 3 is a schematic structural diagram of a chip for calculating a two-norm regularization term according to a third embodiment of the present invention.

Detailed Description

The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.

Example one

Fig. 1a is a flowchart of a method for calculating a two-norm regularization term in an embodiment of the present invention, which is applicable to a case where hardware resources are used to automatically calculate a two-norm regularization term of memory data, and the method can be executed by a calculation module in a chip. As shown in fig. 1a, the method is applied to a computing module in a chip, and includes:

Step 110, when detecting that the DMA module in the chip accesses the memory data, reading a write data bus of the DMA module to obtain target data to be calculated.

The DMA module in the chip is used for copying data from one address space to another and provides high-speed data transmission between a peripheral and the memory or between memories. In order to calculate the two-norm regular term of memory data with hardware resources while the DMA module of the chip is transferring the memory data, a calculation module is added to the chip. The two-norm regular term of the memory data is calculated by this calculation module without the participation of other modules in the chip, so data transmission between the DMA module and the memory is not affected and the performance of the chip is not affected.

In this embodiment, when the computing module detects that the DMA module is transferring data between memories, for example, when the DMA module writes the output data of the deep learning model into the memory for storage, the computing module may read the write data bus of the DMA module and, by downsampling, take one valid element on the write data bus as the input data of the computing module, that is, the target data to be calculated.

The purpose of downsampling the write data bus is to reduce the amount of computation. For convenience of hardware operation, the lowest-order valid element on the write data bus can be taken as the input data of the computing module.
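The downsampling step can be illustrated with a minimal software sketch. It assumes that one beat of the write data bus is modelled as an unsigned integer and that the element of interest occupies its lowest bits; the bus width, the function name `sample_lowest_element` and the example values are illustrative assumptions and are not taken from this embodiment.

```python
def sample_lowest_element(bus_word: int, elem_bits: int) -> int:
    """Return the lowest-order element of one write-data-bus beat.

    bus_word  -- the whole bus beat as an unsigned integer (assumed model)
    elem_bits -- width of one element in bits, e.g. 32 for FP32
    """
    if elem_bits <= 0:
        raise ValueError("elem_bits must be positive")
    mask = (1 << elem_bits) - 1
    return bus_word & mask


# A hypothetical 128-bit beat carrying four 32-bit elements;
# only the lowest one is forwarded to the calculation.
beat = (0x40400000 << 96) | (0x40000000 << 64) | (0x3F800000 << 32) | 0x3F000000
lowest = sample_lowest_element(beat, 32)
```

Taking a single element per beat keeps the added logic small, at the cost of the result covering only a sample of the transferred data.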

Optionally, before detecting that the DMA module in the chip accesses the memory data, the method may further include: optimizing the two-norm regularization function to

$$L2 = \sum 2^{2 \times (exp - base)}$$

where exp is the exponent field stored in the float type of each element in the data to be calculated, base is the sliding window base value, $2 \times (exp - base)$ is the square index calculation formula of each element, and L2 is the two-norm regular term of the data to be calculated.

In this embodiment, in order to reduce the hardware implementation overhead of calculating the two-norm regular term, the traditional two-norm regularization function

$$L2\_Norm = 0.5 \times \sum_{i} v_i^2$$

is analysed to derive the optimized two-norm regularization function

$$L2 = \sum 2^{2 \times (exp - base)}.$$

The function optimization derivation process is as follows:

For the conventional two-norm regularization function

$$L2\_Norm = 0.5 \times \sum_{i} v_i^2,$$

for any vector element $v_i$ in the input vector V, since 0.5 is a constant coefficient, it can be dropped, giving $L2\_Norm = \sum_{i} v_i^2$. Suppose $v_i$ is an FP32 floating-point number; computing $L2\_Norm$ then comes down to computing $v_i^2$ for each vector element. Taking the representation of an FP32 floating-point number under the IEEE 754 standard shown in FIG. 1b as an example, $v_i$ can be expressed as

$$v_i = (-1)^{s} \times 2^{e - 127} \times 1.m,$$

where the exponent part e is 8 bits in total; as an unsigned integer its numerical range is $[0, 255]$, and the numerical range of the exponent $e - 127$ is $[-127, 128]$. When calculating the sum of squares of all elements $\sum_{i} v_i^2$, in order to adapt to different data types, only the exponent part of an element is squared and the mantissa part is discarded; because of the squaring, the sign bit of the result is always 0. Therefore,

$$v_i^2 \approx 2^{2 \times (e - 127)}, \qquad L2\_Norm \approx \sum 2^{2 \times (e - 127)}.$$

Writing exp for the stored exponent field e, L2_Norm is optimized to

$$L2 = \sum 2^{2 \times (exp - 127)},$$

where exp, the exponent field stored in float type of each element in the data to be calculated, has the value range $[0, 255]$. Since a hardware register may provide 64 bits to store the value of the two-norm regular term L2, the representation range of L2 is $[0, 2^{64} - 1]$. To control the range of L2, a base parameter is introduced in the actual process, so that the range of values represented by L2 can be changed by changing the value of base. Thus, L2 is finally optimized to

$$L2 = \sum 2^{2 \times (exp - base)},$$

where different data types may correspond to different base values.
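The effect of discarding the mantissa can be seen in a short software sketch. It assumes that the optimized function has the form L2 = Σ 2^(2 × (exp − base)) and that the input is FP32; the function names `fp32_exponent_field` and `approx_l2` are hypothetical, and the range correction described in later steps is deliberately left out.

```python
import struct


def fp32_exponent_field(x: float) -> int:
    """Return the 8-bit stored exponent field of x encoded as IEEE 754 FP32."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return (bits >> 23) & 0xFF


def approx_l2(values, base=127):
    """Sum 2^(2*(exp - base)) over all values (assumed form of the formula).

    No range correction is applied here, so a negative square index
    simply contributes a fractional term.
    """
    total = 0
    for v in values:
        square_index = 2 * (fp32_exponent_field(v) - base)
        total += 2 ** square_index
    return total


def exact_sum_of_squares(values):
    """Reference value: the plain sum of squares, without the 0.5 factor."""
    return sum(v * v for v in values)
```

For powers of two the two functions agree, for example both give 21 for `[1.0, 2.0, 4.0]`. For `[1.5, 3.0]` the sketch gives 5 while the exact sum of squares is 11.25. Because the mantissa of a normal number lies in [1, 2), the exact value stays below four times the approximation, which is usually sufficient for spotting abnormal magnitudes.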

Exemplarily, the representation range of the two-norm regular term L2 differs for base = 127, base = 95 and base = 159; that is, changing base moves the range of values that L2 can represent.

Step 120, calculating a square index value of each element in the target data according to a square index calculation formula corresponding to the two-norm regularization function.

In this embodiment, after the target data to be calculated is obtained, it can be seen from the optimized two-norm regularization function that the square index value of each element, that is, $2 \times (exp - base)$, must be calculated first; the two-norm regularization value of each element can then be calculated from it.

Optionally, calculating a square index value of each element in the target data according to a square index calculation formula corresponding to the two-norm regularization function may include: calculating the exponent of each element in the target data through a shift circuit and/or a decoding circuit of hardware according to an exponent calculation rule matched with the data type of each element; and respectively calculating the square index value of each element through a shift circuit, an adder and a subtracter of hardware according to the square index calculation formula and the exponent of each element.

In this embodiment, the exponent calculation rules corresponding to the various data types may be designed in advance. After the calculation module obtains the target data to be calculated, it calculates the exponent exp of each element through the hardware shift circuit and/or decoding circuit in the calculation module, according to the exponent calculation rule matched with the data type of the elements of the target data. The exponent exp of the element is then substituted into the square index calculation formula $2 \times (exp - base)$, and the square index value of each element is calculated through a shift circuit, an adder and a subtracter of hardware.

The exponent calculation rules corresponding to different data types are as follows:

If the data type is FP32, the upper 8 bits following the sign bit (the highest bit) are its exponent exp.

If the data type is BF16, the upper 8 bits following the sign bit (the highest bit) are its exponent exp.

If the data type is FP16, the upper 5 bits following the sign bit (the highest bit) are its exponent exp.

If the data type is UINT8/16/32, a priority decoding circuit selects the bit position of the most significant 1 in the data, and that position is its exponent exp. For example, for the element 32'hf000_0000, the exponent is 31.

If the data type is INT8/16/32, the complement of the data is first calculated by a shift circuit, and a priority decoding circuit then selects the bit position of the most significant 1 in the data, excluding the sign bit in the highest position; that position is its exponent exp. For example, for element 16'b11111100000000000000, the exponent is 14.
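The rules for the floating-point and unsigned integer types can be sketched in software as follows. The function name `exponent_of`, the string type tags and the treatment of a zero-valued unsigned element are assumptions made for illustration; the signed integer rule is left out because its complement step is not specified in enough detail here to model it faithfully.

```python
def exponent_of(raw: int, dtype: str) -> int:
    """Return the exponent exp of one element given its raw bit pattern.

    raw   -- the element's bits as an unsigned integer
    dtype -- "FP32", "BF16", "FP16", "UINT8", "UINT16" or "UINT32"
    """
    if dtype == "FP32":
        # 1 sign bit, 8 exponent bits, 23 mantissa bits
        return (raw >> 23) & 0xFF
    if dtype == "BF16":
        # 1 sign bit, 8 exponent bits, 7 mantissa bits
        return (raw >> 7) & 0xFF
    if dtype == "FP16":
        # 1 sign bit, 5 exponent bits, 10 mantissa bits
        return (raw >> 10) & 0x1F
    if dtype in ("UINT8", "UINT16", "UINT32"):
        if raw == 0:
            # Assumption: no set bit to decode, so the sketch refuses
            # rather than inventing a hardware behaviour.
            raise ValueError("no set bit in a zero element")
        # Priority decode: position of the most significant 1
        return raw.bit_length() - 1
    raise ValueError(f"unsupported data type: {dtype}")
```

For instance, `exponent_of(0xF0000000, "UINT32")` returns 31, matching the unsigned example above, and the FP32 pattern `0x3F800000` (the value 1.0) yields 127.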

Step 130, if a target square index value which does not meet the first detection condition exists in the square index values of the elements, generating a state indication signal corresponding to the target square index value, and correcting the target square index value.

In this embodiment, the square index value is calculated in order to obtain the two-norm regularization value of the element. The two-norm regularization value of an element has an effective value range, and whether it is effective is mainly determined by whether the square index value of the element is effective; therefore, after the square index values are calculated, validity detection needs to be performed on each of them. If a target square index value that does not meet the first detection condition exists among the square index values of the elements, the target square index value is corrected into the effective value range, so that out-of-range square index values do not occupy additional storage and the impact on hardware performance is reduced. At the same time, a state indication signal corresponding to the target square index value is generated; it indicates that the two-norm regularization value has been modified and records the abnormal condition that existed before the modification.

Step 140, calculating a two-norm regular term of the target data according to the corrected square index value of each element, and generating a corresponding state indication signal and correcting the two-norm regular term when the two-norm regular term does not meet a second detection condition.

In this embodiment, the square index value of each element may be substituted into the optimized two-norm regularization function $L2 = \sum 2^{2 \times (exp - base)}$; that is, the cumulative sum of the two-norm regularization values of the elements is calculated as the two-norm regular term of the target data. Because the two-norm regular term has an effective value range, whether it lies within that range can be detected after it is obtained. If the two-norm regular term exceeds the effective value range, it is corrected into the effective value range, so that an out-of-range two-norm regular term does not occupy additional storage and the impact on hardware performance is reduced. At the same time, a state indication signal corresponding to the two-norm regular term is generated; it indicates that the two-norm regular term has been modified and records the abnormal condition that existed before the modification.

Step 150, sending the corrected two-norm regular term and each state indication signal to a result analysis module in the chip for analysis.

In this embodiment, the corrected two-norm regular term and each state indication signal generated in the calculation process may be sent to a result analysis module in the chip, so that upper-layer software can analyze the calculation result, find abnormal conditions in the calculation process, and raise an alarm.

It should be noted that, because the DMA module may transfer only part of the data of a tensor at a time, after the result analysis module obtains the two-norm regular term of each piece of target data, the upper-layer software needs to combine these results, according to the correspondence table between each piece of target data and the tensors, to obtain the two-norm regular term of each tensor, and then perform anomaly analysis on the two-norm regular term of each tensor.
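The per-tensor summary performed by upper-layer software might look like the following sketch. It assumes that combining the partial results means adding them, which is consistent with the two-norm regular term being a sum over elements; the function name `aggregate_per_tensor`, the chunk identifiers and the dictionary-based correspondence table are hypothetical.

```python
from collections import defaultdict


def aggregate_per_tensor(chunk_results, chunk_to_tensor):
    """Combine per-transfer two-norm regular terms into per-tensor totals.

    chunk_results   -- {chunk_id: two-norm regular term of that transfer}
    chunk_to_tensor -- {chunk_id: tensor name}, the correspondence table
    """
    totals = defaultdict(int)
    for chunk_id, l2 in chunk_results.items():
        totals[chunk_to_tensor[chunk_id]] += l2
    return dict(totals)


# Two transfers belong to a weight tensor, one to a gradient tensor.
per_tensor = aggregate_per_tensor(
    {0: 21, 1: 5, 2: 7},
    {0: "weight", 1: "weight", 2: "grad"},
)
```

Anomaly analysis, such as comparing each total with a reference value, would then operate on `per_tensor` rather than on individual transfers.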

Example two

Fig. 2 is a flowchart of a method for calculating a two-norm regularization term in the second embodiment of the present invention, which may be combined with various alternatives in the foregoing embodiments. Specifically, referring to fig. 2, the method may include the steps of:

Step 210, responding to the parameter configuration operation of the register in the DMA module to the calculation module, and acquiring the configuration parameters of the calculation module.

In this embodiment, in order to configure the calculation parameters separately for each accessed data, before the DMA module carries data each time, whether the hardware calculation function of the calculation module is turned on may be configured by the register. After the DMA module receives the memory access command packet, the registers in the DMA module are configured according to the command packet, and then the relevant parameters of the calculation module are configured through the registers.

In this embodiment, each access of the DMA module to memory data is initiated by one command packet, and the command packet includes specific information about the memory data accessed this time, such as the operation to be performed on the data, the size of the accessed data, and the address of the accessed data. To enhance the independence and flexibility of hardware configuration, the relevant parameters of the calculation module can be configured independently for the data access initiated by any command packet of the DMA module. For example, for the data access initiated by a given command packet, it may be configured independently whether to turn on the calculation function of the two-norm regular term and whether to send the calculation result to the result analysis module. In addition, the data type of the data to be calculated, the sliding window base value base and the overflow threshold V_MAX can be flexibly configured.
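The per-packet configuration can be pictured as a small record that is filled in from each command packet. The sketch below only mirrors the five parameters listed in this embodiment; the class name `L2CalcConfig`, the field and key names, the dictionary model of a command packet and the defaults are all assumptions, since the actual register layout is not described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class L2CalcConfig:
    calc_enable: bool    # turn the two-norm calculation on or off
    report_enable: bool  # send the result to the result analysis module
    data_type: str       # e.g. "FP32", "BF16", "FP16", "UINT8"
    base: int            # sliding window base value
    v_max: int           # overflow threshold; 0 disables the check here


def config_from_packet(packet: dict) -> L2CalcConfig:
    """Build the calculation-module configuration for one command packet."""
    return L2CalcConfig(
        calc_enable=bool(packet.get("l2_calc_enable", False)),
        report_enable=bool(packet.get("l2_report_enable", False)),
        data_type=str(packet["data_type"]),
        base=int(packet["base"]),
        v_max=int(packet.get("v_max", 0)),
    )


cfg = config_from_packet(
    {"l2_calc_enable": 1, "data_type": "FP32", "base": 127, "v_max": 40}
)
```

Building a fresh immutable record per packet reflects the idea that each data access carries its own settings and does not inherit them from the previous one.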

Step 220, executing a reset and zero-clearing operation on the related data of the two-norm regular term obtained by the last calculation of the calculation module.

Wherein, the related data of the two-norm regular term includes: the two-norm regular term of the target data, the accumulation times of the two-norm regularization values, the underflow signals corresponding to the square index values, the overflow signals corresponding to the square index values, and the overflow signal corresponding to the two-norm regular term.

In this embodiment, after the calculation function of the calculation module is turned on, in order to avoid that the calculation result of the last time affects the correctness of the calculation, before the calculation of the current time is started, the two-norm regular term generated by the previous calculation, the effective accumulation times of the two-norm regularization values of each element in the process of calculating the two-norm regular term, the underflow signal or the overflow signal corresponding to the square index value of each element, and the overflow signal corresponding to the calculated two-norm regular term need to be reset and cleared.

Step 230, when it is detected that the DMA module in the chip accesses the memory data, reading the write data bus of the DMA module to obtain the target data to be calculated.

Step 240, calculating the exponents and corresponding square index values of all elements in the target data through a hardware circuit in the calculation module.

Step 250, if a target square index value which does not meet the first detection condition exists in the square index values of the elements, generating a state indication signal corresponding to the target square index value, and correcting the target square index value.

Optionally, if a target square index value that does not meet the first detection condition exists in the square index values of the elements, generating a state indication signal corresponding to the target square index value, and correcting the target square index value may include: aiming at the square index value of each element, if the square index value is greater than an overflow threshold value and the overflow threshold value is a positive number, generating a DMA error interrupt signal to indicate a DMA module to stop working; if the square index value is smaller than the preset lower limit value of the effective range, generating an underflow signal corresponding to the square index value, and correcting the square index value into the lower limit value of the effective range; and if the square index value is larger than the preset upper limit value of the effective range, generating an overflow signal corresponding to the square index value, and correcting the square index value into the upper limit value of the effective range.

In this embodiment, whether the square index value of each element is valid may be determined by a hardware comparison circuit. Since the two-norm regular term is stored in 64 bits in hardware, the square index value of each element must lie in the range [0, 63]. Therefore, the lower limit value of the effective range may be set to 0 and the upper limit value of the effective range may be set to 63. The overflow threshold is used to flexibly control the effective upper limit, and may therefore be set to any value less than 64.

In this embodiment, the square index value is compared with the overflow threshold. If the square index value is greater than the overflow threshold V_MAX and V_MAX is greater than 0, the square index value of a single element already exceeds the configured limit, and a hardware DMA error interrupt signal may be generated, so that the hardware DMA module immediately stops working and the scene is preserved, which makes it convenient for a user to inspect the data at that moment.

The square index value is compared with the lower limit value 0 of the effective range. If the square index value is less than 0, the two-norm regularization value of the element is a fraction, so the accumulated two-norm regular term could also be fractional, whereas the 64-bit number stored in hardware is integer data; a fractional value is therefore inconvenient to store in hardware. Therefore, when the square index value is less than 0, it is automatically set to 0, so that every fractional value is treated as 1, and a corresponding underflow signal is generated.

The square index value is compared with the upper limit value 63 of the effective range. If the square index value is greater than 63, the two-norm regularization value of the element is greater than 2^63, and the hardware storage would overflow even before the two-norm regularization values of other elements are accumulated. Therefore, when the square index value is greater than or equal to 64, an overflow signal for the square index value is generated and the square index value is automatically modified to 63.
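The three comparisons can be summarised in a short software sketch. The names `SqStatus` and `check_square_index` are hypothetical, and one behaviour is an assumption: after raising the DMA error flag the sketch still applies the range correction, whereas the embodiment only states that the DMA module stops working.

```python
from dataclasses import dataclass

SQ_MIN, SQ_MAX = 0, 63  # effective range implied by the 64-bit storage


@dataclass
class SqStatus:
    value: int               # square index value after correction
    underflow: bool = False  # value was below the lower limit
    overflow: bool = False   # value was above the upper limit
    dma_error: bool = False  # value exceeded the overflow threshold V_MAX


def check_square_index(sq: int, v_max: int = 0) -> SqStatus:
    """Apply the first detection condition to one square index value."""
    status = SqStatus(value=sq)
    # The threshold check is active only when V_MAX is positive.
    if v_max > 0 and sq > v_max:
        status.dma_error = True
    if sq < SQ_MIN:
        status.underflow = True
        status.value = SQ_MIN
    elif sq > SQ_MAX:
        status.overflow = True
        status.value = SQ_MAX
    return status
```

For example, `check_square_index(-4)` yields a corrected value of 0 with the underflow flag set, and `check_square_index(40, v_max=32)` leaves the value at 40 but raises the DMA error flag.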

Step 260, calculating a two-norm regular term of the target data according to the corrected square index value of each element, and generating a corresponding state indication signal and correcting the two-norm regular term when the two-norm regular term does not meet a second detection condition.

Optionally, calculating a two-norm regular term of the target data according to the corrected square index value of each element, and generating a corresponding state indication signal and correcting the two-norm regular term when the two-norm regular term does not meet a second detection condition, may include: calculating and accumulating two-norm regularization values corresponding to the square index values of all elements through a shift circuit and an adder of hardware according to the optimized two-norm regularization function to obtain the two-norm regular term of the target data; and judging whether the two-norm regular term is greater than or equal to the two-norm upper limit value, and if so, generating an overflow signal corresponding to the two-norm regular term and correcting the two-norm regular term to the two-norm upper limit value minus 1.

In this embodiment, according to the optimized two-norm regularization function, the two-norm regularization value of each element can be calculated from the square index value of the element through a hardware shift circuit, and the two-norm regularization values of all elements are then accumulated through a hardware adder to obtain the two-norm regular term of the target data.

In this embodiment, since the two-norm regular term is stored in hardware as a 64-bit number, its maximum value is the 64-bit number whose bits are all 1, that is, 2^64 - 1. Therefore, when the accumulated value is greater than or equal to the two-norm upper limit value 2^64, an overflow signal is generated to indicate the overflow, and the accumulated value is automatically set to 2^64 - 1.
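
The shift-and-accumulate step with saturation can be sketched as follows. This is a hedged Python model, not the hardware design: it assumes that the two-norm regularization value of an element is 2 raised to its corrected square index value (consistent with the limits 0 and 63 discussed above), and that the two-norm upper limit value is 2^64, which follows from the all-ones 64-bit maximum. The names `accumulate_two_norm` and `TWO_NORM_UPPER_LIMIT` are hypothetical.

```python
TWO_NORM_UPPER_LIMIT = 1 << 64  # assumed upper limit: one past the 64-bit maximum

def accumulate_two_norm(square_exponents):
    """Accumulate 2**s for each corrected square index value s in 0..63.

    Returns (two_norm_term, overflow_signal). Once the sum reaches the upper
    limit it is saturated at the upper limit minus 1 and the signal is set.
    """
    total = 0
    overflow = False
    for s in square_exponents:
        total += 1 << s                       # shift circuit, then adder
        if total >= TWO_NORM_UPPER_LIMIT:     # second detection condition
            overflow = True
            total = TWO_NORM_UPPER_LIMIT - 1  # correct to upper limit minus 1
    return total, overflow
```

Because Python integers are unbounded, the saturation is made explicit here; a 64-bit hardware adder would instead detect the condition from its carry-out.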

Optionally, in the process of calculating and accumulating the two-norm regularization values corresponding to the square index values of the elements to obtain the two-norm regular term of the target data, the method may further include: counting, through a hardware counter, the number of accumulations of the two-norm regularization values of the elements; and, for a square index value that has generated an underflow signal, not incrementing the hardware counter when the two-norm regularization value corresponding to that square index value is accumulated.

In this embodiment, if the two-norm regular term is prediction data output by the machine learning model in the training process, the upper layer software may compare the two-norm regular term with a corresponding reference value when analyzing the two-norm regular term, and if the difference between the two is large, the machine learning model needs to be parameter-adjusted to increase the calculation accuracy. Therefore, in order to roughly estimate the error magnitude between each predicted two-norm regularization value and the real two-norm regularization value, the number of times of accumulation of the two-norm regularization values of each element may be counted by a hardware counter.

A larger accumulation count indicates that more accumulations were needed before the two-norm regular term exceeded its effective range, that is, the error between each predicted two-norm regularization value and the real two-norm regularization value is small, and the model parameters can be adjusted within a small range; otherwise, the error between each predicted two-norm regularization value and the real two-norm regularization value is large, and the adjustment range of the model parameters needs to be increased.

It should be noted that, since a square index value that generates an underflow signal is corrected to 0, the corresponding two-norm regularization value becomes 2^0 = 1, whose contribution to the accumulated value is small; such an accumulation therefore need not be counted.
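
The counting rule can be illustrated with a small Python sketch. It is a software model under stated assumptions, not the hardware counter: the function name `count_effective_accumulations` is hypothetical, and the input is assumed to be the uncorrected square index values so that underflow can be recognized.

```python
def count_effective_accumulations(raw_square_exponents):
    """Count accumulations, skipping elements whose square index value underflowed.

    An element with a raw square index value below 0 is still accumulated
    (after correction to 0) but does not increment the counter.
    """
    count = 0
    for s in raw_square_exponents:
        if s < 0:
            continue  # underflow signal was generated: do not count
        count += 1    # in-range and overflowed elements are both counted
    return count
```

With the inputs `[-2, 0, 5, 70]`, three accumulations are counted: the element with value -2 underflows and is skipped, while the overflowed element 70 is still counted.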

Step 270, sending the corrected two-norm regular term and each state indication signal to a result analysis module in the chip for analysis.

Optionally, sending the corrected two-norm regular term and each state indication signal to the result analysis module in the chip for analysis may include: sending the two-norm regular term of the target data, the accumulation count of the two-norm regularization values, the underflow signals corresponding to the square index values, the overflow signals corresponding to the square index values, and the overflow signal corresponding to the two-norm regular term to the result analysis module in the chip for analysis.

In this embodiment, the two-norm regular term of the target data, the effective accumulation count of the two-norm regularization values of the elements during the calculation of the two-norm regular term, the underflow signal or overflow signal corresponding to the square index value of each element, and the overflow signal corresponding to the calculated two-norm regular term may be sent to the result analysis module in the chip, so that upper-layer software can analyze the calculation result, find abnormal conditions in the calculation process, and raise an alarm.
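
The set of values handed to the result analysis module could be represented as a simple record, sketched below in Python. The class name `TwoNormResult`, its field names, and the helper `has_anomaly` are all hypothetical, and the sketch assumes that the per-element underflow and overflow signals are aggregated into one flag each; the description does not specify how the signals are encoded.

```python
from dataclasses import dataclass

@dataclass
class TwoNormResult:
    """Hypothetical record of what the calculation module reports."""
    two_norm: int             # corrected two-norm regular term
    accumulation_count: int   # effective accumulation count
    exp_underflow: bool       # some square index value was below 0
    exp_overflow: bool        # some square index value was above 63
    two_norm_overflow: bool   # accumulated term reached the upper limit

def has_anomaly(result):
    """Return True if any state indication signal is set."""
    return (result.exp_underflow
            or result.exp_overflow
            or result.two_norm_overflow)
```

Upper-layer software might raise an alarm whenever `has_anomaly` returns `True` for a reported record.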

It should be noted that the target data of each calculation may be only part of the data in one tensor. After the upper-layer software aggregates the two-norm regular term of each tensor according to the correspondence table between each target data and the tensors, the result is mainly used in the following two ways: (1) debugging software checks the two-norm regular term of each tensor, finds the part that has increased abnormally, and raises an alarm; (2) a correct model training run is taken as a reference, and the two-norm regular terms of the corresponding tensors are stored; then, with the data of each iteration fixed, the two-norm regular term of each new tensor calculated during training is compared with the reference value, and the difference must stay within a certain error.
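
The aggregation and comparison performed by the upper-layer software might look like the following Python sketch. It rests on two assumptions that the description does not state: the partial two-norm regular terms of one tensor are combined by summation, and "within a certain error" is interpreted as a relative tolerance. The function names, the dictionary-based correspondence table, and the default tolerance of 5% are hypothetical.

```python
from collections import defaultdict

def aggregate_by_tensor(chunk_terms, chunk_to_tensor):
    """Sum the partial two-norm regular terms of each tensor.

    chunk_terms:     {chunk_id: two-norm regular term of that target data}
    chunk_to_tensor: {chunk_id: tensor name} (the correspondence table)
    """
    totals = defaultdict(int)
    for chunk_id, term in chunk_terms.items():
        totals[chunk_to_tensor[chunk_id]] += term
    return dict(totals)

def within_tolerance(value, reference, rel_tol=0.05):
    """Check a tensor's term against the stored reference value."""
    return abs(value - reference) <= rel_tol * abs(reference)
```

A tensor whose aggregated term falls outside the tolerance of its reference value would be a candidate for closer inspection.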

EXAMPLE III

Fig. 3 is a schematic structural diagram of a computing chip for a two-norm regular term according to a third embodiment of the present invention, which is applicable to a case where a hardware resource is used to automatically compute a two-norm regular term of memory data. As shown in fig. 3, the chip includes: a DMA module 310, a calculation module 320, and a result analysis module 330;

the DMA module 310 is configured to configure and instruct a register in the DMA module to perform parameter configuration on the calculation module 320 according to the received memory access command packet, and access specified memory data according to the memory access command packet;

a calculating module 320, configured to execute a method for calculating a two-norm regularization term provided in any embodiment of the present invention;

the result analyzing module 330 is configured to receive the two-norm regular term and the state indicating signals sent by the calculating module 320, and perform anomaly analysis on the two-norm regular term and the state indicating signals.

Optionally, the chip further includes:

a function optimization module, configured to optimize the two-norm regularization function into the optimized two-norm regularization function before it is detected that the DMA module in the chip accesses the memory data;

Optionally, the calculating module 320 includes:

the square index calculation unit is used for calculating the index of each element in the target data through a shift circuit and/or a decoding circuit of hardware according to an index calculation rule matched with the data type of each element;

Optionally, the calculating module 320 includes:

the square index detection unit is used for generating a DMA error interrupt signal to indicate the DMA module to stop working if the square index value of each element is greater than the overflow threshold value and the overflow threshold value is a positive number;

Optionally, the calculating module 320 includes:

the two-norm regular term calculation unit is used for calculating and accumulating two-norm regular numerical values corresponding to the square index values of the elements through a shift circuit and an adder of hardware according to the optimized two-norm regularization function to obtain a two-norm regular term of the target data;

Optionally, the chip further includes: a counting module, configured to count, through a hardware counter, the number of accumulations of the two-norm regularization values of the elements while the two-norm regularization values corresponding to the square index values of the elements are calculated and accumulated to obtain the two-norm regular term of the target data;

Optionally, the calculating module 320 includes:

and the sending unit is used for sending the two-norm regular term of the target data, the accumulation times of the two-norm regular numerical value, underflow signals corresponding to the square index values, overflow signals corresponding to the square index values and overflow signals corresponding to the two-norm regular term to a result analysis module in the chip for analysis.

Optionally, the chip further includes:

the parameter configuration module is used for responding to the parameter configuration operation of a register in the DMA module to the calculation module before detecting that the DMA module in the chip accesses the memory data, and acquiring the configuration parameters of the calculation module;

Optionally, the chip further includes:

a reset module, configured to reset and clear the data related to the two-norm regular term obtained in the previous calculation of the calculation module, after the configuration parameters of the calculation module are acquired in response to the parameter configuration operation of the register in the DMA module on the calculation module.

It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims

1. A method for calculating a two-norm regular term, characterized in that the method is applied to a calculation module in a chip and comprises:

when detecting that a Direct Memory Access (DMA) module in the chip accesses memory data, reading a write data bus of the DMA module to obtain target data to be calculated;

if a target square index value which does not meet a first detection condition exists in the square index values of the elements, generating a state indication signal corresponding to the target square index value, and correcting the target square index value;

2. The method of claim 1, prior to detecting that a Direct Memory Access (DMA) module in the chip accesses memory data, further comprising:

optimizing a two-norm regularization function into an optimized two-norm regularization function;

3. The method of claim 2, wherein calculating a square exponent value of each element in the target data according to a square exponent calculation formula corresponding to a two-norm regularization function comprises:

and respectively calculating the square index value of each element through a shift circuit, an adder and a subtracter of hardware according to the square index calculation formula and the index of each element.

4. The method of claim 1, wherein if a target square index value that does not meet a first detection condition exists in the square index values of the elements, generating a status indication signal corresponding to the target square index value and correcting the target square index value comprises:

for the square index value of each element, if the square index value is greater than an overflow threshold value and the overflow threshold value is a positive number, generating a DMA error interrupt signal to indicate a DMA module to stop working;

if the square index value is smaller than a preset effective range lower limit value, generating an underflow signal corresponding to the square index value, and correcting the square index value into the effective range lower limit value;

and if the square index value is larger than a preset effective range upper limit value, generating an overflow signal corresponding to the square index value, and correcting the square index value into the effective range upper limit value.

5. The method according to claim 2, wherein calculating a two-norm regularization term of the target data according to the modified square index values of the elements, and generating a corresponding state indication signal and modifying the two-norm regularization term when the two-norm regularization term does not meet a second detection condition includes:

calculating and accumulating two-norm regularized values corresponding to square index values of all elements through a shift circuit and an adder of hardware according to the optimized two-norm regularization function to obtain two-norm regularization terms of the target data;

and judging whether the two-norm regular term is greater than or equal to the two-norm upper limit value, if so, generating an overflow signal corresponding to the two-norm regular term, and correcting the two-norm regular term to the two-norm upper limit value minus 1.

6. The method of claim 5, wherein in calculating and accumulating the two-norm regularization values corresponding to the square index values of the respective elements to obtain the two-norm regularization term of the target data, further comprising:

7. The method of claim 1, wherein sending the modified two-norm regularization term and the status indication signals to a result analysis module in the chip for analysis comprises:

and sending the two-norm regular term of the target data, the accumulation times of the two-norm regular numerical value, underflow signals corresponding to the square index values, overflow signals corresponding to the square index values and overflow signals corresponding to the two-norm regular term of the target data to a result analysis module in the chip for analysis.

8. The method of claim 1, prior to detecting that a DMA module in the chip accesses memory data, further comprising:

responding to the parameter configuration operation of a register in a DMA module to the calculation module, and acquiring the configuration parameters of the calculation module;

9. The method of claim 8, after obtaining the configuration parameters of the calculation module in response to the parameter configuration operation of a register in the DMA module on the calculation module, further comprising:

the related data of the two-norm regular term comprises: the method comprises the steps of obtaining a two-norm regularization term of target data, accumulating times of two-norm regularization values, underflow signals corresponding to square index values, overflow signals corresponding to square index values and overflow signals corresponding to the two-norm regularization term.

10. A chip for calculating a two-norm regular term, the chip comprising: a DMA module, a calculation module and a result analysis module;

the DMA module is used for configuring and indicating a register in the DMA module to perform parameter configuration on the computing module according to the received memory access command packet, and accessing specified memory data according to the memory access command packet;

the calculation module is used for executing the calculation method of the two-norm regular term in any one of claims 1-9;