WO2022007880A1 - Data precision configuration method and apparatus, neural network device, medium - Google Patents

Data precision configuration method and apparatus, neural network device, medium

Info

Publication number
WO2022007880A1
Authority
WO
WIPO (PCT)
Prior art keywords
precision
layer
data
output
weight
Prior art date
Application number
PCT/CN2021/105173
Other languages
English (en)
French (fr)
Inventor
何伟
祝夭龙
Original Assignee
北京灵汐科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京灵汐科技有限公司
Publication of WO2022007880A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 - Arrangements for program control, e.g. control units
    • G06F 9/06 - Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/44 - Arrangements for executing specific programs
    • G06F 9/445 - Program loading or initiating
    • G06F 9/44505 - Configuring for program initiating, e.g. using registry, configuration files
    • G06F 9/4451 - User profiles; Roaming
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/06 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N 3/063 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • the embodiments of the present invention relate to the technical field of artificial intelligence, and in particular, to a method and apparatus for precision configuration of output data, a neural network device, and a computer-readable storage medium.
  • Neural networks based on computer systems include a large number of neurons and can handle problems such as image recognition, speech recognition, and natural language processing in a manner similar to the human brain. Different neurons in the neural network are connected to one another; based on these connections the neural network is divided into multiple layers, each layer including one or more neurons, and each neuron in a preceding layer is connected to one or more neurons in the following layer and can send data to them.
  • Each layer of neurons also has weight data, such as the weight values of the connections between each neuron in the layer and the neurons of the preceding layer (e.g., forming a weight matrix).
  • Typically, all weight data in a layer have the same precision; that is, the neurons in each layer have a uniform weight precision.
  • When data is input to a layer of the neural network, result data can be computed from it and the layer's weight data; in the related art, the precision of the result data can be converted to obtain output data that meets the precision requirements.
  • The precision of the output data of each layer of the neural network is generally specified in advance and is the same for all layers.
  • However, such an output-data precision configuration scheme is not flexible enough and needs improvement.
  • Embodiments of the present invention provide a method and apparatus for precision configuration of output data, a neural network device, and a computer-readable storage medium.
  • An embodiment of the present invention provides a method for configuring the precision of output data. The method is applied to a neural network device and includes: acquiring the weight precision of a receiving layer in a neural network, where the receiving layer is the layer following a sending layer, and the sending layer is any layer of the neural network except the last layer; and configuring the precision of the data to be output by the sending layer at least according to the weight precision of the receiving layer.
  • In some embodiments, before acquiring the weight precision of the receiving layer in the neural network, the method further includes: acquiring the precision of the data to be output by the sending layer.
  • In some embodiments, configuring the precision of the data to be output by the sending layer at least according to the weight precision of the receiving layer includes: determining a target precision according to the precision of the data to be output and the weight precision of the receiving layer; and configuring the precision of the data to be output as the target precision.
  • In some embodiments, acquiring the precision of the data to be output by the sending layer includes: acquiring the precision of the input data of the sending layer and the weight precision of the sending layer; and determining the precision of the data to be output according to the precision of the input data and the weight precision of the sending layer, where the precision of the data to be output is greater than or equal to the higher of the two.
  • In some embodiments, when the weight precision of the receiving layer is lower than the precision of the data to be output, the target precision is lower than the precision of the data to be output and not lower than the weight precision of the receiving layer.
  • In some embodiments, when the weight precision of the receiving layer is not lower than the precision of the data to be output, the target precision is not lower than the precision of the data to be output and not higher than the weight precision of the receiving layer.
  • In some embodiments, configuring the precision of the data to be output by the sending layer at least according to the weight precision of the receiving layer includes: determining the weight precision of the receiving layer as the target precision; and configuring the precision of the data to be output as the target precision.
  • In some embodiments, after the configuration, the method further includes: outputting the output data obtained to the receiving layer.
  • the neural network device is implemented based on a many-core architecture.
  • An embodiment of the present invention provides an apparatus for configuring the precision of output data, wherein the apparatus is integrated in a neural network device and includes: a weight precision acquisition module, configured to acquire the weight precision of a receiving layer in a neural network, where the receiving layer is the layer following a sending layer, and the sending layer is any layer of the neural network except the last layer; and a precision configuration module, configured to configure the precision of the data to be output by the sending layer at least according to the weight precision of the receiving layer.
  • an embodiment of the present invention provides a neural network device, wherein the neural network device includes at least one processing core, and the processing core is used to implement any one of the above-mentioned precision configuration methods for output data.
  • the neural network device includes a plurality of processing cores forming a many-core architecture.
  • An embodiment of the present invention provides a computer-readable storage medium on which a computer program is stored; when executed by a processing core, the program implements any one of the above methods for configuring the precision of output data.
  • The output data precision configuration scheme provided in the embodiments of the present invention is applied to a neural network device: the precision of the data to be output by the sending layer in the neural network is acquired; before the data is output, the weight precision of the receiving layer, the layer following the sending layer, is acquired first; and the precision of the output data is configured according to the weight precision of the receiving layer.
  • FIG. 1 is a schematic diagram of a precision configuration scheme of output data in the related art.
  • FIG. 2 is a schematic flowchart of a method for configuring the precision of output data according to an embodiment of the present invention.
  • FIG. 3 is a schematic flowchart of still another method for configuring the precision of output data according to an embodiment of the present invention.
  • FIG. 4 is a schematic diagram of an output data precision configuration scheme provided by an embodiment of the present invention.
  • FIG. 5 is a schematic flowchart of another method for configuring the precision of output data according to an embodiment of the present invention.
  • FIG. 6 is a structural block diagram of an apparatus for configuring the precision of output data according to an embodiment of the present invention.
  • FIG. 7 is a structural block diagram of a neural network device according to an embodiment of the present invention.
  • FIG. 8 is a structural block diagram of a computer-readable storage medium according to an embodiment of the present invention.
  • a neural network is configured in the neural network device of the embodiment of the present invention.
  • the neural network in the embodiment of the present invention may include an artificial neural network (Artificial Neural Network, ANN), and may also include a spiking neural network (Spiking Neural Network, SNN) and other types of neural networks (such as convolutional neural networks CNN, etc.).
  • the specific type of neural network is not limited, for example, it can be an acoustic model, a speech recognition model, an image recognition model, etc., and can be applied to data centers, security fields, smart medical fields, autonomous driving fields, smart transportation fields, smart home fields and other related fields.
  • FIG. 1 is a schematic diagram of an output data precision configuration scheme for a neural network in the related art. The weight precision of every layer of the neural network carried in the neural network device (the precision marked below each layer in the figure) is the same, and the precision of the output data of every layer (the precision marked at the arrows between layers in the figure) is also set to be the same, so the precision calculation corresponding to the computation of each layer's input data with its weights (the multiplication of the two lowermost precisions in the figure) is usually also the same.
  • However, this approach may lead to precision loss or an increase in the amount of data transmitted.
  • As shown in FIG. 1, for convenience of description, only four layers of the neural network are shown, denoted L1, L2, L3, and L4 from front to back.
  • The precision of L1's input data (the data precision) is FP32 (32-bit floating point) and L1's weight precision is FP32, so the precision calculation corresponding to the multiply-accumulate operation is FP32*FP32, and the precision of the result data obtained directly from the calculation is usually higher than FP32.
  • In the related art, the output data precision of all layers, including L1, may be specified as FP32, so the above calculation result can be used as output data only after its precision is truncated. Alternatively, if a higher output data precision is specified for all layers, the calculation result can be output only after being padded to that higher precision. That is, in the related art the precision of each layer's output data is usually a uniformly set data precision; clearly, this way of configuring output data precision is not flexible enough.
  • In the embodiments of the present invention, by contrast, the output data precision of all layers is not set to be the same; instead, a corresponding output precision is configured for each layer according to the weight precision of the following layer. When the weight precisions of different layers differ, the output data precisions of different layers may also differ; that is, the neural network can use mixed precision, so that the output data precision of each layer meets its needs and the relationship between storage capacity and computing energy consumption on the one hand, and the recognition rate (or accuracy rate) of the neural network on the other, is better balanced.
  • an embodiment of the present invention provides a precision configuration method for output data.
  • FIG. 2 is a schematic flowchart of a method for configuring the accuracy of output data according to an embodiment of the present invention.
  • The method is used in a neural network device and can be executed by an apparatus for configuring the precision of output data, where the apparatus can be implemented in software and/or hardware and can generally be integrated in the neural network device, for example in a processing core of the neural network device.
  • As shown in FIG. 2, the method includes the following Step 201 and Step 202.
  • Step 201: Acquire the weight precision of the receiving layer in the neural network, where the receiving layer is the layer following the sending layer, and the sending layer is any layer of the neural network except the last layer.
  • In the embodiments of the present invention, the specific structure of the neural network is not limited; for example, the number of neuron layers in the neural network may be any number of layers, two or more.
  • The sending layer and the receiving layer are two mutually corresponding layers: the receiving layer is the layer that receives the output data of the current sending layer. The sending layer is therefore not necessarily the first layer of the neural network but can be any layer except the last layer; correspondingly, the receiving layer can be any layer except the first layer of the neural network.
  • Illustratively, each layer of the neural network can be configured in one or more processing cores, and each processing core can include a processor and have its own memory, so the calculation of each layer can be performed locally in the processing core where the layer is located, while the output data is sent to the processing core where the next layer is located.
  • For example, the sending processing core where the sending layer is located can calculate the data to be output according to the sending layer's input data and the sending layer's weight parameters (such as a weight matrix) and output it to the receiving processing core where the receiving layer is located.
  • In the embodiments of the present invention, the weight precision of a layer refers to the precision of the weight values of all neurons in that layer (such as the weight values of the connections between the neurons and the neurons of the preceding layer).
  • Generally, the weight values of all neurons in a layer should have the same precision, so that the layer has a uniform weight precision; however, it is also feasible for different neurons in a layer to have different weight precisions.
  • the weight precision of different layers may be different, and the specific manner of acquiring the weight precision of the receiving layer is not limited.
  • For example, the weight precision of the receiving layer can be stored in a storage area of the sending processing core at the compilation stage and, once the data to be output by the sending layer is obtained, read from that storage area.
  • As another example, assuming the processing core corresponding to the receiving layer is the receiving processing core, a storage area in the receiving processing core can store the weight precision of the receiving layer, and the sending processing core can obtain it from the receiving processing core through inter-core communication.
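  • To make the two acquisition options above concrete, the following is a minimal Python sketch, not taken from the patent: the table name WEIGHT_PRECISION_TABLE and the noc_query callback are hypothetical stand-ins for the compile-time storage area and the inter-core communication primitive, respectively.

        # Hypothetical compile-time table written into the sending core's
        # local storage area: layer name -> weight precision.
        WEIGHT_PRECISION_TABLE = {"L2": "FP16", "L3": "FP16", "L4": "Int8"}

        def get_receiving_layer_weight_precision(receiving_layer, noc_query=None):
            """Return the receiving layer's weight precision.

            Prefer the local compile-time table; otherwise fall back to asking
            the receiving core over the network-on-chip via noc_query.
            """
            if receiving_layer in WEIGHT_PRECISION_TABLE:
                return WEIGHT_PRECISION_TABLE[receiving_layer]
            if noc_query is not None:
                return noc_query(receiving_layer)  # inter-core communication
            raise KeyError(f"unknown layer: {receiving_layer}")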
  • Step 202: Configure the precision of the data to be output by the sending layer at least according to the weight precision of the receiving layer.
  • In the embodiments of the present invention, the precision of the data to be output by the sending layer is configured (or set) with reference to the weight precision of the receiving layer, and the configured data with the required precision is used as the data actually output by the sending layer (the output data); the specific way of referring to the weight precision and the specific configuration method are not limited.
  • Illustratively, the precision of the data to be output can be configured to be lower than, higher than, or the same as the weight precision of the receiving layer, yielding the precision of the output data.
  • The difference in precision level between the weight precision of the receiving layer and the precision of the output data may be a first preset precision-level difference.
  • The precision level indicates how high a data precision is: the higher the precision, the higher the corresponding precision level, and the precision values corresponding to different precision levels can be set according to actual needs. For example, between Int4 (4-bit integer) and FP16 there is also Int8, so the difference in precision level between Int4 and FP16 can be 2, while the difference between Int4 and Int8 can be 1.
  • For example, suppose the weight precision of the receiving layer is FP16 and the first preset precision-level difference is 2; if the precision of the data to be output is configured to be lower than the weight precision of the receiving layer, it is configured as Int4.
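  • As an illustration of the precision-level arithmetic in the example above, here is a minimal sketch in Python; the ordered precision ladder and the helper names are assumptions for illustration, not part of the patent.

        # Assumed precision ladder, ordered from lowest to highest level.
        PRECISION_LADDER = ["Int2", "Int4", "Int8", "FP16", "FP32", "FP64"]

        def level(precision):
            """Precision level: position on the ladder (higher = more precise)."""
            return PRECISION_LADDER.index(precision)

        def lower_by_levels(precision, level_diff):
            """Return the precision `level_diff` levels below `precision`."""
            return PRECISION_LADDER[max(0, level(precision) - level_diff)]

        # The example from the text: receiving layer weight precision FP16 and a
        # first preset precision-level difference of 2 give Int4.
        assert lower_by_levels("FP16", 2) == "Int4"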
  • The method for configuring the precision of output data provided in the embodiments of the present invention is applied to a neural network device: the precision of the data to be output by the sending layer in the neural network is acquired, and before that data is output, the weight precision of the receiving layer is acquired first and the precision of the output data is configured according to it. By adopting this technical solution, the precision of the output data of one or more layers of the neural network deployed in the neural network device can be flexibly configured, thereby optimizing the performance of the neural network device.
  • In some embodiments, as shown in FIG. 2, before acquiring the weight precision of the receiving layer in the neural network, the method further includes: Step 200: acquiring the precision of the data to be output by the sending layer.
  • Before acquiring the weight precision of the receiving layer, the precision that the sending layer's data to be output would "originally" have, i.e. without applying the method of the embodiments, can be acquired first; for example, the precision of the calculation result obtained by multiply-accumulating the sending layer's input data with its weight data.
  • Generally, the precision of the data to be output is greater than or equal to the higher of the sending layer's input data precision and weight precision. In the sending layer, if the input data precision and weight precision are themselves low (such as Int2, Int4, or Int8), the precision (bit width) of the calculation result after the multiply-accumulate operation may be insufficient (for example, unable to meet the requirements of hardware such as the corresponding processing core), so the precision needs to be raised; the precision of the data to be output will then generally rise relatively high (for example, to Int8 or Int16, respectively), and the lower the higher of the input data precision and weight precision is, the more precision levels need to be raised. Conversely, if the input data precision and weight precision are already high (such as FP16, FP32, or FP64), the precision of the data to be output may not increase, or may increase relatively little (for example, from FP16 to FP32), because the precision after the multiply-accumulate operation is already high enough.
  • In some embodiments, acquiring the precision of the data to be output by the sending layer includes: acquiring the precision of the input data of the sending layer and the weight precision of the sending layer; and determining the precision of the data to be output from these, where the precision of the data to be output is greater than or equal to the higher of the precision of the input data and the weight precision of the sending layer.
  • In the embodiments of the present invention, the precision of the data to be output can thus be determined from the sending layer's input data precision and weight precision, specifically ensuring that it is greater than or equal to the higher of the two, because the precision of the result of a multiply-accumulate operation is usually higher than the precision of either of its two operands.
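  • A minimal sketch of this determination, under an assumed precision ladder (here including Int16 so that the Int8-to-Int16 example above is expressible); the bump-up rule for low integer precisions is an illustrative assumption consistent with the text, not a rule prescribed by the patent.

        PRECISION_LADDER = ["Int2", "Int4", "Int8", "Int16", "FP16", "FP32", "FP64"]

        def infer_to_be_output_precision(input_precision, weight_precision):
            """Precision the sending layer's result data 'originally' has.

            Guarantee: not lower than the higher of the two operand precisions.
            For low integer precisions the multiply-accumulate result needs more
            bits, so it is bumped up one level (e.g. Int8 * Int8 -> Int16).
            """
            hi = max(input_precision, weight_precision, key=PRECISION_LADDER.index)
            if hi in ("Int2", "Int4", "Int8"):  # low precision: raise it
                return PRECISION_LADDER[PRECISION_LADDER.index(hi) + 1]
            return hi  # already high enough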
  • In some embodiments, configuring the precision of the data to be output by the sending layer at least according to the weight precision of the receiving layer includes: determining a target precision according to the precision of the data to be output and the weight precision of the receiving layer; and configuring the precision of the data to be output as the target precision.
  • In the embodiments of the present invention, a "target precision" may be determined jointly from the weight precision of the receiving layer and the precision of the data to be output obtained above, and the precision of the data to be output is configured as the target precision; that is, the actual output data of the sending layer is set to have the target precision.
  • In some embodiments, when the weight precision of the receiving layer is lower than the precision of the data to be output, the target precision is lower than the precision of the data to be output and not lower than the weight precision of the receiving layer.
  • That is, when the precision of the data to be output (its original precision) is higher than the weight precision of the receiving layer, the precision of the data to be output can be reduced, but it should not be reduced below the weight precision, so as not to affect the recognition rate of the neural network.
  • The advantage of this setting is that it is equivalent to truncating the precision of the data to be output according to the weight precision of the receiving layer, so that the precision of the data to be output is reduced, thereby reducing the amount of data transmitted; when the receiving layer performs its calculations, the amount of computation is also reduced, lowering the energy consumption of data processing.
  • In this case the target precision may equal the weight precision of the receiving layer, and the weight precision of the receiving layer can be directly determined as the target precision. The advantage of that setting is that it is equivalent to truncating the precision of the data to be output to match the weight precision of the receiving layer, which minimizes the amount of data transmitted, reduces the energy consumption of data processing, and increases effective computing power.
  • In some embodiments, when the weight precision of the receiving layer is not lower than the precision of the data to be output, the target precision is not lower than the precision of the data to be output and not higher than the weight precision of the receiving layer.
  • That is, when the precision of the data to be output (its original precision) is not higher than the weight precision (i.e., the same or lower), the precision of the data to be output can be kept unchanged or increased, but it cannot be increased beyond the weight precision; keeping it as high as the weight precision allows helps the recognition rate of the neural network.
  • In this case, too, the target precision may equal the weight precision of the receiving layer; that is, the weight precision of the receiving layer can be directly determined as the target precision.
  • In some embodiments, configuring the precision of the data to be output by the sending layer at least according to the weight precision of the receiving layer includes: determining the weight precision of the receiving layer as the target precision; and configuring the precision of the data to be output as the target precision.
  • As another approach in the embodiments of the present invention, the relationship between the weight precision of the receiving layer and the precision of the sending layer's data to be output need not be evaluated at all; instead, the weight precision of the receiving layer can be directly determined as the target precision.
  • This simplifies the implementation of the method and guarantees that the precision of the input data used for calculation in any layer equals that layer's weight precision, better balancing storage capacity and computing energy consumption against the recognition rate (or accuracy rate) of the neural network.
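  • Putting the two cases and the simplified variant together, target-precision selection might look like the following sketch (again over an assumed ordered precision ladder; the patent only states the bounds, and the choice below of returning the bound that minimizes transmission is one possible policy).

        PRECISION_LADDER = ["Int2", "Int4", "Int8", "FP16", "FP32", "FP64"]
        LEVEL = {p: i for i, p in enumerate(PRECISION_LADDER)}

        def determine_target_precision(to_be_output, recv_weight, simplified=False):
            """Pick the target precision for the sending layer's output data.

            simplified=True skips the comparison and uses the receiving layer's
            weight precision directly. Otherwise: if the receiving layer's weight
            precision is lower than the to-be-output precision, truncate down to
            it; if not, keep the to-be-output precision unchanged (raising it is
            also allowed, but never past the weight precision).
            """
            if simplified or LEVEL[recv_weight] < LEVEL[to_be_output]:
                return recv_weight  # truncate down to the weight precision
            return to_be_output  # keep unchanged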
  • the method further includes: outputting the output data obtained after the configuration to the receiving layer.
  • After the precision of the data to be output is configured, the configured data with the required precision can be used as the output data and transmitted directly to the receiving layer, for example to the receiving processing core where the receiving layer is located, so that the processing core corresponding to the receiving layer can perform the receiving layer's calculations.
  • the neural network device is implemented based on a many-core architecture.
  • The many-core architecture includes multiple processing cores and can have a multi-core reorganization characteristic; there is no master-slave distinction between processing cores.
  • Tasks can be flexibly configured in software, with different tasks configured in different processing cores at the same time (for example, one layer of neurons per processing core) to realize multi-task parallel processing.
  • An array of processing cores can jointly complete the computation of the neural network, efficiently supporting various neural network algorithms and improving device performance.
  • Illustratively, the neural network device can use a network-on-chip, such as a two-dimensional mesh (2D Mesh) network-on-chip, for communication and interconnection between cores, while communication between the device and the outside can be realized through a high-speed serial port.
  • FIG. 3 is a schematic flowchart of still another method for configuring the precision of output data provided by an embodiment of the present invention. As shown in FIG. 3, the method includes:
  • Step 301: Acquire the data to be output by the sending layer in the neural network.
  • Each time the method is executed, the sending layer may be any layer of the neural network except the last layer. In different executions of the method the sending layer may differ; that is, the sending layer is not one specific layer of the neural network.
  • Step 302: Acquire the weight precision of the receiving layer, where the receiving layer is the layer following the sending layer.
  • Step 303: Determine whether the weight precision of the receiving layer is lower than the precision of the data to be output by the sending layer; if so, go to Step 304; otherwise, go to Step 305.
  • Optionally, in this embodiment of the present invention, the comparison between the weight precision of the receiving layer and the precision of the data to be output may also be skipped, and the weight precision of the receiving layer directly determined as the target precision.
  • Step 304: Determine the weight precision of the receiving layer as the target precision, configure the precision of the sending layer's data to be output as the target precision to obtain the output data, and go to Step 306.
  • Step 305: Keep the precision of the sending layer's data to be output unchanged, or configure it as the weight precision of the receiving layer, to obtain the output data.
  • Keeping the precision of the sending layer's data to be output unchanged can reduce the amount of transmission between the sending layer and the receiving layer.
  • Step 306: Output the output data to the receiving layer, for example to the processing core corresponding to the receiving layer.
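  • The flow of Steps 301 to 306 can be summarized in a short sketch; convert and send are hypothetical stand-ins for the device's precision-conversion and inter-core transmission primitives.

        PRECISION_LADDER = ["Int2", "Int4", "Int8", "FP16", "FP32", "FP64"]
        LEVEL = {p: i for i, p in enumerate(PRECISION_LADDER)}

        def configure_and_send(data, out_precision, recv_weight_precision,
                               convert, send, keep_unchanged=True):
            """Steps 303-306: pick the output precision and ship the data."""
            if LEVEL[recv_weight_precision] < LEVEL[out_precision]:  # Steps 303/304
                target = recv_weight_precision
            elif keep_unchanged:  # Step 305, first option
                target = out_precision
            else:  # Step 305, second option
                target = recv_weight_precision
            output = convert(data, target)
            send(output, target)  # Step 306
            return output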
  • The output data precision configuration scheme provided in this embodiment of the present invention is applied to a neural network device: the data to be output by the sending layer in the neural network is acquired; before that data is output, the weight precision of the next layer is acquired first; and the precision of the data to be output is configured to be the same as the weight precision of the next layer, yielding output data that is output to the next layer (e.g., to the processing core corresponding to the next layer).
  • By adopting the above technical solution and configuring the precision directly according to the next layer's weight precision before data output, precision loss in data conversion can be reduced; when the weight precision of the following layer is lower than that of the preceding layer, both the amount of data transmitted and the energy consumption of data processing can be reduced.
  • FIG. 4 is a schematic diagram of an output data precision configuration scheme provided by an embodiment of the present invention. As shown in FIG. 4 , for the convenience of description, only four layers in the neural network are shown, which are L1, L2, L3 and L4.
  • For L1, the precision of the input data is Int8 and L1's weight precision is Int8, so the precision calculation corresponding to the multiply-accumulate operation is Int8*Int8, and the precision of the calculation result obtained by the multiply-accumulate operation is FP16.
  • In the related art, if the output data precision is uniformly set to Int8, the precision of the data actually output by L1 must be Int8; that is, the FP16 precision of the above calculation result must be truncated to Int8 before Int8 data is output from L1. Since the weight precision of L2 is FP16, when computing in L2 the truncated Int8 precision must then be padded back to FP16. The precision truncated away is lost, and the unnecessary truncation and padding steps consume more computation.
  • In this embodiment of the present invention, by contrast, the weight precision of L2 is acquired first, so it is known that the original precision of L1's data to be output (the precision of the calculation result) is the same as the weight precision of L2 (both FP16). According to the weight precision of L2, the precision of the data to be output is therefore configured as FP16; that is, no precision truncation is performed on the data to be output (the calculation result), which is output directly as FP16-precision output data. This reduces precision loss in data conversion and eliminates unnecessary operations.
  • For L3, the precision of the input data is FP16 and the weight precision is FP16. In the related art, if the output data precision is uniformly set to FP16, the precision of L3's output data should also be FP16 (that FP16 likewise being obtained by truncating the precision of the calculation result). But the weight precision of L4 is Int8, so the data precision actually required in its calculation is only Int8; if L3's output data precision is FP16, this effectively adds some "invalid" data transmission between L3 and L4.
  • In this embodiment of the present invention, the weight precision of L4 is acquired first as Int8; knowing that the original precision of L3's data to be output is higher than the weight precision of L4, the precision of the data to be output can be configured as Int8. That is, L3's calculation result is truncated directly to Int8 (rather than to FP16), so that the precision of the actual output data is Int8 and only Int8 data needs to be transmitted between L3 and L4.
  • Compared with the related art, the approach of this embodiment of the present invention reduces the precision of L3's output data and the amount of data transmitted between L3 and L4; that is, it reduces the data traffic between the processing core where L3 is located and the processing core where L4 is located, without affecting the calculation precision of L4, greatly improving performance.
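  • As a numeric illustration of the L3-to-L4 case, the NumPy sketch below truncates an FP16 result to Int8 before transmission; the scale-based symmetric quantization is one common way to realize such a truncation and is an illustrative choice, since the patent does not prescribe a particular conversion.

        import numpy as np

        # L3's multiply-accumulate result, originally FP16.
        result_fp16 = np.array([0.12, -1.5, 3.75, 0.02], dtype=np.float16)

        # Truncate to Int8 with a simple symmetric scale (illustrative choice).
        scale = float(np.max(np.abs(result_fp16))) / 127.0
        output_int8 = np.clip(np.round(result_fp16 / scale), -128, 127).astype(np.int8)

        # Only the Int8 tensor (plus the scale) crosses the L3 -> L4 core
        # boundary: 1 byte per element instead of 2, halving inter-core traffic.
        print(output_int8, scale)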
  • FIG. 5 is a schematic flowchart of another method for configuring the accuracy of output data provided by an embodiment of the present invention.
  • Taking a neural network that is an image recognition model as an example, assume the image recognition model is a convolutional neural network (CNN) including, for example, a convolution layer, a pooling layer, and a fully connected layer, which correspond respectively to a first processing core, a second processing core, and a third processing core in the neural network device. The method may include:
  • Step 503: Obtain, through the first processing core, the image data to be recognized; calculate the feature map data to be output by the convolution layer according to the image data to be recognized and the convolution layer's weight parameters; acquire the weight precision of the pooling layer; configure the precision of the convolution layer's feature map data to be output as the weight precision of the pooling layer, obtaining the convolution layer's output feature map data; and output it to the second processing core.
  • Step 504: Calculate, through the second processing core, the feature vector data to be output by the pooling layer according to the convolution layer's output feature map data and the pooling layer's weight parameters; acquire the weight precision of the fully connected layer; configure the precision of the pooling layer's feature vector data to be output as the weight precision of the fully connected layer, obtaining the pooling layer's output feature vector data; and output it to the third processing core.
  • Step 505: Calculate and output, through the third processing core, the image recognition result according to the pooling layer's output feature vector data and the fully connected layer's weight parameters.
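  • A compact sketch of Steps 503 to 505 as a three-core pipeline; the WEIGHT_PRECISION table and the compute/configure callbacks are hypothetical scaffolding that only shows where each precision configuration happens.

        # Hypothetical weight precisions for the three layers.
        WEIGHT_PRECISION = {"conv": "Int8", "pool": "FP16", "fc": "Int8"}

        def run_pipeline(image, compute, configure):
            """compute(layer, data) runs a layer on its core; configure(data,
            precision) casts data to the given precision before transmission."""
            # Step 503: first core, convolution; output at pool's weight precision.
            fmap = compute("conv", image)
            fmap = configure(fmap, WEIGHT_PRECISION["pool"])
            # Step 504: second core, pooling; output at fc's weight precision.
            fvec = compute("pool", fmap)
            fvec = configure(fvec, WEIGHT_PRECISION["fc"])
            # Step 505: third core, fully connected layer; final recognition result.
            return compute("fc", fvec)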
  • When the method is applied to an image recognition scenario, the processing cores corresponding to the convolution layer and the pooling layer determine the precision of the image output data according to the weight precision of the next layer before outputting image data. This reduces the precision loss in converting image information while lowering the amount of image data transmitted and the energy consumption of image data processing; in other words, on the premise of guaranteed calculation precision, computing power is increased and power consumption reduced, so the performance of the neural network device carrying the image recognition model is effectively improved and the recognition efficiency of the image recognition model can also be improved.
  • An embodiment of the present invention further provides an apparatus 600 for configuring the precision of output data.
  • FIG. 6 is a structural block diagram of an output data precision configuration apparatus 600 according to an embodiment of the present invention.
  • The apparatus can be implemented in software and/or hardware, can generally be integrated in a neural network device, and configures the precision of output data by executing the method for configuring the precision of output data.
  • As shown in FIG. 6, the apparatus at least includes the following weight precision acquisition module 602 and precision configuration module 603.
  • The weight precision acquisition module 602 is configured to acquire the weight precision of the receiving layer in the neural network, where the receiving layer is the layer following the sending layer, and the sending layer is any layer of the neural network except the last layer.
  • The precision configuration module 603 is configured to configure the precision of the data to be output by the sending layer at least according to the weight precision of the receiving layer.
  • The apparatus 600 for configuring the precision of output data provided in the embodiments of the present invention is applied to a neural network device: the precision of the data to be output by the sending layer in the neural network is acquired; before that data is output, the weight precision of the receiving layer, the layer following the sending layer, is acquired first; and the precision of the output data is configured according to the weight precision of the receiving layer.
  • In this way, the precision of the output data of one or more layers of the neural network deployed in the neural network device can be flexibly configured, thereby optimizing the performance of the neural network device.
  • In some embodiments, the apparatus 600 further includes: a to-be-output data acquisition module 601, configured to acquire the precision of the data to be output by the sending layer.
  • In some embodiments, configuring the precision of the data to be output by the sending layer at least according to the weight precision of the receiving layer includes: determining a target precision according to the precision of the data to be output and the weight precision of the receiving layer; and configuring the precision of the data to be output as the target precision.
  • In some embodiments, acquiring the precision of the data to be output by the sending layer includes: acquiring the precision of the input data of the sending layer and the weight precision of the sending layer; and determining the precision of the data to be output according to these, where the precision of the data to be output is greater than or equal to the higher of the precision of the input data and the weight precision of the sending layer.
  • In some embodiments, when the weight precision of the receiving layer is lower than the precision of the data to be output, the target precision is lower than the precision of the data to be output and not lower than the weight precision of the receiving layer.
  • In some embodiments, when the weight precision of the receiving layer is not lower than the precision of the data to be output, the target precision is not lower than the precision of the data to be output and not higher than the weight precision of the receiving layer.
  • In some embodiments, configuring the precision of the data to be output at least according to the weight precision of the receiving layer includes: determining the weight precision of the receiving layer as the target precision; and configuring the precision of the data to be output as the target precision.
  • In some embodiments, after the configuration, the output data obtained is output to the receiving layer.
  • the neural network device is implemented based on a many-core architecture.
  • An embodiment of the present invention further provides a neural network device 700, where the neural network device 700 includes at least one processing core 701, and the processing core 701 is configured to implement the method for configuring the precision of output data provided by the embodiments of the present invention.
  • the neural network device 700 includes a plurality of processing cores 701 forming a many-core architecture.
  • the neural network device 700 in this embodiment of the present invention may adopt a many-core architecture, that is, it includes multiple processing cores 701, each processing core 701 includes a processor and has its own memory, and different processing cores 701 can be connected through an on-chip network 702 (such as 2D Mesh) to realize information interaction, so that each processing core 701 can perform certain calculations, and the calculation of the neural network can be jointly implemented by multiple processing cores 701.
  • For example, one layer of the neural network may be configured within each processing core 701; it is also feasible for one processing core 701 to be configured with multiple layers of the neural network, with only part of one layer, or with parts of several layers.
  • The many-core neural network device 700 can take various specific forms; for example, the device can include an array of multiple chips (ICs), each chip containing one or more processing cores 701; alternatively, the device may include only one chip containing multiple processing cores 701.
  • When the neural network device 700 includes an array of multiple chips, different neural networks can be supported efficiently at the same time; for example, both the ANN algorithm and the SNN algorithm can be supported efficiently. Specifically, different chips of the neural network device 700 can then carry different neural network models, configured according to actual needs. Scalability is good, a chip array with very large computing power can be obtained without a drop in computing-power efficiency, and multi-core reorganization and multi-task parallel processing are supported.
  • An embodiment of the present invention further provides a computer-readable storage medium 800 on which a computer program is stored; when executed by a processing core, the program implements the method for configuring the precision of output data provided by the embodiments of the present invention.
  • The apparatus for configuring the precision of output data, the neural network device, and the computer-readable storage medium provided in the above embodiments can execute the method for configuring the precision of output data provided by any embodiment of the present invention and have the corresponding functional modules and beneficial effects for executing that method.
  • For technical details not described exhaustively in the above embodiments, refer to the method for configuring the precision of output data provided by any embodiment of the present invention.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Neurology (AREA)
  • Image Analysis (AREA)
  • Optical Recording Or Reproduction (AREA)

Abstract

The present invention discloses a method and apparatus for configuring the precision of output data, a neural network device, and a computer-readable storage medium. The method of the present invention is applied to a neural network device and includes: acquiring the weight precision of a receiving layer in a neural network, where the receiving layer is the layer following a sending layer, and the sending layer is any layer of the neural network except the last layer; and configuring the precision of the data to be output by the sending layer at least according to the weight precision of the receiving layer. By adopting the above technical solution, the present invention can flexibly configure the precision of the output data of one or more layers of the neural network deployed in a neural network device, thereby optimizing the performance of the neural network device.

Description

Data precision configuration method and apparatus, neural network device, medium
Technical Field
Embodiments of the present invention relate to the technical field of artificial intelligence, and in particular to a method and apparatus for configuring the precision of output data, a neural network device, and a computer-readable storage medium.
Background
A neural network based on a computer system includes a large number of neurons and can handle problems in a manner similar to the human brain, such as image recognition, speech recognition, and natural language processing. Different neurons in the neural network are connected to one another; based on these connections the neural network is divided into multiple layers, each layer including one or more neurons, and each neuron in a preceding layer is connected to one or more neurons in the following layer and can send data to them.
Each layer of neurons also has weight data, such as the weight values of the connections between each neuron in the layer and the neurons of the preceding layer (e.g., forming a weight matrix). Typically, all weight data in a layer have the same precision; that is, the neurons of each layer have a uniform weight precision.
When data is input to a layer of the neural network, result data can be computed from it and the layer's weight data; in the related art, the precision of the result data can be converted to obtain output data that meets the precision requirements. The precision of the output data of each layer is generally specified in advance and is the same for all layers. However, such an output-data precision configuration scheme is not flexible enough and needs improvement.
Summary of the Invention
Embodiments of the present invention provide a method and apparatus for configuring the precision of output data, a neural network device, and a computer-readable storage medium.
In a first aspect, an embodiment of the present invention provides a method for configuring the precision of output data, where the method is applied to a neural network device and includes: acquiring the weight precision of a receiving layer in a neural network, where the receiving layer is the layer following a sending layer, and the sending layer is any layer of the neural network except the last layer; and configuring the precision of the data to be output by the sending layer at least according to the weight precision of the receiving layer.
In some embodiments, before acquiring the weight precision of the receiving layer in the neural network, the method further includes: acquiring the precision of the data to be output by the sending layer.
In some embodiments, configuring the precision of the data to be output by the sending layer at least according to the weight precision of the receiving layer includes: determining a target precision according to the precision of the data to be output and the weight precision of the receiving layer; and configuring the precision of the data to be output as the target precision.
In some embodiments, acquiring the precision of the data to be output by the sending layer includes: acquiring the precision of the input data of the sending layer and the weight precision of the sending layer; and determining the precision of the data to be output by the sending layer according to the precision of the input data and the weight precision of the sending layer, where the precision of the data to be output is greater than or equal to the higher of the precision of the input data and the weight precision of the sending layer.
In some embodiments, when the weight precision of the receiving layer is lower than the precision of the data to be output, the target precision is lower than the precision of the data to be output and not lower than the weight precision of the receiving layer.
In some embodiments, when the weight precision of the receiving layer is not lower than the precision of the data to be output, the target precision is not lower than the precision of the data to be output and not higher than the weight precision of the receiving layer.
In some embodiments, configuring the precision of the data to be output by the sending layer at least according to the weight precision of the receiving layer includes: determining the weight precision of the receiving layer as the target precision; and configuring the precision of the data to be output as the target precision.
In some embodiments, after configuring the precision of the data to be output by the sending layer at least according to the weight precision of the receiving layer, the method further includes: outputting the output data obtained after the configuration to the receiving layer.
In some embodiments, the neural network device is implemented based on a many-core architecture.
In a second aspect, an embodiment of the present invention provides an apparatus for configuring the precision of output data, where the apparatus is integrated in a neural network device and includes: a weight precision acquisition module, configured to acquire the weight precision of a receiving layer in a neural network, where the receiving layer is the layer following a sending layer, and the sending layer is any layer of the neural network except the last layer; and a precision configuration module, configured to configure the precision of the data to be output by the sending layer at least according to the weight precision of the receiving layer.
In a third aspect, an embodiment of the present invention provides a neural network device including at least one processing core, the processing core being configured to implement any one of the above methods for configuring the precision of output data.
In some embodiments, the neural network device includes a plurality of processing cores forming a many-core architecture.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium on which a computer program is stored; when executed by a processing core, the program implements any one of the above methods for configuring the precision of output data.
The output data precision configuration scheme provided in the embodiments of the present invention is applied to a neural network device: the precision of the data to be output by the sending layer in the neural network is acquired; before the data is output, the weight precision of the receiving layer, the layer following the sending layer, is acquired first; and the precision of the output data is configured according to the weight precision of the receiving layer. By adopting the above technical solution, the precision of the output data of one or more layers of the neural network deployed in the neural network device can be flexibly configured, thereby optimizing the performance of the neural network device.
Brief Description of the Drawings
FIG. 1 is a schematic diagram of an output data precision configuration scheme in the related art.
FIG. 2 is a schematic flowchart of a method for configuring the precision of output data according to an embodiment of the present invention.
FIG. 3 is a schematic flowchart of still another method for configuring the precision of output data according to an embodiment of the present invention.
FIG. 4 is a schematic diagram of an output data precision configuration scheme according to an embodiment of the present invention.
FIG. 5 is a schematic flowchart of another method for configuring the precision of output data according to an embodiment of the present invention.
FIG. 6 is a structural block diagram of an apparatus for configuring the precision of output data according to an embodiment of the present invention.
FIG. 7 is a structural block diagram of a neural network device according to an embodiment of the present invention.
FIG. 8 is a structural block diagram of a computer-readable storage medium according to an embodiment of the present invention.
Detailed Description
The technical solution of the present invention is further described below with reference to the drawings and through specific embodiments. It should be understood that the specific embodiments described here are only used to explain the present invention, not to limit it. It should also be noted that, for convenience of description, the drawings show only the parts related to the present invention rather than the complete structure.
Before discussing the exemplary embodiments in more detail, it should be mentioned that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart describes the steps as sequential processing, many of the steps can be implemented in parallel, concurrently, or simultaneously. In addition, the order of the steps can be rearranged. The processing may be terminated when its operations are completed, but may also have additional steps not included in the drawings. The processing may correspond to a method, a function, a procedure, a subroutine, a subprogram, and the like.
Note that concepts such as "first" and "second" mentioned in the embodiments of the present invention are only used to distinguish different devices, modules, units, or other objects, and are not used to limit the order of, or the interdependence between, the functions performed by these devices, modules, units, or other objects.
For a better understanding of the embodiments of the present invention, the related art is introduced below.
A neural network is configured in the neural network device of the embodiments of the present invention. The neural network in the embodiments of the present invention may include an artificial neural network (ANN), and may also include a spiking neural network (SNN) or another type of neural network (such as a convolutional neural network, CNN). The specific type of the neural network is not limited; it may be, for example, an acoustic model, a speech recognition model, or an image recognition model, and it may be applied to data centers, the security field, the smart medical field, the autonomous driving field, the intelligent transportation field, the smart home field, and other related fields.
FIG. 1 is a schematic diagram of an output data precision configuration scheme for a neural network in the related art. The weight precision of every layer of the neural network carried in the neural network device (the precision marked below each layer in the figure) is the same, and the precision of the output data of every layer (the precision marked at the arrows between layers in the figure) is also set to be the same, so the precision calculation corresponding to the computation of each layer's input data with its weights (the multiplication of the two lowermost precisions in the figure) is usually also the same. However, this approach may lead to precision loss or an increase in the amount of data transmitted.
As shown in FIG. 1, for convenience of description, only four layers of the neural network are shown, denoted L1, L2, L3, and L4 from front to back. The precision of L1's input data (the data precision) is FP32 (32-bit floating point) and L1's weight precision is FP32, so the precision calculation corresponding to the multiply-accumulate operation is FP32*FP32, and the precision of the result data (calculation result) obtained directly from the calculation is usually higher than FP32. In the related art, the output data precision of all layers, including L1, may be specified as FP32, so the above calculation result can be used as output data only after its precision is truncated. Alternatively, if the related art specifies a higher output data precision for all layers including L1, the calculation result must be padded to that higher precision before being output. That is, in the related art the precision of each layer's output data is usually a uniformly set data precision; clearly, this way of configuring output data precision is not flexible enough.
In the embodiments of the present invention, by contrast, the output data precision of all layers is not directly set to be the same; instead, a corresponding precision is configured for each layer according to the weight precision of the following layer. When the weight precisions of different layers differ, the output data precisions of different layers may also differ; that is, the neural network can use mixed precision, so that the output data precision of each layer meets its needs and the relationship between storage capacity and computing energy consumption on the one hand, and the recognition rate (or accuracy rate) of the neural network on the other, is better balanced.
In a first aspect, an embodiment of the present invention provides a method for configuring the precision of output data.
FIG. 2 is a schematic flowchart of a method for configuring the precision of output data according to an embodiment of the present invention. The method is used in a neural network device and can be executed by an apparatus for configuring the precision of output data, where the apparatus can be implemented in software and/or hardware and can generally be integrated in the neural network device, for example in a processing core of the neural network device.
As shown in FIG. 2, the method includes the following Step 201 and Step 202.
Step 201: Acquire the weight precision of the receiving layer in the neural network, where the receiving layer is the layer following the sending layer, and the sending layer is any layer of the neural network except the last layer.
In the embodiments of the present invention, the specific structure of the neural network is not limited; for example, the number of neuron layers in the neural network may be any number of layers, two or more.
The sending layer and the receiving layer are two mutually corresponding layers: the receiving layer is the layer that receives the output data of the current sending layer. The sending layer is therefore not necessarily the first layer of the neural network but can be any layer except the last layer; correspondingly, the receiving layer can be any layer except the first layer of the neural network.
Illustratively, each layer of the neural network can be configured in one or more processing cores, and each processing core can include a processor and have its own memory, so the calculation of each layer can be performed locally in the processing core where the layer is located, while the output data is sent to the processing core where the next layer is located. For example, the sending processing core where the sending layer is located can calculate the data to be output according to the sending layer's input data and weight parameters (such as a weight matrix) and output it to the receiving processing core where the receiving layer is located.
In the embodiments of the present invention, the weight precision of a layer refers to the precision of the weight values of all neurons in that layer (such as the weight values of the connections between the neurons and the neurons of the preceding layer). Generally, the weight values of all neurons in a layer should have the same precision, so that the layer has a uniform weight precision; however, it is also feasible for different neurons in a layer to have different weight precisions.
In the embodiments of the present invention, the weight precisions of different layers may differ, and the specific way of acquiring the weight precision of the receiving layer is not limited. For example, the weight precision of the receiving layer can be stored in a storage area of the sending processing core at the compilation stage and read from that storage area once the data to be output by the sending layer is obtained. As another example, assuming the processing core corresponding to the receiving layer is the receiving processing core, a storage area in the receiving processing core can store the weight precision of the receiving layer, and the sending processing core can obtain it from the receiving processing core through inter-core communication.
Step 202: Configure the precision of the data to be output by the sending layer at least according to the weight precision of the receiving layer.
In the embodiments of the present invention, the precision of the data to be output by the sending layer is configured (or set) with reference to the weight precision of the receiving layer, and the configured data with the required precision is used as the data actually output by the sending layer (the output data); the specific way of referring to the weight precision and the specific configuration method are not limited.
Illustratively, the precision of the data to be output can be configured to be lower than, higher than, or the same as the weight precision of the receiving layer, yielding the precision of the output data.
The difference in precision level between the weight precision of the receiving layer and the precision of the output data may be a first preset precision-level difference. The precision level indicates how high a data precision is: the higher the precision, the higher the corresponding precision level, and the precision values corresponding to different precision levels can be set according to actual needs. For example, between Int4 (4-bit integer) and FP16 there is also Int8, so the difference in precision level between Int4 and FP16 can be 2, while the difference between Int4 and Int8 can be 1. Suppose the weight precision of the receiving layer is FP16 and the first preset precision-level difference is 2; if the precision of the data to be output is configured to be lower than the weight precision of the receiving layer, it is configured as Int4.
The method for configuring the precision of output data provided in the embodiments of the present invention is applied to a neural network device: the precision of the data to be output by the sending layer in the neural network is acquired; before the data is output, the weight precision of the receiving layer is acquired first; and the precision of the output data is configured according to the weight precision of the receiving layer. By adopting the above technical solution, the precision of the output data of one or more layers of the neural network deployed in the neural network device can be flexibly configured, thereby optimizing the performance of the neural network device.
In some embodiments, as shown in FIG. 2, before acquiring the weight precision of the receiving layer in the neural network, the method further includes: Step 200: acquiring the precision of the data to be output by the sending layer.
Before acquiring the weight precision of the receiving layer, the precision that the sending layer's data to be output would "originally" have, i.e. without applying the method of the embodiments of the present invention, can be acquired first, for example the precision of the calculation result obtained by multiply-accumulating the sending layer's input data with its weight data.
Generally, the precision of the data to be output is greater than or equal to the higher of the sending layer's input data precision and weight precision. In the sending layer, if the input data precision and weight precision are themselves low (such as Int2, Int4, or Int8), the precision (bit width) of the calculation result after the multiply-accumulate operation may be insufficient (for example, unable to meet requirements of hardware such as the corresponding processing core), so the precision needs to be raised; the precision of the data to be output will then generally rise relatively high (for example, to Int8 or Int16, respectively), and the lower the higher of the input data precision and weight precision is, the more precision levels need to be raised. Conversely, if the input data precision and weight precision are already high (such as FP16, FP32, or FP64), the precision of the data to be output may not increase, or may increase relatively little (for example, from FP16 to FP32), because the precision after the multiply-accumulate operation is already high enough.
In some embodiments, acquiring the precision of the data to be output by the sending layer includes: acquiring the precision of the input data of the sending layer and the weight precision of the sending layer; and determining the precision of the data to be output according to the precision of the input data and the weight precision of the sending layer, where the precision of the data to be output is greater than or equal to the higher of the two.
In the embodiments of the present invention, the precision of the data to be output can be determined from the sending layer's input data precision and weight precision, specifically ensuring that it is greater than or equal to the higher of the two, because the precision of the result of a multiply-accumulate operation is usually higher than the precision of either of its two operands.
In some embodiments, configuring the precision of the data to be output by the sending layer at least according to the weight precision of the receiving layer includes: determining a target precision according to the precision of the data to be output and the weight precision of the receiving layer; and configuring the precision of the data to be output as the target precision.
In the embodiments of the present invention, a "target precision" may be determined jointly from the weight precision of the receiving layer and the precision of the data to be output obtained above, and the precision of the data to be output is configured as the target precision; that is, the actual output data of the sending layer is set to have the target precision.
In some embodiments, when the weight precision of the receiving layer is lower than the precision of the data to be output, the target precision is lower than the precision of the data to be output and not lower than the weight precision of the receiving layer.
When the precision of the data to be output (its original precision) is higher than the weight precision, the precision of the data to be output can be reduced, but it should not be reduced below the weight precision, so as not to affect the recognition rate of the neural network. The advantage of this setting is that it is equivalent to truncating the precision of the data to be output according to the weight precision of the receiving layer, so that the precision of the data to be output is reduced, thereby reducing the amount of data transmitted; when the receiving layer performs its calculations, the amount of computation is also reduced, lowering the energy consumption of data processing.
In this case the target precision may equal the weight precision of the receiving layer, and the weight precision of the receiving layer can be directly determined as the target precision. The advantage of that setting is that it is equivalent to truncating the precision of the data to be output to match the weight precision of the receiving layer, which minimizes the amount of data transmitted, reduces the energy consumption of data processing, and increases effective computing power.
In some embodiments, when the weight precision of the receiving layer is not lower than the precision of the data to be output, the target precision is not lower than the precision of the data to be output and not higher than the weight precision of the receiving layer.
When the precision of the data to be output (its original precision) is not higher than the weight precision (i.e., the same or lower), the precision of the data to be output can be kept unchanged or increased, but it cannot be increased beyond the weight precision; keeping it as high as the weight precision allows helps the recognition rate of the neural network.
In this case, too, the target precision may equal the weight precision of the receiving layer; that is, the weight precision of the receiving layer can be directly determined as the target precision.
In some embodiments, configuring the precision of the data to be output by the sending layer at least according to the weight precision of the receiving layer includes: determining the weight precision of the receiving layer as the target precision; and configuring the precision of the data to be output as the target precision.
As another approach in the embodiments of the present invention, the relationship between the weight precision of the receiving layer and the precision of the sending layer's data to be output need not be evaluated; instead, the weight precision of the receiving layer can be directly determined as the target precision. This simplifies the implementation of the method and guarantees that the precision of the input data used for calculation in any layer equals that layer's weight precision, better balancing storage capacity and computing energy consumption against the recognition rate (or accuracy rate) of the neural network.
In some embodiments, after configuring the precision of the data to be output by the sending layer at least according to the weight precision of the receiving layer, the method further includes: outputting the output data obtained after the configuration to the receiving layer.
After the precision of the data to be output is configured, the configured data with the required precision can be used as the output data and transmitted directly to the receiving layer, for example to the receiving processing core where the receiving layer is located, so that the processing core corresponding to the receiving layer can perform the receiving layer's calculations.
In some embodiments, the neural network device is implemented based on a many-core architecture.
The many-core architecture includes multiple processing cores and can have a multi-core reorganization characteristic; there is no master-slave distinction between processing cores. Tasks can be flexibly configured in software, with different tasks configured in different processing cores at the same time (for example, one layer of neurons configured per processing core) to realize multi-task parallel processing, and an array of processing cores can jointly complete the computation of the neural network, efficiently supporting various neural network algorithms and improving device performance.
Illustratively, the neural network device can use a network-on-chip, such as a two-dimensional mesh (2D Mesh) network-on-chip, for communication and interconnection between cores, while communication between the device and the outside can be realized through a high-speed serial port.
FIG. 3 is a schematic flowchart of still another method for configuring the precision of output data according to an embodiment of the present invention. As shown in FIG. 3, the method includes:
Step 301: Acquire the data to be output by the sending layer in the neural network.
Each time the method is executed, the sending layer may be any layer of the neural network except the last layer. Of course, in different executions of the method the sending layer may differ; that is, the sending layer is not one specific layer of the neural network.
Step 302: Acquire the weight precision of the receiving layer, where the receiving layer is the layer following the sending layer.
Step 303: Determine whether the weight precision of the receiving layer is lower than the precision of the data to be output by the sending layer; if so, perform Step 304; otherwise, perform Step 305.
Optionally, in this embodiment of the present invention, the comparison between the weight precision of the receiving layer and the precision of the sending layer's data to be output may also be skipped, and the weight precision of the receiving layer directly determined as the target precision.
Step 304: Determine the weight precision of the receiving layer as the target precision, configure the precision of the sending layer's data to be output as the target precision to obtain the output data, and perform Step 306.
Step 305: Keep the precision of the sending layer's data to be output unchanged, or configure it as the weight precision of the receiving layer, to obtain the output data.
Keeping the precision of the sending layer's data to be output unchanged can reduce the amount of transmission between the sending layer and the receiving layer.
Step 306: Output the output data to the receiving layer, for example to the processing core corresponding to the receiving layer.
The output data precision configuration scheme provided in this embodiment of the present invention is applied to a neural network device: the data to be output by the sending layer in the neural network is acquired; before the data is output, the weight precision of the next layer is acquired first; and the precision of the data to be output is configured to be the same as the weight precision of the next layer, yielding output data that is output to the next layer (for example, to the processing core corresponding to the next layer).
By adopting the above technical solution and configuring the precision directly according to the next layer's weight precision before data output, precision loss in data conversion can be reduced; when the weight precision of the following layer is lower than that of the preceding layer, both the amount of data transmitted and the energy consumption of data processing can be reduced.
FIG. 4 is a schematic diagram of an output data precision configuration scheme according to an embodiment of the present invention. As shown in FIG. 4, for convenience of description, only four layers of the neural network are shown, denoted L1, L2, L3, and L4 from front to back.
For L1, the precision of the input data is Int8 and L1's weight precision is Int8, so the precision calculation corresponding to the multiply-accumulate operation is Int8*Int8, and the precision of the calculation result obtained by the multiply-accumulate operation is FP16. In the related art, if the output data precision is uniformly set to Int8, the precision of the data actually output by L1 must be Int8; that is, the FP16 precision of the above calculation result must be truncated to Int8 before Int8-precision data is output from L1. Since the weight precision of L2 is FP16, when computing in L2 the truncated Int8 precision must then be padded back to FP16; this process loses the precision that was truncated away and introduces unnecessary truncation and padding steps, consuming more computation.
In this embodiment of the present invention, by contrast, the weight precision of L2 is acquired first, so it is known that the original precision of L1's data to be output (the precision of the calculation result) is the same as the weight precision of L2 (both FP16). According to the weight precision of L2, the precision of the data to be output is therefore configured as FP16; that is, no precision truncation is performed on the data to be output (the calculation result), which is output directly to obtain FP16-precision output data. This reduces precision loss in data conversion and eliminates unnecessary operations.
For L3, the precision of the input data is FP16 and the weight precision is FP16. In the related art, if the output data precision is uniformly set to FP16, the precision of L3's output data should also be FP16 (that FP16 likewise being obtained by truncating the precision of the calculation result). But the weight precision of L4 is Int8, so the data precision actually required in its calculation is only Int8; if L3's output data precision is FP16, this effectively adds some "invalid" data transmission between L3 and L4.
In this embodiment of the present invention, the weight precision of L4 is acquired first as Int8; knowing that the original precision of L3's data to be output is higher than the weight precision of L4, the precision of the data to be output can be configured as Int8. That is, L3's calculation result is truncated directly to Int8 (rather than to FP16), so that the precision of the actual output data is Int8 and only Int8 data needs to be transmitted between L3 and L4.
Compared with the related art, the approach of this embodiment of the present invention reduces the precision of L3's output data and the amount of data transmitted between L3 and L4; that is, it reduces the data traffic between the processing core where L3 is located and the processing core where L4 is located, without affecting the calculation precision of L4, greatly improving performance.
FIG. 5 is a schematic flowchart of another method for configuring the precision of output data according to an embodiment of the present invention. Taking a neural network that is an image recognition model as an example, assume the image recognition model is a convolutional neural network (CNN) model including, for example, a convolution layer, a pooling layer, and a fully connected layer, corresponding respectively to a first processing core, a second processing core, and a third processing core in the neural network device. The method may include:
Step 503: Obtain, through the first processing core, the image data to be recognized; calculate the feature map data to be output by the convolution layer according to the image data to be recognized and the convolution layer's weight parameters; acquire the weight precision of the pooling layer; configure the precision of the convolution layer's feature map data to be output as the weight precision of the pooling layer, obtaining the convolution layer's output feature map data; and output it to the second processing core.
Step 504: Calculate, through the second processing core, the feature vector data to be output by the pooling layer according to the convolution layer's output feature map data and the pooling layer's weight parameters; acquire the weight precision of the fully connected layer; configure the precision of the pooling layer's feature vector data to be output as the weight precision of the fully connected layer, obtaining the pooling layer's output feature vector data; and output it to the third processing core.
Step 505: Calculate and output, through the third processing core, the image recognition result according to the pooling layer's output feature vector data and the fully connected layer's weight parameters.
When the method for configuring the precision of output data provided in this embodiment of the present invention is applied to an image recognition scenario, the processing cores corresponding to the convolution layer and the pooling layer determine the precision of the image output data according to the weight precision of the next layer before outputting image data. This reduces the precision loss in converting image information while lowering the amount of image data transmitted and the energy consumption of image data processing; in other words, on the premise of guaranteed calculation precision, computing power is increased and power consumption reduced, so the performance of the neural network device carrying the image recognition network model is effectively improved and the recognition efficiency of the image recognition model can also be improved.
In a second aspect, an embodiment of the present invention provides an apparatus 600 for configuring the precision of output data.
FIG. 6 is a structural block diagram of an apparatus 600 for configuring the precision of output data according to an embodiment of the present invention. The apparatus can be implemented in software and/or hardware, can generally be integrated in a neural network device, and configures the precision of output data by executing the method for configuring the precision of output data. As shown in FIG. 6, the apparatus at least includes the following weight precision acquisition module 602 and precision configuration module 603.
The weight precision acquisition module 602 is configured to acquire the weight precision of the receiving layer in the neural network, where the receiving layer is the layer following the sending layer, and the sending layer is any layer of the neural network except the last layer.
The precision configuration module 603 is configured to configure the precision of the data to be output by the sending layer at least according to the weight precision of the receiving layer.
The apparatus 600 for configuring the precision of output data provided in this embodiment of the present invention is applied to a neural network device: the precision of the data to be output by the sending layer in the neural network is acquired; before the data is output, the weight precision of the receiving layer, the layer following the sending layer, is acquired first; and the precision of the output data is configured according to the weight precision of the receiving layer. By adopting the above technical solution, the precision of the output data of one or more layers of the neural network deployed in the neural network device can be flexibly configured, thereby optimizing the performance of the neural network device.
In some embodiments, as shown in FIG. 6, the apparatus 600 further includes: a to-be-output data acquisition module 601, configured to acquire the precision of the data to be output by the sending layer.
In some embodiments, configuring the precision of the data to be output by the sending layer at least according to the weight precision of the receiving layer includes: determining a target precision according to the precision of the data to be output and the weight precision of the receiving layer; and configuring the precision of the data to be output as the target precision.
In some embodiments, acquiring the precision of the data to be output by the sending layer includes: acquiring the precision of the input data of the sending layer and the weight precision of the sending layer; and determining the precision of the data to be output according to these, where the precision of the data to be output is greater than or equal to the higher of the precision of the input data and the weight precision of the sending layer.
In some embodiments, when the weight precision of the receiving layer is lower than the precision of the data to be output, the target precision is lower than the precision of the data to be output and not lower than the weight precision of the receiving layer.
In some embodiments, when the weight precision of the receiving layer is not lower than the precision of the data to be output, the target precision is not lower than the precision of the data to be output and not higher than the weight precision of the receiving layer.
In some embodiments, configuring the precision of the data to be output by the sending layer at least according to the weight precision of the receiving layer includes: determining the weight precision of the receiving layer as the target precision; and configuring the precision of the data to be output as the target precision.
In some embodiments, after the configuration, the output data obtained is output to the receiving layer.
In some embodiments, the neural network device is implemented based on a many-core architecture.
In a third aspect, referring to FIG. 7, an embodiment of the present invention provides a neural network device 700 including at least one processing core 701, the processing core 701 being configured to implement the method for configuring the precision of output data provided by the embodiments of the present invention.
In some embodiments, the neural network device 700 includes a plurality of processing cores 701 forming a many-core architecture.
The neural network device 700 of the embodiments of the present invention may adopt a many-core architecture; that is, it includes multiple processing cores 701, each processing core 701 including a processor and having its own memory, with different processing cores 701 able to exchange information through a network-on-chip 702 (such as a 2D Mesh), so that each processing core 701 can perform part of the computation and the computation of the neural network can be implemented jointly by multiple processing cores 701.
For example, one layer of the neural network may be configured within each processing core 701. Of course, it is also feasible for one processing core 701 to be configured with multiple layers of the neural network, with only part of one layer, or with parts of several layers.
The many-core neural network device 700 can take various specific forms; for example, the device can include an array of multiple chips (ICs), each chip containing one or more processing cores 701; alternatively, the device may include only one chip containing multiple processing cores 701.
When the neural network device 700 includes an array of multiple chips, different neural networks can be supported efficiently at the same time; for example, both the ANN algorithm and the SNN algorithm can be supported efficiently. Specifically, different chips of the neural network device 700 can then carry different neural network models, configured according to actual needs. Scalability is good, a chip array with very large computing power can be obtained without a drop in computing-power efficiency, and multi-core reorganization and multi-task parallel processing are supported.
In a fourth aspect, referring to FIG. 8, an embodiment of the present invention provides a computer-readable storage medium 800 on which a computer program is stored; when executed by a processing core, the program implements the method for configuring the precision of output data provided by the embodiments of the present invention.
The apparatus for configuring the precision of output data, the neural network device, and the computer-readable storage medium provided in the above embodiments can execute the method for configuring the precision of output data provided by any embodiment of the present invention and have the corresponding functional modules and beneficial effects for executing the method. For technical details not described exhaustively in the above embodiments, refer to the method for configuring the precision of output data provided by any embodiment of the present invention.
Note that the above are only preferred embodiments of the present invention and the technical principles applied. Those skilled in the art will understand that the present invention is not limited to the specific embodiments described here, and that various obvious changes, readjustments, and substitutions can be made without departing from the protection scope of the present invention. Therefore, although the present invention has been described in some detail through the above embodiments, it is not limited to them and may include more other equivalent embodiments without departing from the concept of the present invention, its scope being determined by the scope of the appended claims.

Claims (13)

  1. A method for configuring the precision of output data, characterized in that the method is applied to a neural network device, the method comprising:
    acquiring the weight precision of a receiving layer in a neural network, wherein the receiving layer is the layer following a sending layer, and the sending layer is any layer of the neural network except the last layer;
    configuring the precision of the data to be output by the sending layer at least according to the weight precision of the receiving layer.
  2. The method according to claim 1, characterized in that, before the acquiring of the weight precision of the receiving layer in the neural network, the method further comprises:
    acquiring the precision of the data to be output by the sending layer.
  3. The method according to claim 2, characterized in that the configuring of the precision of the data to be output by the sending layer at least according to the weight precision of the receiving layer comprises:
    determining a target precision according to the precision of the data to be output and the weight precision of the receiving layer;
    configuring the precision of the data to be output as the target precision.
  4. The method according to claim 2, characterized in that the acquiring of the precision of the data to be output by the sending layer comprises:
    acquiring the precision of the input data of the sending layer and the weight precision of the sending layer;
    determining the precision of the data to be output by the sending layer according to the precision of the input data and the weight precision of the sending layer, wherein the precision of the data to be output is greater than or equal to the higher of the precision of the input data and the weight precision of the sending layer.
  5. The method according to claim 3 or 4, characterized in that,
    when the weight precision of the receiving layer is lower than the precision of the data to be output, the target precision is lower than the precision of the data to be output and not lower than the weight precision of the receiving layer.
  6. The method according to claim 3 or 4, characterized in that,
    when the weight precision of the receiving layer is not lower than the precision of the data to be output, the target precision is not lower than the precision of the data to be output and not higher than the weight precision of the receiving layer.
  7. The method according to claim 1, characterized in that the configuring of the precision of the data to be output by the sending layer at least according to the weight precision of the receiving layer comprises:
    determining the weight precision of the receiving layer as the target precision;
    configuring the precision of the data to be output as the target precision.
  8. The method according to claim 1, characterized in that, after the configuring of the precision of the data to be output by the sending layer at least according to the weight precision of the receiving layer, the method further comprises:
    outputting the output data obtained after the configuration to the receiving layer.
  9. The method according to any one of claims 1, 2, 3, 4, 7 and 8, characterized in that the neural network device is implemented based on a many-core architecture.
  10. An apparatus for configuring the precision of output data, characterized in that the apparatus is integrated in a neural network device, the apparatus comprising:
    a weight precision acquisition module, configured to acquire the weight precision of a receiving layer in a neural network, wherein the receiving layer is the layer following a sending layer, and the sending layer is any layer of the neural network except the last layer;
    a precision configuration module, configured to configure the precision of the data to be output by the sending layer at least according to the weight precision of the receiving layer.
  11. A neural network device, characterized in that the neural network device comprises at least one processing core, the processing core being configured to implement the method according to any one of claims 1-9.
  12. The device according to claim 11, characterized by comprising a plurality of processing cores forming a many-core architecture.
  13. A computer-readable storage medium, characterized in that a computer program is stored thereon which, when executed by a processing core, implements the method according to any one of claims 1-9.
PCT/CN2021/105173 2020-07-09 2021-07-08 Data precision configuration method and apparatus, neural network device, medium WO2022007880A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010656745.9 2020-07-09
CN202010656745.9A CN111831354B (zh) 2020-07-09 Data precision configuration method, apparatus, chip, chip array, device and medium

Publications (1)

Publication Number Publication Date
WO2022007880A1 (zh)

Family

ID=72900790

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/105173 WO2022007880A1 (zh) 2020-07-09 2021-07-08 数据精度配置方法和装置、神经网络设备、介质

Country Status (2)

Country Link
CN (1) CN111831354B (zh)
WO (1) WO2022007880A1 (zh)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111831354B (zh) * 2020-07-09 2023-05-16 北京灵汐科技有限公司 数据精度配置方法、装置、芯片、芯片阵列、设备及介质
CN113221896A (zh) * 2021-05-31 2021-08-06 北京灵汐科技有限公司 目标检测方法、装置、神经形态器件及介质
CN115600657A (zh) * 2021-07-09 2023-01-13 中科寒武纪科技股份有限公司(Cn) 一种处理装置、设备、方法及其相关产品

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108171323A (zh) * 2016-12-28 2018-06-15 上海寒武纪信息科技有限公司 一种人工神经网络计算装置和方法
CN109800877A (zh) * 2019-02-20 2019-05-24 腾讯科技(深圳)有限公司 神经网络的参数调整方法、装置及设备
CN109902745A (zh) * 2019-03-01 2019-06-18 成都康乔电子有限责任公司 一种基于cnn的低精度训练与8位整型量化推理方法
CN110503181A (zh) * 2018-05-18 2019-11-26 百度在线网络技术(北京)有限公司 用于生成多层神经网络的方法和装置
CN110738315A (zh) * 2018-07-18 2020-01-31 华为技术有限公司 一种神经网络精度调整方法及装置
US20200202199A1 (en) * 2018-12-19 2020-06-25 Samsung Electronics Co., Ltd. Neural network processing method and apparatus based on nested bit representation
CN111831354A (zh) * 2020-07-09 2020-10-27 北京灵汐科技有限公司 数据精度配置方法、装置、芯片、芯片阵列、设备及介质

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105760933A (zh) * 2016-02-18 2016-07-13 清华大学 卷积神经网络的逐层变精度定点化方法及装置
WO2018058426A1 (zh) * 2016-09-29 2018-04-05 清华大学 硬件神经网络转换方法、计算装置、编译方法和神经网络软硬件协作系统
CN108345939B (zh) * 2017-01-25 2022-05-24 微软技术许可有限责任公司 基于定点运算的神经网络
CN108229648B (zh) * 2017-08-31 2020-10-09 深圳市商汤科技有限公司 匹配存储器中数据位宽的卷积计算方法和装置、设备、介质
US20190102671A1 (en) * 2017-09-29 2019-04-04 Intel Corporation Inner product convolutional neural network accelerator
CN108334945B (zh) * 2018-01-30 2020-12-25 中国科学院自动化研究所 深度神经网络的加速与压缩方法及装置
EP3543917B1 (en) * 2018-03-19 2024-01-03 SRI International Inc. Dynamic adaptation of deep neural networks
CN109146057B (zh) * 2018-06-26 2020-12-08 杭州雄迈集成电路技术股份有限公司 一种基于查表计算的高精度的神经网络工程化方法
CN109740508B (zh) * 2018-12-29 2021-07-23 北京灵汐科技有限公司 一种基于神经网络系统的图像处理方法及神经网络系统
US20200210840A1 (en) * 2018-12-31 2020-07-02 Microsoft Technology Licensing, Llc Adjusting precision and topology parameters for neural network training based on a performance metric
KR20200086581A (ko) * 2019-01-09 2020-07-17 삼성전자주식회사 뉴럴 네트워크 양자화를 위한 방법 및 장치


Also Published As

Publication number Publication date
CN111831354B (zh) 2023-05-16
CN111831354A (zh) 2020-10-27

Similar Documents

Publication Publication Date Title
WO2022007880A1 (zh) 数据精度配置方法和装置、神经网络设备、介质
CN108416436B (zh) 使用多核心处理模块进行神经网络划分的方法及其系统
CN107301456B (zh) 基于向量处理器的深度神经网络多核加速实现方法
CN109447241B (zh) 一种面向物联网领域的动态可重构卷积神经网络加速器架构
CN110389910A (zh) 用于管理级联神经网络中的存储器的方法和安排
CN110991630A (zh) 一种面向边缘计算的卷积神经网络处理器
CN115828831B (zh) 基于深度强化学习的多芯粒芯片算子放置策略生成方法
US20210103820A1 (en) Pipelined backpropagation with minibatch emulation
CN111831355B (zh) 权重精度配置方法、装置、设备及存储介质
TW201807622A (zh) 多層人造神經網路
CN111210019A (zh) 一种基于软硬件协同加速的神经网络推断方法
CN111831359B (zh) 权重精度配置方法、装置、设备及存储介质
CN108491924B (zh) 一种面向人工智能计算的神经网络数据串行流水处理装置
CN117501245A (zh) 神经网络模型训练方法和装置、数据处理方法和装置
CN111860773A (zh) 处理装置和用于信息处理的方法
CN112114942A (zh) 一种基于众核处理器的流式数据处理方法及计算设备
CN113837922A (zh) 计算装置、数据处理方法及相关产品
CN116185937B (zh) 基于众核处理器多层互联架构的二元运算访存优化方法及装置
WO2023098256A1 (zh) 神经网络运算方法、装置、芯片、电子设备和存储介质
CN108090865B (zh) 光学卫星遥感影像在轨实时流式处理方法及系统
CN116151315A (zh) 一种面向晶上系统的注意力网络调度优化方法及装置
CN112559197A (zh) 基于异构众核处理器的卷积计算数据重用方法
CN114897133A (zh) 一种通用可配置的Transformer硬件加速器及其实现方法
CN108564524A (zh) 一种视觉图像的卷积计算优化方法
CN108388943A (zh) 一种适用于神经网络的池化装置及方法

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21838741

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21838741

Country of ref document: EP

Kind code of ref document: A1