CN111831356B - Weight precision configuration method, device, equipment and storage medium - Google Patents

Weight precision configuration method, device, equipment and storage medium

Info

Publication number
CN111831356B
CN111831356B
Authority
CN
China
Prior art keywords
layer
precision
neural network
weight
target layer
Prior art date
Legal status
Active
Application number
CN202010659069.0A
Other languages
Chinese (zh)
Other versions
CN111831356A (en)
Inventor
祝夭龙
何伟
Current Assignee
Beijing Lynxi Technology Co Ltd
Original Assignee
Beijing Lynxi Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Lynxi Technology Co Ltd filed Critical Beijing Lynxi Technology Co Ltd
Priority to CN202010659069.0A priority Critical patent/CN111831356B/en
Publication of CN111831356A publication Critical patent/CN111831356A/en
Priority to US18/015,065 priority patent/US11797850B2/en
Priority to PCT/CN2021/105172 priority patent/WO2022007879A1/en
Application granted granted Critical
Publication of CN111831356B publication Critical patent/CN111831356B/en


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/44 Arrangements for executing specific programs
    • G06F 9/445 Program loading or initiating
    • G06F 9/44505 Configuring for program initiating, e.g. using registry, configuration files
    • G06F 9/4451 User profiles; Roaming
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N 3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Neurology (AREA)
  • Image Analysis (AREA)

Abstract

The embodiments of the present invention disclose a weight precision configuration method, apparatus, device and storage medium. The method comprises: determining a current target layer, wherein all layers are sorted by their degree of influence on the recognition rate and a layer with a higher degree of influence is preferentially determined as the target layer; reducing the weight precision corresponding to the current target layer to a preset minimum precision; then increasing the weight precision corresponding to the current target layer, and, if the current recognition rate of the neural network is greater than a preset threshold, locking the weight precision corresponding to the current target layer to the weight precision before the increase; and re-determining the current target layer when the target layer switching condition is met. With this technical solution, the upper limit of the weight precision of each layer can be reasonably controlled while the recognition rate of the neural network is taken into account, improving resource utilization in the artificial intelligence chip carrying the neural network, improving chip performance, and reducing chip power consumption.

Description

Weight precision configuration method, device, equipment and storage medium
Technical Field
The embodiments of the present invention relate to the field of artificial intelligence, and in particular to a weight precision configuration method, apparatus, device and storage medium.
Background
With the explosive development of big data, information networks and intelligent mobile devices, massive amounts of unstructured information are generated, accompanied by a rapidly growing demand for efficient processing of that information. Deep learning has developed rapidly in recent years and achieves high accuracy in many fields such as image recognition, speech recognition and natural language processing. However, most deep learning research is still carried out on traditional von Neumann computers, which, because the processor is separated from the memory, consume much energy and work inefficiently on large-scale complex problems; moreover, because they are built for numerical calculation, they entail high software programming complexity for non-formalized problems, or cannot handle them at all.
With the development of brain science, and because the brain features ultra-low power consumption and high fault tolerance compared with traditional von Neumann computers and shows clear advantages in processing unstructured information and intelligent tasks, building novel artificial intelligence systems and artificial intelligence chips that borrow the brain's computing mode has become a new direction of development, and artificial intelligence technology developed with reference to the brain has emerged. A neural network in this technology is composed of a large number of neurons; through distributed storage and parallel cooperative processing of information, and by defining basic learning rules, it can imitate the adaptive learning process of the brain without explicit programming, which gives it advantages in handling some informal problems. Artificial intelligence techniques can be implemented using large-scale integrated analog, digital or mixed analog-digital circuits together with software systems, i.e., based on neuromorphic devices.
At present, a deep learning algorithm can operate at different data precisions. High precision yields better performance (such as accuracy or recognition rate), but once the algorithm is deployed on an artificial intelligence chip it incurs high storage and computation costs; low precision trades a certain loss of performance for significant savings in storage and computation, giving the chip lower power consumption and higher utility. A conventional artificial intelligence chip must therefore provide storage support for multiple data precisions, including integer (Int) and floating-point (FP) formats such as 8-bit integer (Int8), 16-bit floating-point (FP16), 32-bit floating-point (FP32) and 64-bit floating-point (FP64). However, the weight precisions of all layers of a neural network carried in a brain-inspired chip are the same, so the weight precision configuration scheme of such an artificial intelligence chip is insufficiently flexible and needs improvement.
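As a rough illustration of this storage trade-off, the following minimal sketch computes the weight storage of a single layer under the precisions named above (the layer size is a made-up number for illustration, not from the patent):

```python
# Bits per weight value for the data precisions named above.
BITS = {"Int8": 8, "FP16": 16, "FP32": 32, "FP64": 64}

def weight_storage_kib(num_weights: int, precision: str) -> float:
    """Storage needed for one layer's weight values, in KiB."""
    return num_weights * BITS[precision] / 8 / 1024

layer_weights = 512 * 512  # hypothetical fully connected layer, 512 x 512 weights
for p in ("Int8", "FP16", "FP32"):
    print(p, f"{weight_storage_kib(layer_weights, p):.0f} KiB")
# Int8: 256 KiB, FP16: 512 KiB, FP32: 1024 KiB. Lower weight precision
# directly shrinks what a processing core must store and move for this layer.
```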
Disclosure of Invention
The embodiments of the present invention provide a weight precision configuration method, apparatus, device and storage medium, which can optimize existing weight precision configuration schemes.
In a first aspect, an embodiment of the present invention provides a method for configuring weight precision, including:
determining a current target layer in the neural network, wherein all layers in the neural network are ranked according to their degree of influence on the recognition rate, and a layer with a higher degree of influence is preferentially determined as the target layer;
reducing the weight precision corresponding to the current target layer to a preset minimum precision;
increasing the weight precision corresponding to the current target layer, judging whether the current recognition rate of the neural network is greater than a preset threshold, and if so, locking the weight precision corresponding to the current target layer to the weight precision before the current increase; and
re-determining the current target layer when the target layer switching condition is met.
In a second aspect, an embodiment of the present invention provides a weight precision configuration apparatus, including:
a target layer determining module, configured to determine a current target layer in the neural network, wherein all layers in the neural network are ranked according to their degree of influence on the recognition rate, and a layer with a higher degree of influence is preferentially determined as the target layer;
a weight precision reduction module, configured to reduce the weight precision corresponding to the current target layer to a preset minimum precision;
a weight precision increasing module, configured to increase the weight precision corresponding to the current target layer, judge whether the current recognition rate of the neural network is greater than a preset threshold, and if so, lock the weight precision corresponding to the current target layer to the weight precision before the current increase; and
a target layer switching module, configured to re-determine the current target layer when the target layer switching condition is met.
In a third aspect, an embodiment of the present invention provides a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the weight precision configuration method according to the embodiments of the present invention.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the weight precision configuration method provided by the embodiment of the present invention.
According to the weight precision configuration scheme provided by the embodiments of the present invention, all layers in the neural network are sorted by their degree of influence on the recognition rate, and a layer with a higher degree of influence is preferentially determined as the target layer; the weight precision corresponding to the current target layer is reduced to a preset minimum precision and then increased; if the current recognition rate of the neural network is greater than a preset threshold, the weight precision corresponding to the current target layer is locked to the weight precision before the increase; and the current target layer is re-determined when the target layer switching condition is met. With this technical solution, the layers are tried in order of their influence on the recognition rate, the weight precision of the current target layer being first reduced to the minimum and then raised, and the upper limit of the weight precision of each layer is reasonably controlled while the recognition rate of the neural network is taken into account, which improves resource utilization in the artificial intelligence chip carrying the neural network, improves chip performance, and reduces chip power consumption.
Drawings
Fig. 1 is a schematic flowchart of a weight precision configuration method according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of a precision configuration scheme for output data in the prior art;
Fig. 3 is a schematic diagram of a precision configuration scheme for output data according to an embodiment of the present invention;
Fig. 4 is a schematic flowchart of another weight precision configuration method according to an embodiment of the present invention;
Fig. 5 is a schematic flowchart of another weight precision configuration method according to an embodiment of the present invention;
Fig. 6 is a schematic flowchart of another weight precision configuration method according to an embodiment of the present invention;
Fig. 7 is a block diagram of a weight precision configuration apparatus according to an embodiment of the present invention;
Fig. 8 is a block diagram of a computer device according to an embodiment of the present invention.
Detailed Description
The technical solution of the present invention is further explained below through specific embodiments in conjunction with the accompanying drawings. It is to be understood that the specific embodiments described herein merely illustrate the invention and do not limit it. It should further be noted that, for convenience of description, the drawings show only the parts related to the present invention rather than the complete structures.
Before discussing exemplary embodiments in more detail, it should be noted that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart may describe the steps as a sequential process, many of the steps can be performed in parallel, concurrently or simultaneously. In addition, the order of the steps may be rearranged. The process may be terminated when its operations are completed, but may have additional steps not included in the figure. The processes may correspond to methods, functions, procedures, subroutines, and the like.
It should be noted that the terms "first", "second", and the like in the embodiments of the present invention are only used for distinguishing different apparatuses, modules, units, or other objects, and are not used for limiting the order or interdependence of the functions performed by these apparatuses, modules, units, or other objects.
For a better understanding of embodiments of the present invention, the related art will be described below.
Artificial intelligence here generally refers to borrowing the basic laws of information processing in the brain and making essential changes to existing computing systems and architectures at multiple levels, such as hardware implementation and software algorithms, so as to achieve major improvements in computing energy consumption, computing capability, computing efficiency and other aspects; it is a cross-disciplinary field that fuses brain science with computer science, information science, artificial intelligence and related fields. The artificial intelligence chip generally refers to a non-von Neumann chip, such as a spiking neural network chip, or a chip based on memristors, memcapacitors or meminductors.
The artificial intelligence chip in the embodiment of the present invention may include a plurality of processing cores; each processing core may include a processor and a memory area, so that computation can be performed locally on the data, and each processing core may correspond to one layer of the neural network, i.e., the neural network can be deployed or mapped onto the corresponding processing cores in units of layers. The neural network in the embodiment of the present invention may be an Artificial Neural Network (ANN), a Spiking Neural Network (SNN), or another type of neural network. The specific type of neural network is not limited; for example, it may be an acoustic model, a speech recognition model or an image recognition model, and it may be applied in data centers, security, intelligent medicine, autonomous driving, intelligent transportation, smart homes and other related fields. The technical solution provided by the embodiment of the present invention does not improve the neural network algorithm itself; it improves the control mode or application mode of the hardware platform implementing the neural network, and belongs to neuromorphic circuits and systems, also called neuromorphic engineering.
In the prior art, the weight precision of every layer of the neural network carried in the artificial intelligence chip is the same. If the weight precision of all layers is configured to a lower precision such as Int4, then, to guarantee the recognition rate, parameter tuning becomes difficult and training time increases significantly, and a significant loss of accuracy often results anyway. If the weight precision of all layers is configured to FP32 or higher, the calculation precision meets the requirement and the recognition rate is high, but the neural network model is generally large, which leads to low resource utilization of the artificial intelligence chip and high power consumption, and impairs chip performance.
The embodiments of the present invention abandon the prior-art restriction that the weight precision of every layer in the neural network must be the same: a different weight precision can be configured for each layer, i.e., mixed precision is adopted, so that storage volume and computation energy consumption are well balanced against the recognition rate (or accuracy) of the neural network. The weight precision is configured based on this mixed-precision idea, and specific configuration schemes are provided below.
Fig. 1 is a schematic flowchart of a weight precision configuration method according to an embodiment of the present invention. The method may be executed by a weight precision configuration apparatus, which may be implemented in software and/or hardware and is generally integrated in a computer device. As shown in fig. 1, the method includes:
Step 101, determining a current target layer in a neural network, wherein all layers in the neural network are sorted according to their degree of influence on the recognition rate, and a layer with a higher degree of influence is preferentially determined as the target layer.
In the embodiment of the present invention, the specific structure of the neural network is not limited; for example, the number of neuron layers it contains may be any number greater than two. Different layers in the neural network may influence the recognition rate of the network to different degrees, and many factors can contribute, such as the number of weight parameters, the weight parameter values (weight values), and the weight precision (the precision of the weight values). The degree of influence of each layer on the recognition rate can be evaluated in advance, and the layers sorted in a certain order (for example, from high influence to low). In this step, the layer with the highest degree of influence may be determined as the current target layer; when the target layer needs to be switched, the layer with the second-highest degree of influence may be determined as the new current target layer.
Step 102, reducing the weight precision corresponding to the current target layer to a preset minimum precision.
For example, the initial weight precision of all layers in the neural network may be set according to actual requirements, and may be the same or different across layers. The preset minimum precision can likewise be set according to actual requirements and can be determined from the hardware configuration of the artificial intelligence chip. The reason for this arrangement is that the neural network to be deployed may be supplied by a third party according to its application requirements, and the third party may not have considered the specifics of the artificial intelligence chip when designing the network, so the weight precision of each layer may be unnecessarily high; therefore, before weight precision configuration, the weight precision is first reduced to the preset minimum precision matched to the artificial intelligence chip, and increases are then attempted step by step.
Step 103, increasing the weight precision corresponding to the current target layer, judging whether the current recognition rate of the neural network is greater than a preset threshold, and if so, locking the weight precision corresponding to the current target layer to the weight precision before the current increase.
For example, when the weight precision corresponding to the current target layer is increased, the step size is not limited, and the size of each increase may be the same or different. The step size can be measured in precision levels, where a precision level represents a data precision: the higher the precision, the higher the corresponding level, and the precision values corresponding to the levels can be set according to actual requirements. Illustratively, the precision may be increased in the order Int4, Int8, FP16, FP32, one precision level at a time, such as from Int4 to Int8. Increasing one level at a time has the advantage that the weight precision can be determined more accurately, i.e., the configuration is finer-grained: if two or more levels were added each time, then when the current recognition rate exceeds the preset threshold, the locked weight precision would differ from the raised precision by two or more precision levels, and some intermediate precision between the two might still yield a recognition rate not greater than the preset threshold, so a better weight precision could be skipped.
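A minimal sketch of this one-level-at-a-time increase (the ladder follows the Int4, Int8, FP16, FP32 order given in the example above; the helper name is illustrative, not from the patent):

```python
# Precision levels ordered from lowest to highest, as in the example above.
PRECISION_LADDER = ["Int4", "Int8", "FP16", "FP32"]

def raise_one_level(precision: str) -> str:
    """Return the next-higher precision, or the same one if already at the top."""
    i = PRECISION_LADDER.index(precision)
    return PRECISION_LADDER[min(i + 1, len(PRECISION_LADDER) - 1)]

assert raise_one_level("Int4") == "Int8"   # one precision level per step
assert raise_one_level("FP32") == "FP32"   # already the highest level
```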
Illustratively, when the neural network is deployed on the artificial intelligence chip, the neural network is deployed or mapped onto the corresponding processing core in units of layers, and the current target layer is mapped into the corresponding processing core, so that the weight precision corresponding to the current target layer can be understood as the core precision of the processing core corresponding to the current target layer, that is, the scheme of the embodiment of the present invention can be understood as configuring the core precision of the processing core in the artificial intelligence chip.
For example, the recognition rate of the neural network can be used to measure its performance; a preset number of samples may be used to test the neural network to obtain the current recognition rate. The preset threshold may be set according to actual usage requirements such as the application scenario of the neural network; it can be understood as the highest recognition rate achievable within the chip's performance tolerance, and its specific value is not limited (it may be 0.95, for example). If, after the weight precision of the current target layer is increased, the current recognition rate of the neural network is greater than the preset threshold, the increase is inappropriate and may burden chip performance heavily, so the weight precision corresponding to the current target layer can be locked to the weight precision before the increase. For example, if the precision was FP16 before the increase and FP32 after it, the weight precision corresponding to the current target layer is locked to FP16.
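The recognition-rate test can be pictured as follows (a minimal sketch; `model.predict` and the labelled sample set are illustrative assumptions):

```python
def current_recognition_rate(model, test_samples) -> float:
    """Recognition rate over a preset number of labelled test samples."""
    correct = sum(1 for x, label in test_samples if model.predict(x) == label)
    return correct / len(test_samples)

PRESET_THRESHOLD = 0.95  # example value from the text above
# Rule from this step: if the rate after an increase exceeds the threshold,
# roll the layer back and lock the precision from before the increase.
```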
Step 104, re-determining the current target layer when the target layer switching condition is met.
For example, for each layer in the neural network, the weight precision may be reduced to the minimum and then increased, and whether to move on to the next target layer is decided according to the target layer switching condition. After the weight precisions corresponding to all layers are locked, the weight precision configuration of the neural network is considered finished; the weight precisions are then reasonably configured, so the recognition-rate requirement can be met and the resource utilization in the artificial intelligence chip carrying the neural network can be improved. Optionally, the adjustment may also be attempted for only part of the layers in the neural network, which improves the chip's resource utilization to a certain extent while keeping the configuration efficient; the number of such layers can be set according to actual requirements, for example as the product of the total number of layers and a preset proportion.
According to the weight precision configuration method provided by the embodiment of the present invention, all layers in the neural network are sorted by their degree of influence on the recognition rate, and a layer with a higher degree of influence is preferentially determined as the target layer; the weight precision corresponding to the current target layer is reduced to a preset minimum precision and then increased; if the current recognition rate of the neural network is greater than a preset threshold, the weight precision corresponding to the current target layer is locked to the weight precision before the increase; and the current target layer is re-determined when the target layer switching condition is met. With this technical solution, the layers are tried in order of their influence on the recognition rate, the weight precision of the current target layer being first reduced to the minimum and then raised, and the upper limit of the weight precision of each layer is reasonably controlled while the recognition rate of the neural network is taken into account, thereby improving resource utilization in the artificial intelligence chip carrying the neural network, improving chip performance, and reducing chip power consumption.
Optionally, after judging whether the current recognition rate of the neural network is greater than the preset threshold, the method may further include: if it is less than or equal to the preset threshold, locking the weight precision corresponding to the current target layer to the increased weight precision. The advantage of this is that after one increase of the current target layer's weight precision, the recognition-rate requirement can be met to a certain extent while the model size of the neural network is effectively controlled; to improve the efficiency of weight precision configuration, the increased weight precision can be locked and the method can move on to increasing the weight precision of other layers. Once the weight precisions corresponding to all layers of the neural network are locked, which is equivalent to having attempted one increase for every layer, the weight precision configuration of the neural network is finished; the weight precision of the neural network can thus be configured rapidly while its recognition rate is guaranteed, improving resource utilization in the artificial intelligence chip carrying the neural network, improving chip performance, and reducing chip power consumption.
For example, the weight precision may be locked by rewriting the bit-number flag bit of the current target layer or by rewriting the name of the calling operator corresponding to the current target layer.
In some embodiments, after judging whether the current recognition rate of the neural network is greater than the preset threshold, the method further includes: if the current recognition rate is less than or equal to the preset threshold, continuing to increase the weight precision corresponding to the current target layer and continuing to judge whether the current recognition rate of the neural network is greater than the preset threshold. Here, the target layer switching condition includes: the current recognition rate of the neural network is greater than the preset threshold. Preferentially determining the layer with the higher degree of influence as the target layer then means: among the layers whose corresponding weight precisions are not locked, the layer with the higher degree of influence is preferentially determined as the target layer. The advantage of this is that the efficiency of weight precision configuration can be improved: when the current recognition rate of the neural network is less than or equal to the preset threshold, the weight precision of the current target layer still has room to rise, so another increase can be attempted and the judgment repeated, until the current recognition rate is greater than the preset threshold, which indicates that the weight precision of the current target layer cannot rise further; the target layer is then switched and the weight precision of the next layer is raised.
In some embodiments, multiple rounds of raising operations are performed on the weight precisions corresponding to all layers in the neural network, and in each round the weight precision corresponding to each layer is raised at most once. After judging whether the current recognition rate of the neural network is greater than the preset threshold, the method further includes: if it is less than or equal to the threshold, temporarily storing the raised weight precision. Correspondingly, the target layer switching condition includes: the weight precision corresponding to the current target layer has been raised once in the current round. Reducing the weight precision corresponding to the current target layer to the preset minimum precision then means: if the weight precision corresponding to the current target layer has not been adjusted, reducing it to the preset minimum precision. For the current target layer, if it is determined as the target layer for the first time, its weight precision has not been adjusted, so it is first reduced to the preset minimum precision and then raised; if it is not the first time, a raising operation has already been performed, and in the current round another raise is performed on the basis of the raised weight precision temporarily stored in the previous round. The advantage of this is that the weight precisions of the layers can be raised evenly. For example, suppose the neural network has 4 layers, L1, L2, L3 and L4, sorted by degree of influence on the recognition rate from highest to lowest as L1, L3, L2, L4. In each round of raising operations, L1 is first determined as the target layer, i.e., the current target layer is L1 and its weight precision is raised first; the target layer is then switched so that the current target layer is L3 and its weight precision is raised; and the weight precisions of L2 and L4 are then raised in turn.
In some embodiments, re-determining the current target layer comprises: re-determining the current target layer until the weight precisions corresponding to all layers are locked. The advantage of this arrangement is that weight precision adjustment is attempted for every layer in the neural network, making the configuration result more reasonable and improving chip performance further.
In some embodiments, after judging whether the current recognition rate of the neural network is greater than the preset threshold, the method further includes: if it is less than or equal to the threshold, training the neural network to adjust the weight parameter values of the current target layer, where the training objective is to improve the recognition rate of the neural network. The advantage of this is that the recognition rate generally rises after the weight precision is increased; training the neural network to adjust the weight parameter values of the current target layer can raise the recognition rate further, optimizing the performance of the neural network, so that the next increase of weight precision brings the recognition rate closer to the preset threshold, improving the efficiency of weight precision configuration.
In some embodiments, training the neural network comprises training the neural network on the artificial intelligence chip. The advantage of this is that the neural network in the embodiments of the present invention can be mapped onto the artificial intelligence chip for application and trained there; that is, the network is mapped onto the chip before actual application, so the training process is more consistent with the actual application scenario and the training is more accurate and efficient.
In some embodiments, training the neural network includes: acquiring the precision of the data to be output by a first layer in the neural network, wherein the first layer includes any one or more layers other than the last layer of the neural network; acquiring the weight precision of a second layer, the second layer being the layer following the first layer; and configuring the precision of the data to be output according to the weight precision of the second layer. The advantage of this arrangement is that the precision of the output data of one or more layers of a neural network deployed on an artificial intelligence chip can be configured flexibly, thereby optimizing the performance of the artificial intelligence chip.
At present, a neural network usually groups a number of neurons into one layer, and each layer usually corresponds to one processing core in the artificial intelligence chip. The core computation of a neural network is the matrix-vector multiplication: when data is input into one layer of the neural network, the calculation precision is generally the product of the data precision and the weight precision (i.e., the precision of the weight values), and the precision of the calculation result (i.e., the output data of the processing core corresponding to the current layer) is conventionally determined by the higher of the data precision and the weight precision. Fig. 2 is a schematic diagram of a precision configuration scheme for output data in the prior art, where the weight precision of every layer of the neural network carried in the artificial intelligence chip is the same; as shown in fig. 2, for convenience of description, only four layers, L1, L2, L3 and L4, of the neural network are shown. The precision of the input data of L1 (the data precision) is FP32 (32-bit floating point), the weight precision of L1 is FP32, and the precision obtained after the multiply-accumulate operation is FP32. In the embodiment of the present invention, by contrast, the precision of the calculation result is determined according to the weight precision of the next layer rather than by the higher of the data precision and the weight precision.
In the embodiment of the present invention, the first layer is not necessarily the first layer of the neural network; it may be any layer other than the last. If the processing core corresponding to the first layer is called the first processing core, it can be understood that the first processing core acquires the precision of the data to be output by the first layer, acquires the weight precision of the second layer, and configures the precision of the first layer's data to be output according to the second layer's weight precision; any processing core except the one corresponding to the last layer can act as the first processing core. Illustratively, the processor in the first processing core performs the data calculation, for example obtaining the data to be output from the input data of the first layer and the first layer's weight parameters (such as a weight matrix); generally, the precision of the data to be output is greater than or equal to the higher of the input-data precision and the weight precision. If the input-data precision and the weight precision are both low (such as Int2, Int4 or Int8), the bit width after the multiply-accumulate operation may be insufficient (for example, it may fail to meet the hardware configuration requirement of the corresponding processing core) and the precision needs to be promoted; the precision of the data to be output is then usually raised (for example, to Int8 or Int16), and the lower the input-data precision and weight precision are, the more precision levels need to be added. Conversely, if the input-data precision and weight precision are themselves relatively high (such as FP16, FP32 or FP64), the precision of the data to be output need not increase, or increases relatively little (e.g., from FP16 to FP32), because the precision after the multiply-accumulate operation is already sufficiently high.
Optionally, acquiring the precision of the data to be output by the first layer in the neural network may include: acquiring the precision of the input data of the first layer and the weight precision of the first layer; and determining the precision of the data to be output by the first layer from these, where the precision of the data to be output is greater than or equal to the higher of the input-data precision and the first layer's weight precision.
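A small numeric sketch of why a multiply-accumulate over low-precision operands needs a wider output precision (NumPy is used purely for illustration; the vector length is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.integers(-128, 128, size=4096, dtype=np.int8)  # Int8 input data
w = rng.integers(-128, 128, size=4096, dtype=np.int8)  # Int8 weight values

# Accumulating in the operand precision wraps around: int8 holds only [-128, 127].
narrow = np.dot(x, w)                                   # int8 accumulator, overflows
wide = np.dot(x.astype(np.int32), w.astype(np.int32))   # widened accumulator

print(narrow, wide)  # `narrow` is corrupted by overflow; `wide` is the true sum
```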
In the embodiment of the present invention, the weight precisions of different layers may differ, and the specific manner of acquiring the weight precision of the second layer is not limited. For example, the weight precision of the second layer may be written into a storage area in the first processing core during the compiling stage of the chip, and read from that storage area after the data to be output by the first layer has been obtained; or, calling the processing core corresponding to the second layer the second processing core, the storage area in the second processing core may hold the weight precision of the second layer, and the first processing core may obtain it from the second processing core by inter-core communication.
In the embodiment of the present invention, the precision of the data to be output by the first layer is configured with reference to the weight precision of the second layer, and the specific manner of reference and configuration is not limited. For example, the precision of the data to be output may be configured lower than the weight precision of the second layer, or higher than it, to obtain the precision of the output data, and the precision-level difference between the second layer's weight precision and the output-data precision may be a first preset precision-level difference. For example, Int8 lies between the precisions Int4 and FP16, so Int4 and FP16 may differ by 2 precision levels while Int4 and Int8 differ by 1. Assuming the weight precision of the second layer is FP16 and the first preset precision-level difference is 2, if the precision of the data to be output is configured lower than the weight precision of the second layer, it is configured as Int4.
In some embodiments, configuring the precision of the data to be output according to the weight precision of the second layer includes: when the weight precision of the second layer is lower than the precision of the data to be output, determining a target precision according to the weight precision of the second layer, the target precision being lower than the precision of the data to be output; and configuring the precision of the data to be output to the target precision. Optionally, the target precision is equal to or higher than the weight precision of the second layer, which amounts to truncating the precision of the data to be output according to the weight precision of the second layer. The precision of the data to be output is thereby reduced, which reduces the data transmission volume; and when the second layer performs its data calculation, the computation volume is also reduced, lowering the energy consumed by data processing.
In some embodiments, determining the target precision according to the weight precision of the second layer includes: determining the weight precision of the second layer as the target precision. This amounts to truncating the precision of the data to be output so that it matches the weight precision of the second layer, which further reduces the data transmission volume and the energy consumed by data processing, and increases the effective computing power of the chip. Optionally, the comparison between the weight precision of the second layer and the precision of the data to be output by the first layer may even be skipped, with the weight precision of the second layer directly determined as the target precision.
In some embodiments, the method may include: judging whether the weight precision of the second layer is lower than the precision of the data to be output by the first layer; if so, determining the weight precision of the second layer as the target precision and configuring the precision of the data to be output by the first layer to the target precision, obtaining the output data; otherwise, keeping the precision of the data to be output by the first layer unchanged, or configuring it to the weight precision of the second layer, obtaining the output data. Keeping the precision of the data to be output by the first layer unchanged reduces the transmission volume between the first layer and the second layer.
In some embodiments, after configuring the precision of the data to be output according to the weight precision of the second layer, the method further includes: outputting the configured output data to the processing core corresponding to the second layer. The advantage of this arrangement is that the output data is sent to the processing core corresponding to the second layer by inter-core communication, so that that core can perform the second layer's calculations.
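A minimal sketch of the truncation rule described in these embodiments (assuming the Int4 < Int8 < FP16 < FP32 ordering used earlier; the function name is illustrative, not from the patent):

```python
PRECISION_ORDER = {"Int4": 0, "Int8": 1, "FP16": 2, "FP32": 3}

def configure_output_precision(to_output: str, next_weight: str) -> str:
    """If the second layer's weight precision is lower than the first layer's
    to-be-output precision, truncate to it (the target precision); otherwise
    keep the to-be-output precision unchanged."""
    if PRECISION_ORDER[next_weight] < PRECISION_ORDER[to_output]:
        return next_weight
    return to_output

assert configure_output_precision("FP16", "Int8") == "Int8"  # the L3 -> L4 case in fig. 3
assert configure_output_precision("Int8", "FP16") == "Int8"  # kept unchanged
```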
In some embodiments, the artificial intelligence chip is implemented on a many-core architecture. The many-core architecture may have a multi-core reorganization property: there is no master-slave division among the cores, tasks can be flexibly configured by software, and different tasks can be configured in different cores at the same time to achieve multi-task parallel processing; a series of cores forms an array that completes the computation of the neural network, so that various neural network algorithms can be supported efficiently and chip performance is improved. Illustratively, the artificial intelligence chip may adopt a 2D-mesh network-on-chip structure for communication and interconnection between cores, while communication between the chip and the outside may be carried over high-speed serial ports.
Fig. 3 is a schematic diagram of an accuracy configuration scheme of output data according to an embodiment of the present invention, and as shown in fig. 3, for convenience of description, only four layers, namely L1, L2, L3, and L4, in a neural network are shown.
For L1, the precision of the input data is Int8 and the weight precision of L1 is Int8, so the precision obtained after the multiply-accumulate operation is Int8; the precision may, however, saturate during the multiply-accumulate operation, losing information. In the prior art, the calculation result is determined by the higher of the data precision and the weight precision; since the weight precision of L2 is FP16, the truncated Int8 result has to be padded before being output, and the precision truncated first is lost in the process. In the embodiment of the present invention, the weight precision of L2 is acquired first, so the precision of the data to be output by L1 is known to be the same as the weight precision of L2; no precision truncation is performed, and the precision loss in data conversion can be reduced.
For L3, the precision of the input data is FP16 and the weight precision is FP16; in the prior art, the precision of the output data would also be FP16. In the embodiment of the present invention, the weight precision Int8 of L4 is acquired first, so it is known that the precision of the data to be output by L3 is higher than the weight precision of L4, and the precision of the data to be output can be configured as Int8. This further reduces the precision of the output data and the data transmission volume between layer L3 and layer L4, i.e., reduces the communication volume between the processing core where L3 resides and the processing core where L4 resides, without affecting the calculation precision of layer L4, thereby greatly improving chip performance.
In some embodiments, all layers in the neural network are ranked by degree of influence on the recognition rate as follows: calculating the initial recognition rate of the neural network; for each layer in the neural network, reducing the weight precision of the current layer from a first precision to a second precision and calculating the reduction of the recognition rate of the neural network relative to the initial recognition rate; and sorting all layers according to the reduction value, where a larger reduction value indicates a higher degree of influence on the recognition rate. The advantage of this arrangement is that the influence of the different layers on the recognition rate can be evaluated quickly and accurately. The first precision and the second precision can be set according to actual requirements; the first precision may, for example, be the initial precision of the neural network, and the precision-level difference between the two is not limited. For example, the first precision may be FP32 and the second precision FP16.
In some embodiments, if at least two layers have the same reduction value, those layers are sorted by their distance from the input layer of the neural network: the smaller the distance, the higher the degree of influence on the recognition rate. The advantage of this arrangement is that the layers can be ordered more reasonably.
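A minimal sketch of this sorting procedure (hedged: the `eval_fn` callback and the per-layer `weight_precision` attribute are illustrative assumptions, not the patent's interface):

```python
def rank_layers_by_influence(net, eval_fn, second="FP16"):
    """Sort layers by the drop in recognition rate observed when one layer's
    weight precision is lowered from the first precision to `second`;
    ties are broken by distance from the input layer (closer ranks higher)."""
    base_rate = eval_fn(net)  # initial recognition rate at the first precision
    scored = []
    for idx, layer in enumerate(net.layers):
        saved = layer.weight_precision
        layer.weight_precision = second          # lower only this layer
        scored.append((base_rate - eval_fn(net), -idx, layer))
        layer.weight_precision = saved           # restore before the next trial
    # Larger drop first; for equal drops, smaller index (nearer the input) first.
    scored.sort(key=lambda t: (t[0], t[1]), reverse=True)
    return [layer for _, _, layer in scored]
```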
Fig. 4 is a schematic flowchart of another weight precision configuration method according to an embodiment of the present invention, as shown in fig. 4, the method includes:
Step 401, determining a current target layer in the neural network.
All layers in the neural network are sorted according to their degree of influence on the recognition rate, and among the layers whose corresponding weight precisions are not locked, the layer with the higher degree of influence is preferentially determined as the target layer.
Illustratively, before this step, the method may further include: calculating the initial recognition rate of the neural network, reducing the weight precision of the current layer from the first precision to the second precision for each layer in the neural network, calculating the descending value of the recognition rate of the neural network relative to the initial recognition rate, and sequencing all the layers according to the descending value to obtain a sequencing result, wherein the larger the descending value is, the higher the influence degree on the recognition rate is.
In this step, the current target layer is determined according to the sorting result. When the step is executed for the first time, the layer with the largest reduction value, i.e., the layer with the greatest influence on the recognition rate, is determined as the current target layer. When the step is executed again, a new current target layer is chosen according to the sorting result; if the weight precision corresponding to a layer has already been locked, that layer can no longer become the current target layer.
Step 402, reducing the weight precision corresponding to the current target layer to a preset minimum precision.
Step 403, increasing the weight precision corresponding to the current target layer.
For example, the weight precision corresponding to the current target layer may be increased by one precision level; each increase below may likewise be one precision level.
Step 404, judging whether the current recognition rate of the neural network is greater than the preset threshold; if so, executing step 405; otherwise, executing step 407.
Step 405, locking the weight precision corresponding to the current target layer to the weight precision before the current increase.
Step 406, judging whether the weight precisions corresponding to all layers are locked; if so, ending the process; otherwise, returning to step 401.
Illustratively, the locked weight precision is marked in the bit-number flag bit of the current target layer or in the name of the calling operator.
Step 407, training the neural network to adjust the weight parameter values of the current target layer, and returning to step 403.
Wherein the training target is to improve the recognition rate of the neural network.
Optionally, training of the neural network is performed on the artificial intelligence chip, and the training process may refer to the above related contents, which are not described herein again.
In the embodiment of the present invention, all layers in the neural network are sorted according to their degree of influence on the recognition rate, and for each target layer in turn the weight precision is first reduced to the minimum and then gradually increased until the recognition rate of the neural network is greater than the preset threshold. The weight precision configuration can thus be completed quickly; while the recognition rate of the neural network is taken into account, the model size of the neural network is effectively controlled, resource utilization in the artificial intelligence chip carrying the neural network is improved, chip performance is improved, and chip power consumption is reduced.
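Putting the fig. 4 flow together, a minimal end-to-end sketch (hedged: `rank_layers_by_influence`, `raise_one_level`, `eval_fn` and `train_on_chip` reuse the illustrative helpers sketched above; the minimum precision and threshold are the example values, not fixed by the patent):

```python
MIN_PRECISION = "Int4"   # preset minimum precision (example value)
THRESHOLD = 0.95         # preset recognition-rate threshold (example value)

def configure_weight_precisions(net, eval_fn, train_on_chip):
    for layer in rank_layers_by_influence(net, eval_fn):   # steps 401 and 406
        layer.weight_precision = MIN_PRECISION             # step 402
        layer.locked = False
        while not layer.locked:
            before = layer.weight_precision
            layer.weight_precision = raise_one_level(before)  # step 403
            if eval_fn(net) > THRESHOLD:                      # step 404
                layer.weight_precision = before               # step 405: lock the
                layer.locked = True                           # pre-increase precision
            elif layer.weight_precision == before:
                layer.locked = True   # ladder exhausted; safeguard not in fig. 4
            else:
                train_on_chip(net, layer)  # step 407: adjust this layer's weights
```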
Fig. 5 is a schematic flowchart of another weight precision configuration method according to an embodiment of the present invention, in which multiple rounds of raising operations are performed on the weight precisions corresponding to all layers in the neural network, and in each round the weight precision corresponding to each layer is raised at most once. As shown in fig. 5, the method includes:
Step 501, determining a current target layer in the neural network.
All layers in the neural network are ranked according to their degree of influence on the recognition rate, and the layer with the higher degree of influence is preferentially determined as the target layer.
Illustratively, before this step, the method may further include: calculating the initial recognition rate of the neural network, reducing the weight precision of the current layer from the first precision to the second precision for each layer in the neural network, calculating the descending value of the recognition rate of the neural network relative to the initial recognition rate, and sequencing all the layers according to the descending value to obtain a sequencing result, wherein the larger the descending value is, the higher the influence degree on the recognition rate is.
In this step, the current target layer is determined according to the sorting result. When the step is executed for the first time, the layer with the largest reduction value, i.e., the layer with the greatest influence on the recognition rate, is determined as the current target layer. When the step is executed again, a new current target layer is chosen according to the sorting result. Optionally, if the weight precision of the determined current target layer is already locked, that layer may be skipped and the next layer in the sorting result determined as the current target layer.
Step 502, judging whether the weight precision corresponding to the current target layer has already been adjusted; if so, executing step 504; otherwise, executing step 503.
Step 503, reducing the weight precision corresponding to the current target layer to a preset minimum precision.
Step 504, increasing the weight precision corresponding to the current target layer.
For example, the weight precision corresponding to the current target layer may be increased by one precision level; each subsequent increase may likewise be by one precision level.
Step 505, judging whether the current recognition rate of the neural network is greater than the preset threshold value; if so, executing step 506; otherwise, executing step 508.
Step 506, locking the weight precision corresponding to the current target layer to the weight precision before the current raise.
Step 507, judging whether the weight precisions corresponding to all layers are locked; if so, ending the process; otherwise, returning to execute step 501.
Illustratively, the locked weight precision is marked in the bit flag of the current target layer or the name of the calling operator.
Step 508, temporarily storing the raised weight precision.
Illustratively, the temporarily stored weight precision is marked in the bit flag of the current target layer or in the name of the calling operator.
Step 509, training the neural network to adjust the weight parameter value of the current target layer, and returning to execute step 501.
Wherein the training target is to improve the recognition rate of the neural network.
Optionally, training of the neural network is performed on the artificial intelligence chip, and the training process may refer to the related contents above, which is not described herein again.
In the embodiment of the invention, all layers in the neural network are sorted according to their degree of influence on the recognition rate, and multiple rounds of raising operations are performed on the weight precisions corresponding to all layers. In each round, the weight precision of each unlocked layer is raised once (after first being reduced to the minimum on its first visit), until the recognition rate of the neural network exceeds the preset threshold and the weight precision of every layer is locked. The weight precision configuration can thus be completed quickly and evenly across layers; the model area of the neural network is effectively controlled while the recognition rate is taken into account, which improves resource utilization in the artificial intelligence chip carrying the neural network, improves chip performance and reduces chip power consumption.
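Illustratively, the round-based flow of steps 501 to 509 may be sketched as follows; as with the earlier sketch, the helper names and the precision ladder are assumptions added for illustration and are not part of the disclosed method.

```python
PRECISIONS = ["INT8", "FP16", "FP32"]  # assumed precision ladder, lowest first

def configure_by_rounds(layers_by_influence, set_weight_precision,
                        recognition_rate, finetune_layer, threshold):
    """Raise each unlocked layer's weight precision at most once per round."""
    level = {}      # temporarily stored precision level per layer (step 508)
    locked = set()
    while len(locked) < len(layers_by_influence):
        for layer in layers_by_influence:                # step 501
            if layer in locked:                          # skip locked layers
                continue
            if layer not in level:                       # steps 502-503:
                level[layer] = 0                         # first visit -> minimum
            level[layer] = min(level[layer] + 1,
                               len(PRECISIONS) - 1)      # step 504: raise once
            set_weight_precision(layer, PRECISIONS[level[layer]])
            if recognition_rate() > threshold:           # step 505
                # Step 506: lock the weight precision before the current raise.
                level[layer] = max(level[layer] - 1, 0)
                set_weight_precision(layer, PRECISIONS[level[layer]])
                locked.add(layer)                        # step 507 checks all
            else:
                finetune_layer(layer)                    # step 509, then 501
    return {layer: PRECISIONS[level[layer]] for layer in layers_by_influence}
```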
Fig. 6 is a schematic flowchart of another weight precision configuration method according to an embodiment of the present invention, where the neural network is taken to be an image recognition model; assuming that the image recognition model is a convolutional neural network model, the method may include:
step 601, determining a current target layer in the image recognition model.
All layers in the image recognition model are ranked according to their degree of influence on the recognition rate, and the layer with the higher degree of influence is preferentially determined as the target layer.
Illustratively, before this step, the method may further include: calculating the initial recognition rate of the image recognition model; for each layer in the image recognition model, reducing the weight precision of that layer from a first precision to a second precision and calculating the reduction value of the recognition rate of the image recognition model relative to the initial recognition rate; and sorting all layers according to the reduction values to obtain a sorting result, where a larger reduction value indicates a higher degree of influence on the recognition rate. For example, the image recognition model may include a convolutional layer, a pooling layer and a fully-connected layer, the initial recognition rate is 0.98, and the initial weight precision of all three layers is FP32. After the weight precision of the convolutional layer is reduced to FP16, the recognition rate becomes 0.90, a reduction value of 0.08; after the weight precision of the pooling layer is reduced to FP16, the recognition rate becomes 0.94, a reduction value of 0.04; after the weight precision of the fully-connected layer is reduced to FP16, the recognition rate becomes 0.96, a reduction value of 0.02. Sorted from largest to smallest reduction value, the sorting result is: convolutional layer, pooling layer, fully-connected layer.
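This ranking step can be reproduced with the numbers of the example above; the snippet below is purely illustrative and only restates the arithmetic already given.

```python
initial_rate = 0.98
# Recognition rate after lowering each layer's weights from FP32 to FP16,
# using the example values above.
rate_after_drop = {"conv": 0.90, "pool": 0.94, "fc": 0.96}

reduction = {layer: round(initial_rate - rate, 2)
             for layer, rate in rate_after_drop.items()}
# reduction == {"conv": 0.08, "pool": 0.04, "fc": 0.02}

# A larger reduction value means a higher influence on the recognition rate.
ranking = sorted(reduction, key=reduction.get, reverse=True)
print(ranking)  # ['conv', 'pool', 'fc']
```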
In this step, the current target layer is determined according to the sorting result. When this step is executed for the first time, the convolutional layer is determined as the current target layer; after the weight precision of the convolutional layer is locked, the pooling layer is determined as the current target layer; and after the weight precision of the pooling layer is locked, the fully-connected layer is determined as the current target layer.
Step 602, reducing the weight precision corresponding to the current target layer to a preset minimum precision.
Step 603, increasing the weight precision corresponding to the current target layer.
For example, the weight precision corresponding to the current target layer may be increased by one precision level; each subsequent increase may likewise be by one precision level.
Step 604, judging whether the current recognition rate of the image recognition model is greater than the preset threshold value; if so, executing step 605; otherwise, executing step 607.
Step 605, locking the weight precision corresponding to the current target layer to the weight precision before the current raise.
Step 606, judging whether the weight precisions corresponding to all layers are locked; if so, ending the process; otherwise, returning to execute step 601.
Illustratively, the locked weight precision is marked in the bit flag of the current target layer or in the name of the calling operator.
Step 607, training the image recognition model to adjust the weight parameter value of the current target layer, and returning to execute step 603.
The training target is to improve the recognition rate of the image recognition model. In the training, a preset number of images can be used as training samples, and the image training samples are input into the image recognition model to train the image recognition model.
Optionally, training of the image recognition model is performed on the artificial intelligence chip; the training process may refer to the related content above. Illustratively, a first processing core obtains the image training sample data, calculates the feature map data to be output by the convolutional layer from the image training sample data and the weight parameters of the convolutional layer, obtains the weight precision of the pooling layer, and configures the precision of the convolutional layer's to-be-output feature map data to that weight precision (if the current target layer is the pooling layer, this is the raised weight precision); the output feature map data of the convolutional layer is then passed to a second processing core, which calculates the feature vector data to be output by the pooling layer from the convolutional layer's output feature map data and the weight parameters of the pooling layer, obtains the weight precision of the fully-connected layer, and configures the precision of the pooling layer's to-be-output feature vector data to that weight precision; the output feature vector data of the pooling layer is then passed to a third processing core, which calculates and outputs the image recognition result from the pooling layer's output feature vector data and the weight parameters of the fully-connected layer. The weight parameter values are adjusted with the target of improving the recognition rate of the image recognition model.
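The inter-core handoff described above can be pictured as each core casting the data it is about to emit to the weight precision of the next layer before sending it on. The sketch below is an illustrative assumption: the compute and cast helpers and the weight_precision attribute are placeholders, not APIs of the chip.

```python
def run_pipeline(image_batch, conv, pool, fc, compute, cast):
    # First processing core: convolutional layer.
    fmap = compute(conv, image_batch)
    fmap = cast(fmap, pool.weight_precision)  # match the next layer's precision
    # Second processing core: pooling layer.
    vec = compute(pool, fmap)
    vec = cast(vec, fc.weight_precision)      # match the next layer's precision
    # Third processing core: fully-connected layer -> recognition result.
    return compute(fc, vec)
```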
In the embodiment of the invention, all layers in the image recognition model are sorted according to their degree of influence on the recognition rate, and for each target layer in turn the weight precision is first reduced to the minimum and then raised until the recognition rate of the image recognition model exceeds the preset threshold. The weight precision configuration can thus be completed quickly; the model area of the image recognition model is effectively controlled while its recognition rate is taken into account, which improves resource utilization in the artificial intelligence chip carrying the image recognition model, improves chip performance and reduces chip power consumption.
Fig. 7 is a block diagram of a weight precision configuration apparatus according to an embodiment of the present invention, where the apparatus may be implemented by software and/or hardware, and may be generally integrated in a computer device, and may perform weight precision configuration by executing a weight precision configuration method. As shown in fig. 7, the apparatus includes:
a target layer determining module 701, configured to determine a current target layer in a neural network, where all layers in the neural network are sorted according to their degree of influence on the recognition rate, and the layer with the higher degree of influence is preferentially determined as the target layer;
a weight precision reduction module 702, configured to reduce the weight precision corresponding to the current target layer to a preset minimum precision;
a weight precision increasing module 703, configured to increase the weight precision corresponding to the current target layer, judge whether the current recognition rate of the neural network is greater than a preset threshold, and if so, lock the weight precision corresponding to the current target layer to the weight precision before the current raise;
and a target layer switching module 704, configured to re-determine the current target layer if the target layer switching condition is met.
According to the weight precision configuration device provided by the embodiment of the invention, all layers in the neural network are sorted according to their degree of influence on the recognition rate, and the layer with the higher degree of influence is preferentially determined as the target layer; the weight precision corresponding to the current target layer is reduced to the preset minimum precision and then raised, and if the current recognition rate of the neural network is greater than the preset threshold, the weight precision corresponding to the current target layer is locked to the weight precision before the raise, with the current target layer re-determined when the target layer switching condition is met. With this technical scheme, the weight precision of each target layer is in turn first reduced to the minimum and then raised, and the upper limit of each layer's weight precision is reasonably controlled while the recognition rate of the neural network is taken into account, which improves resource utilization in the artificial intelligence chip carrying the neural network, improves chip performance and reduces chip power consumption.
Optionally, the weight precision increasing module is further configured to: after judging whether the current recognition rate of the neural network is greater than the preset threshold, if the current recognition rate is less than or equal to the preset threshold, continue to increase the weight precision corresponding to the current target layer and continue to judge whether the current recognition rate of the neural network is greater than the preset threshold;
and, the target layer switching condition includes: the current recognition rate of the neural network is greater than the preset threshold; the preferentially determining the layer with the high degree of influence as the target layer includes: among the layers whose corresponding weight precisions are not locked, preferentially determining the layer with the higher degree of influence as the target layer.
Optionally, multiple rounds of raising operations are performed on the weight precision corresponding to all layers in the neural network, and in each round of raising operations, the weight precision corresponding to each layer is raised at most once;
the device also includes: the weight precision temporary storage module temporarily stores the raised weight precision if the current recognition rate of the neural network is less than or equal to a preset threshold value after judging whether the current recognition rate of the neural network is greater than or equal to the preset threshold value;
and, the target layer switching condition includes: the weight precision corresponding to the current target layer has been raised once in the current round of raising operations; the reducing the weight precision corresponding to the current target layer to the preset minimum precision includes: if the weight precision corresponding to the current target layer has not been adjusted, reducing the weight precision corresponding to the current target layer to the preset minimum precision.
Optionally, the determining the current target layer again includes:
and re-determining the current target layer until the weight precision corresponding to all the layers is locked.
Optionally, the apparatus further comprises:
and a training module, configured to train the neural network to adjust the weight parameter value of the current target layer if, after judging whether the current recognition rate of the neural network is greater than the preset threshold, the current recognition rate is less than or equal to the preset threshold, where the training target is to improve the recognition rate of the neural network.
Optionally, the training the neural network comprises training the neural network on an artificial intelligence chip;
in the process of training the neural network, the method comprises the following steps:
acquiring the precision of data to be output of a first layer in the neural network, wherein the first layer comprises any one or more layers except the last layer in the neural network;
acquiring the weight precision of a second layer, wherein the second layer is the next layer of the first layer;
and configuring the precision of the data to be output according to the weight precision of the second layer.
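Illustratively, this first-layer/second-layer rule generalizes to every pair of consecutive layers; the following sketch is an illustrative assumption in which the layer objects and the set_output_precision helper are placeholders rather than APIs of the disclosure.

```python
def propagate_output_precisions(layers, set_output_precision):
    # For every "first layer" (any layer except the last), configure the
    # precision of its data to be output from the weight precision of the
    # "second layer" (the next layer).
    for first, second in zip(layers[:-1], layers[1:]):
        set_output_precision(first, second.weight_precision)
```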
Optionally, all layers in the neural network are ranked according to the degree of influence on the recognition rate by the following method:
calculating an initial recognition rate of the neural network;
for each layer in the neural network, reducing the weight precision of the current layer from a first precision to a second precision, and calculating a reduction value of the recognition rate of the neural network relative to the initial recognition rate;
and sorting all layers according to the reduction values, where a larger reduction value indicates a higher degree of influence on the recognition rate.
Optionally, if there are at least two layers with the same reduction value, the at least two layers are sorted according to their distance from the input layer of the neural network, where a smaller distance indicates a higher degree of influence on the recognition rate.
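The tie-break can be expressed as a compound sort key: primarily the reduction value in descending order, and secondarily the distance from the input layer in ascending order. The snippet below is illustrative only; the distance mapping and the example reduction values are assumptions.

```python
reduction = {"conv": 0.04, "pool": 0.04, "fc": 0.02}   # conv and pool tie
distance_from_input = {"conv": 0, "pool": 1, "fc": 2}  # hops from the input layer

ranking = sorted(reduction,
                 key=lambda layer: (-reduction[layer],
                                    distance_from_input[layer]))
print(ranking)  # ['conv', 'pool', 'fc'] -- conv wins the tie (closer to input)
```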
The embodiment of the invention provides a computer device, in which the weight precision configuration apparatus provided by the embodiment of the invention can be integrated. Fig. 8 is a block diagram of a computer device according to an embodiment of the present invention. The computer device 800 may include: a memory 801, a processor 802 and a computer program stored on the memory 801 and executable by the processor 802, where the processor 802 implements the weight precision configuration method according to the embodiment of the present invention when executing the computer program. It should be noted that, if the neural network is trained on the artificial intelligence chip, the computer device 800 may further include the artificial intelligence chip. Alternatively, if the computer device 800 is referred to as a first computer device, the training may be performed in another, second computer device that includes an artificial intelligence chip, with the second computer device sending the training result to the first computer device.
The computer device provided by the embodiment of the invention sorts all layers in the neural network according to their degree of influence on the recognition rate and, for each target layer in turn, first reduces the weight precision to the minimum and then raises it, reasonably controlling the upper limit of each layer's weight precision while taking the recognition rate of the neural network into account; this improves resource utilization in the artificial intelligence chip carrying the neural network, improves chip performance and reduces chip power consumption.
Embodiments of the present invention also provide a storage medium containing computer-executable instructions for performing a weight precision configuration method when executed by a computer processor.
The weight precision configuration device, the equipment and the storage medium provided in the above embodiments can execute the weight precision configuration method provided in any embodiment of the present invention, and have corresponding functional modules and beneficial effects for executing the method. For technical details that are not described in detail in the above embodiments, reference may be made to a weight precision configuration method provided in any embodiment of the present invention.
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims (11)

1. A weight precision configuration method is applied to an artificial intelligence chip, wherein the artificial intelligence chip comprises a plurality of processing cores, a neural network is deployed in the artificial intelligence chip, the neural network is deployed or mapped to the corresponding processing cores by taking a layer as a unit, the processing cores are in one-to-one correspondence with the layers of the neural network, and the method comprises the following steps:
determining a current target layer in the neural network, wherein all layers in the neural network are ranked according to the influence degree on the recognition rate, and the layer with the high influence degree is preferentially determined as the target layer;
reducing the weight precision corresponding to the current target layer to a preset minimum precision;
increasing the weight precision corresponding to the current target layer, judging whether the current recognition rate of the neural network is greater than a preset threshold value, and if so, locking the weight precision corresponding to the current target layer to the weight precision before the increase;
under the condition of meeting the switching condition of the target layer, re-determining the current target layer; obtaining the weight precision of the neural network after the weight precision corresponding to all layers is locked; when data is input into one layer of the neural network, the processing core corresponding to the current layer determines the precision of the output data of the current layer according to the weight precision of the next layer.
2. The method of claim 1, after determining whether the current recognition rate of the neural network is greater than a preset threshold, further comprising:
if the current recognition rate is less than or equal to the preset threshold value, continuing to increase the weight precision corresponding to the current target layer, and continuing to judge whether the current recognition rate of the neural network is greater than the preset threshold value;
and, the target layer switching condition includes: the current recognition rate of the neural network is greater than a preset threshold value; the preferentially determining the layer with the high degree of influence as the target layer comprises: among the layers whose corresponding weight precisions are not locked, the layer with the high degree of influence is preferentially determined as the target layer.
3. The method according to claim 1, wherein multiple rounds of raising operations are performed for the weight precisions corresponding to all layers in the neural network, and in each round of raising operations, the weight precision corresponding to each layer is raised at most once;
after judging whether the current recognition rate of the neural network is greater than a preset threshold value, the method further comprises the following steps:
if the current recognition rate is less than or equal to the preset threshold value, temporarily storing the raised weight precision;
and, the target layer switching condition includes: the weight precision corresponding to the current target layer has been raised once in the current round of raising operations; the reducing the weight precision corresponding to the current target layer to the preset minimum precision comprises: if the weight precision corresponding to the current target layer has not been adjusted, reducing the weight precision corresponding to the current target layer to the preset minimum precision.
4. The method according to any one of claims 1-3, wherein said re-determining the current target layer comprises:
and re-determining the current target layer until the weight precision corresponding to all the layers is locked.
5. The method according to any one of claims 1 to 3, further comprising, after determining whether the current recognition rate of the neural network is greater than a preset threshold:
and if the current recognition rate is less than or equal to the preset threshold value, training the neural network to adjust the weight parameter value of the current target layer, wherein the training target is to improve the recognition rate of the neural network.
6. The method of claim 5, wherein the training the neural network comprises training the neural network on an artificial intelligence chip;
in the process of training the neural network, the method comprises the following steps:
acquiring the precision of data to be output of a first layer in the neural network, wherein the first layer comprises any one or more layers except the last layer in the neural network;
acquiring the weight precision of a second layer, wherein the second layer is the next layer of the first layer;
and configuring the precision of the data to be output according to the weight precision of the second layer.
7. The method of claim 1, wherein all layers in the neural network are ranked by degree of influence on recognition rate by:
calculating an initial recognition rate of the neural network;
for each layer in the neural network, reducing the weight precision of the current layer from a first precision to a second precision, and calculating a reduction value of the recognition rate of the neural network relative to the initial recognition rate;
and sorting all layers according to the reduction values, wherein a larger reduction value indicates a higher degree of influence on the recognition rate.
8. The method according to claim 7, wherein if there are at least two layers with the same reduction value, the at least two layers are sorted according to their distance from the input layer of the neural network, wherein a smaller distance indicates a higher degree of influence on the recognition rate.
9. A weight precision configuration device is applied to an artificial intelligence chip, wherein the artificial intelligence chip comprises a plurality of processing cores, a neural network is deployed in the artificial intelligence chip, the neural network is deployed or mapped to the corresponding processing cores by taking a layer as a unit, the processing cores are in one-to-one correspondence with the layers of the neural network, and the device comprises:
the target layer determining module is used for determining a current target layer in the neural network, wherein all layers in the neural network are ranked according to the influence degree on the recognition rate, and the layer with the high influence degree is preferentially determined as the target layer;
the weight precision reduction module is used for reducing the weight precision corresponding to the current target layer to a preset lowest precision;
the weight precision increasing module is used for increasing the weight precision corresponding to the current target layer, judging whether the current recognition rate of the neural network is greater than a preset threshold value, and if so, locking the weight precision corresponding to the current target layer to the weight precision before the current increase;
the target layer switching module is used for re-determining the current target layer under the condition of meeting the target layer switching condition;
the apparatus is further configured to: obtaining the weight precision of the neural network after the weight precision corresponding to all layers is locked; when data is input into one layer of the neural network, the processing core corresponding to the current layer determines the precision of the output data of the current layer according to the weight precision of the next layer.
10. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method according to any of claims 1-8 when executing the computer program.
11. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1 to 8.
CN202010659069.0A 2020-07-09 2020-07-09 Weight precision configuration method, device, equipment and storage medium Active CN111831356B (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN202010659069.0A CN111831356B (en) 2020-07-09 2020-07-09 Weight precision configuration method, device, equipment and storage medium
US18/015,065 US11797850B2 (en) 2020-07-09 2021-07-08 Weight precision configuration method and apparatus, computer device and storage medium
PCT/CN2021/105172 WO2022007879A1 (en) 2020-07-09 2021-07-08 Weight precision configuration method and apparatus, computer device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010659069.0A CN111831356B (en) 2020-07-09 2020-07-09 Weight precision configuration method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111831356A CN111831356A (en) 2020-10-27
CN111831356B true CN111831356B (en) 2023-04-07

Family

ID=72901207

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010659069.0A Active CN111831356B (en) 2020-07-09 2020-07-09 Weight precision configuration method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111831356B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11797850B2 (en) 2020-07-09 2023-10-24 Lynxi Technologies Co., Ltd. Weight precision configuration method and apparatus, computer device and storage medium
CN115600657A (en) * 2021-07-09 2023-01-13 中科寒武纪科技股份有限公司(Cn) Processing device, equipment and method and related products thereof

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108009625A (en) * 2016-11-01 2018-05-08 北京深鉴科技有限公司 Method for trimming and device after artificial neural network fixed point
CN110163368A (en) * 2019-04-18 2019-08-23 腾讯科技(深圳)有限公司 Deep learning model training method, apparatus and system based on mixed-precision
WO2019165602A1 (en) * 2018-02-28 2019-09-06 深圳市大疆创新科技有限公司 Data conversion method and device
WO2020092532A1 (en) * 2018-10-30 2020-05-07 Google Llc Quantizing trained long short-term memory neural networks

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108345939B (en) * 2017-01-25 2022-05-24 微软技术许可有限责任公司 Neural network based on fixed-point operation
CN107102727B (en) * 2017-03-17 2020-04-07 武汉理工大学 Dynamic gesture learning and recognition method based on ELM neural network
US11842280B2 (en) * 2017-05-05 2023-12-12 Nvidia Corporation Loss-scaling for deep neural network training with reduced precision
US20190050710A1 (en) * 2017-08-14 2019-02-14 Midea Group Co., Ltd. Adaptive bit-width reduction for neural networks
CN112836792A (en) * 2017-12-29 2021-05-25 华为技术有限公司 Training method and device of neural network model
CN108229670B (en) * 2018-01-05 2021-10-08 中国科学技术大学苏州研究院 Deep neural network acceleration platform based on FPGA
CN110738315A (en) * 2018-07-18 2020-01-31 华为技术有限公司 neural network precision adjusting method and device
US20200042856A1 (en) * 2018-07-31 2020-02-06 International Business Machines Corporation Scheduler for mapping neural networks onto an array of neural cores in an inference processing unit
CN109740508B (en) * 2018-12-29 2021-07-23 北京灵汐科技有限公司 Image processing method based on neural network system and neural network system
CN109800877B (en) * 2019-02-20 2022-12-30 腾讯科技(深圳)有限公司 Parameter adjustment method, device and equipment of neural network
CN110009100B (en) * 2019-03-28 2021-01-05 安徽寒武纪信息科技有限公司 Calculation method of user-defined operator and related product
CN111339027B (en) * 2020-02-25 2023-11-28 中国科学院苏州纳米技术与纳米仿生研究所 Automatic design method of reconfigurable artificial intelligent core and heterogeneous multi-core chip

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108009625A (en) * 2016-11-01 2018-05-08 北京深鉴科技有限公司 Method for trimming and device after artificial neural network fixed point
WO2019165602A1 (en) * 2018-02-28 2019-09-06 深圳市大疆创新科技有限公司 Data conversion method and device
WO2020092532A1 (en) * 2018-10-30 2020-05-07 Google Llc Quantizing trained long short-term memory neural networks
CN110163368A (en) * 2019-04-18 2019-08-23 腾讯科技(深圳)有限公司 Deep learning model training method, apparatus and system based on mixed-precision

Also Published As

Publication number Publication date
CN111831356A (en) 2020-10-27

Similar Documents

Publication Publication Date Title
CN111831355A (en) Weight precision configuration method, device, equipment and storage medium
CN111831358B (en) Weight precision configuration method, device, equipment and storage medium
US20180197084A1 (en) Convolutional neural network system having binary parameter and operation method thereof
CN111144561B (en) Neural network model determining method and device
CN111831359A (en) Weight precision configuration method, device, equipment and storage medium
CN112200300B (en) Convolutional neural network operation method and device
CN111831356B (en) Weight precision configuration method, device, equipment and storage medium
CN108122032A (en) A kind of neural network model training method, device, chip and system
CN108304926B (en) Pooling computing device and method suitable for neural network
CN111831354A (en) Data precision configuration method, device, chip array, equipment and medium
CN112101525A (en) Method, device and system for designing neural network through NAS
CN111783937A (en) Neural network construction method and system
CN116644804B (en) Distributed training system, neural network model training method, device and medium
CN111191789B (en) Model optimization deployment system, chip, electronic equipment and medium
CN115860081B (en) Core algorithm scheduling method, system, electronic equipment and storage medium
CN111931901A (en) Neural network construction method and device
CN109657791A (en) It is a kind of based on cerebral nerve cynapse memory mechanism towards open world successive learning method
CN117271101B (en) Operator fusion method and device, electronic equipment and storage medium
CN113688988A (en) Precision adjustment method and device, and storage medium
CN114217688B (en) NPU power consumption optimization system and method based on neural network structure
CN110533176B (en) Caching device for neural network computation and related computing platform thereof
CN108875919B (en) Data processing apparatus, data processing method, and storage medium product
CN114399152B (en) Method and device for optimizing comprehensive energy scheduling of industrial park
US20230206048A1 (en) Crossbar-based neuromorphic computing apparatus capable of processing large input neurons and method using the same
CN118153708A (en) Data processing method and related device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant