CN113778655A - Network precision quantification method and system - Google Patents

Network precision quantification method and system

Info

Publication number
CN113778655A
CN113778655A
Authority
CN
China
Prior art keywords
quantized
network
core
precision
total amount
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010519846.1A
Other languages
Chinese (zh)
Inventor
孟凡辉
胡川
李涵
张爱飞
吴欣洋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Lynxi Technology Co Ltd
Original Assignee
Beijing Lynxi Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Lynxi Technology Co Ltd filed Critical Beijing Lynxi Technology Co Ltd
Priority to CN202010519846.1A priority Critical patent/CN113778655A/en
Priority to PCT/CN2021/099198 priority patent/WO2021249440A1/en
Priority to US17/760,023 priority patent/US11783168B2/en
Publication of CN113778655A publication Critical patent/CN113778655A/en
Pending legal-status Critical Current


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00: Arrangements for program control, e.g. control units
    • G06F 9/06: Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46: Multiprogramming arrangements
    • G06F 9/50: Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5005: Allocation of resources to service a request
    • G06F 9/5011: the resources being hardware resources other than CPUs, servers and terminals
    • G06F 9/5016: the resource being the memory
    • G06F 9/5027: the resource being a machine, e.g. CPUs, servers, terminals
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/0495: Quantised networks; Sparse networks; Compressed networks
    • G06N 3/06: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N 3/063: Physical realisation using electronic means
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Neurology (AREA)
  • Complex Calculations (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses a network precision quantization method applied to a many-core chip, comprising the following steps: determining a reference precision according to the total amount of core resources of the many-core chip and each network to be quantized, wherein the total amount of core resources required to quantize each network to be quantized at the reference precision is less than or equal to the total amount of core resources of the many-core chip; and determining a target precision corresponding to each network to be quantized according to the reference precision and the total amount of core resources of the many-core chip. The invention also discloses a network precision quantization system. Beneficial effects of the invention: the chip resource utilization rate is improved when multiple networks coexist, while the precision loss caused by excessive quantization is reduced.

Description

Network precision quantification method and system
Technical Field
The invention relates to the technical field of neural networks, in particular to a method and a system for quantizing network precision.
Background
In the related art, when a neural network undergoes precision quantization, a single precision is selected, and only the resource reduction achieved by quantizing a single network is considered. The problem of how to select precisions for multiple networks and reasonably allocate on-chip resources when multiple networks coexist on a many-core chip has not been solved, so chip resources are either under-utilized or precision is lost through excessive quantization.
Disclosure of Invention
In order to solve the above problems, the present invention provides a network precision quantization method and system, which can improve the utilization rate of chip resources when multiple networks coexist and reduce the precision loss caused by network precision quantization.
The invention provides a network precision quantification method, which is applied to a many-core chip and comprises the following steps:
determining reference precision according to the total amount of the core resources of the many-core chip and each network to be quantized, wherein the total amount of the core resources required by each network to be quantized according to the reference precision is less than or equal to the total amount of the core resources of the many-core chip;
and determining the target precision corresponding to each network to be quantized according to the reference precision and the total amount of the core resources of the many-core chip.
As a further improvement of the present invention, determining the reference precision according to the total amount of core resources of the many-core chip and each network to be quantized includes:
determining the total amount of core resources S1 required to quantize each network to be quantized at the 1st precision;
judging whether the total amount of core resources S1 is less than or equal to the total amount of core resources Z of the many-core chip;
if the total amount of core resources S1 is greater than the total amount of core resources Z of the many-core chip, determining the total amount of core resources S2 required to quantize each network to be quantized at the 2nd precision, and judging whether S2 is less than or equal to the total amount of core resources Z of the many-core chip, wherein the 2nd precision is lower than the 1st precision;
and decreasing the quantization precision step by step and repeating the above steps until the total amount of core resources Sj required to quantize each network to be quantized at the jth precision is less than or equal to the total amount of core resources Z of the many-core chip, and determining the jth precision as the reference precision, wherein j is an integer greater than or equal to 2.
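The stepwise search described above can be sketched as follows. This is a minimal illustration, not the patented implementation; the table M and the function name are assumptions for the sketch.

```python
# Hypothetical sketch of the reference-precision search: step down through
# the candidate precisions (index 0 = highest, e.g. fp32) until the combined
# core requirement of all networks fits on the chip.

def find_reference_precision(M, Z):
    """M[i][p] = cores network i needs at precision index p (0 = highest).
    Z = total cores of the many-core chip.
    Returns (j, total) for the first precision index j whose total fits."""
    for j in range(len(M[0])):
        total = sum(row[j] for row in M)
        if total <= Z:
            return j, total
    raise ValueError("networks do not fit even at the lowest precision")
```

For example, with two networks needing (8, 4, 2) and (16, 8, 4) cores at three precision levels and a 14-core chip, the search skips the highest precision (24 cores) and settles on the middle one (12 cores).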
As a further improvement of the present invention, determining the target precision corresponding to each network to be quantized according to the reference precision and the total amount of core resources of the many-core chip includes:
determining the amount of remaining core resources Yj according to the total amount of core resources Sj required to quantize each network to be quantized at the reference precision j and the total amount of core resources Z of the many-core chip, wherein Yj = Z - Sj, and j is an integer greater than or equal to 2;
determining at least one core resource quantity difference W[i] = {M[i][1] - M[i][j], M[i][2] - M[i][j], ...} between quantizing each network to be quantized at each precision and at the jth precision, wherein i represents the number of the network to be quantized, and i is an integer greater than or equal to 1;
and determining the target precision corresponding to each network to be quantized according to the amount of remaining core resources Yj and the core resource quantity differences of each network to be quantized at each precision level.
As a further improvement of the invention, determining the target precision corresponding to each network to be quantized according to the amount of remaining core resources Yj and the core resource quantity differences of each network to be quantized at each precision level comprises:
for each network to be quantized, selecting one core resource quantity difference from the at least one core resource quantity difference W[i] = {M[i][1] - M[i][j], M[i][2] - M[i][j], ...}, such that the sum of the selected core resource quantity differences of all networks to be quantized is less than or equal to the amount of remaining core resources Yj and is maximized;
and determining the target precision corresponding to each network to be quantized from the selection for which that sum is maximized.
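The selection above amounts to picking one precision per network so that the total extra cores spent above the reference stays within Yj while being as large as possible. A brute-force sketch, with an assumed index convention (column j-1 of M holds the reference-precision core counts):

```python
from itertools import product

# Illustrative brute-force version of the selection step: for each network i,
# pick one precision index p (0 = 1st precision, j-1 = reference precision)
# so that the summed extra-core cost M[i][p] - M[i][j-1] fits in the
# remaining cores Y and is maximal. M and the names are assumptions.

def pick_target_precisions(M, Y):
    j = len(M[0])                      # reference precision is the last column
    best_cost, best_choice = -1, None
    for choice in product(range(j), repeat=len(M)):
        cost = sum(M[i][p] - M[i][j - 1] for i, p in enumerate(choice))
        if cost <= Y and cost > best_cost:
            best_cost, best_choice = cost, choice
    return best_cost, best_choice
```

With M = [[8, 4, 2], [16, 8, 4]] and 4 spare cores, the best use of the slack is to keep the first network at the reference precision and raise the second by one level (cost 4). Brute force is exponential in the number of networks; the patent's multi-level dynamic programming avoids that.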
As a further improvement of the invention, the networks to be quantized comprise a first type of network to be quantized and a second type of network to be quantized, the target precision corresponding to the second type of network to be quantized being a kth precision,
and determining the target precision corresponding to each network to be quantized according to the reference precision and the total amount of core resources of the many-core chip comprises:
determining the total amount of core resources Sj' required to quantize the first type of network to be quantized at a reference precision j', and the total amount of core resources Sk required to quantize the second type of network to be quantized at the specified precision k, wherein j' is an integer greater than or equal to 1, and k is an integer greater than or equal to 1;
determining the amount of remaining core resources Yj' according to the total amount of core resources Sj', the total amount of core resources Sk and the total amount of core resources Z of the many-core chip, wherein Yj' = Z - Sj' - Sk;
determining at least one core resource quantity difference W[i'] = {M[i'][1] - M[i'][j'], M[i'][2] - M[i'][j'], ...} between quantizing the first type of network to be quantized at each precision and at the j'th precision, wherein i' represents the number of a network of the first type, and i' is an integer greater than or equal to 1;
and determining the target precision corresponding to the first type of network to be quantized according to the amount of remaining core resources Yj' and the core resource quantity differences of the first type of network at each precision level.
As a further improvement of the invention, determining the target precision corresponding to the first type of network to be quantized according to the amount of remaining core resources Yj' and the core resource quantity differences of the first type of network at each precision level comprises:
for each network in the first type of network to be quantized, selecting one core resource quantity difference from W[i'] = {M[i'][1] - M[i'][j'], M[i'][2] - M[i'][j'], ...}, such that the sum of the selected core resource quantity differences of the first type of network is less than or equal to the amount of remaining core resources Yj' and is maximized;
and determining the target precision corresponding to the first type of network to be quantized from the selection for which that sum is maximized.
As a further improvement of the present invention, the target precisions corresponding to the networks to be quantized are not completely the same.
As a further improvement of the invention, determining the total amount of core resources S1 required to quantize each network to be quantized at the 1st precision comprises:
calculating the number of core resources M[i][1] required when the ith network is quantized at the 1st precision;
and determining the total amount of core resources S1 required to quantize each network to be quantized at the 1st precision, wherein
S1 = M[1][1] + M[2][1] + ... + M[N][1],
the number of networks is N, i represents the number of the network to be quantized, i is an integer greater than or equal to 1, and N is an integer greater than or equal to 1.
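The sum above is direct to compute; a toy check with hypothetical per-network core counts M[i][1]:

```python
# S1 is the total cores needed when every network is quantized at the
# 1st (highest) precision: S1 = M[1][1] + M[2][1] + ... + M[N][1].
# The counts below are illustrative, not from the patent.
M_first_precision = [12, 30, 8]   # M[i][1] for N = 3 networks
S1 = sum(M_first_precision)       # 50 cores in total
```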
As a further improvement of the present invention, determining the reference precision according to the total amount of core resources of the many-core chip and each network to be quantized includes:
determining the total amount of core resources S1 required to quantize each network to be quantized at the 1st precision;
judging whether the total amount of core resources S1 is less than or equal to the total amount of core resources Z of the many-core chip;
and if the total amount of core resources S1 is less than or equal to the total amount of core resources Z of the many-core chip, determining the 1st precision as the reference precision;
determining target precision corresponding to each network to be quantized according to the reference precision and the total amount of core resources of the many-core chip, wherein the determining of the target precision corresponding to each network to be quantized comprises the following steps:
and determining the 1 st precision as the target precision corresponding to each network to be quantized.
As a further improvement of the invention, the quantization precision comprises one or more of fp32, fp16, int8 and int4.
The invention also provides a network precision quantification system, which is applied to a many-core chip and comprises the following components:
the reference precision determining module is used for determining reference precision according to the total amount of the core resources of the many-core chip and the networks to be quantized, wherein the total amount of the core resources required by the networks to be quantized according to the reference precision is less than or equal to the total amount of the core resources of the many-core chip;
and the target precision determining module is used for determining the target precision corresponding to each network to be quantized according to the reference precision and the total amount of the core resources of the many-core chip.
As a further improvement of the present invention, the reference accuracy determination module is configured to:
determining the total amount of core resources S1 required to quantize each network to be quantized at the 1st precision;
judging whether the total amount of core resources S1 is less than or equal to the total amount of core resources Z of the many-core chip;
if the total amount of core resources S1 is greater than the total amount of core resources Z of the many-core chip, determining the total amount of core resources S2 required to quantize each network to be quantized at the 2nd precision, and judging whether S2 is less than or equal to the total amount of core resources Z of the many-core chip, wherein the 2nd precision is lower than the 1st precision;
and decreasing the quantization precision step by step and repeating the above steps until the total amount of core resources Sj required to quantize each network to be quantized at the jth precision is less than or equal to the total amount of core resources Z of the many-core chip, and determining the jth precision as the reference precision, wherein j is an integer greater than or equal to 2.
As a further improvement of the present invention, the target accuracy determination module is configured to:
determining the amount of remaining core resources Yj according to the total amount of core resources Sj required to quantize each network to be quantized at the reference precision j and the total amount of core resources Z of the many-core chip, wherein Yj = Z - Sj, and j is an integer greater than or equal to 2;
determining at least one core resource quantity difference W[i] = {M[i][1] - M[i][j], M[i][2] - M[i][j], ...} between quantizing each network to be quantized at each precision and at the jth precision, wherein i represents the number of the network to be quantized, and i is an integer greater than or equal to 1;
and determining the target precision corresponding to each network to be quantized according to the amount of remaining core resources Yj and the core resource quantity differences of each network to be quantized at each precision level.
As a further improvement of the invention, determining the target precision corresponding to each network to be quantized according to the amount of remaining core resources Yj and the core resource quantity differences of each network to be quantized at each precision level comprises:
for each network to be quantized, selecting one core resource quantity difference from the at least one core resource quantity difference W[i] = {M[i][1] - M[i][j], M[i][2] - M[i][j], ...}, such that the sum of the selected core resource quantity differences of all networks to be quantized is less than or equal to the amount of remaining core resources Yj and is maximized;
and determining the target precision corresponding to each network to be quantized from the selection for which that sum is maximized.
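The "multi-level dynamic programming" the description alludes to can be sketched as a multiple-choice knapsack over reachable spendings, rather than enumerating all combinations. The representation below (per-network lists of upgrade costs, with 0 meaning "stay at the reference precision") is an assumption for the sketch.

```python
# Hedged DP sketch: diffs[i] lists the extra cores network i would need
# above the reference precision for each candidate precision (0 included
# for staying at the reference). We track which totals of extra cores are
# reachable choosing exactly one option per network, capped at `remaining`.

def max_extra_cores(diffs, remaining):
    """Return the largest spendable amount of leftover cores <= remaining."""
    feasible = {0}
    for options in diffs:
        feasible = {y + d for y in feasible for d in options if y + d <= remaining}
    return max(feasible)
```

Because each network's option list contains 0, the feasible set is never empty; the runtime is bounded by the number of networks times the number of reachable totals, instead of being exponential in the number of networks.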
As a further improvement of the present invention, the networks to be quantized comprise a first type of network to be quantized and a second type of network to be quantized, the target precision corresponding to the second type of network to be quantized being a kth precision, and the target precision determining module is configured to perform:
determining the total amount of core resources Sj' required to quantize the first type of network to be quantized at a reference precision j', and the total amount of core resources Sk required to quantize the second type of network to be quantized at the specified precision k, wherein j' is an integer greater than or equal to 1, and k is an integer greater than or equal to 1;
determining the amount of remaining core resources Yj' according to the total amount of core resources Sj', the total amount of core resources Sk and the total amount of core resources Z of the many-core chip, wherein Yj' = Z - Sj' - Sk;
determining at least one core resource quantity difference W[i'] = {M[i'][1] - M[i'][j'], M[i'][2] - M[i'][j'], ...} between quantizing the first type of network to be quantized at each precision and at the j'th precision, wherein i' represents the number of a network of the first type, and i' is an integer greater than or equal to 1;
and determining the target precision corresponding to the first type of network to be quantized according to the amount of remaining core resources Yj' and the core resource quantity differences of the first type of network at each precision level.
As a further improvement of the invention, determining the target precision corresponding to the first type of network to be quantized according to the amount of remaining core resources Yj' and the core resource quantity differences of the first type of network at each precision level comprises:
for each network in the first type of network to be quantized, selecting one core resource quantity difference from W[i'] = {M[i'][1] - M[i'][j'], M[i'][2] - M[i'][j'], ...}, such that the sum of the selected core resource quantity differences of the first type of network is less than or equal to the amount of remaining core resources Yj' and is maximized;
and determining the target precision corresponding to the first type of network to be quantized from the selection for which that sum is maximized.
As a further improvement of the present invention, the target precisions corresponding to the networks to be quantized are not completely the same.
As a further improvement of the invention, determining the total amount of core resources S1 required to quantize each network to be quantized at the 1st precision comprises:
calculating the number of core resources M[i][1] required when the ith network is quantized at the 1st precision;
and determining the total amount of core resources S1 required to quantize each network to be quantized at the 1st precision, wherein
S1 = M[1][1] + M[2][1] + ... + M[N][1],
the number of networks is N, i represents the number of the network to be quantized, i is an integer greater than or equal to 1, and N is an integer greater than or equal to 1.
As a further improvement of the present invention, the reference accuracy determination module is configured to:
determining the total amount of core resources S1 required to quantize each network to be quantized at the 1st precision;
judging whether the total amount of core resources S1 is less than or equal to the total amount of core resources Z of the many-core chip;
and if the total amount of core resources S1 is less than or equal to the total amount of core resources Z of the many-core chip, determining the 1st precision as the reference precision;
determining target precision corresponding to each network to be quantized according to the reference precision and the total amount of core resources of the many-core chip, wherein the determining of the target precision corresponding to each network to be quantized comprises the following steps:
and determining the 1 st precision as the target precision corresponding to each network to be quantized.
As a further improvement of the invention, the quantization precision comprises one or more of fp32, fp16, int8 and int4.
The invention also provides an electronic device comprising a memory and a processor, the memory storing one or more computer instructions, wherein the one or more computer instructions are executed by the processor to implement the method.
The invention also provides a computer-readable storage medium having stored thereon a computer program for execution by a processor to perform the method.
Beneficial effects of the invention: the problem of on-chip resource allocation for multiple neural network models is addressed through quantization; the optimal selection of quantization precisions is solved by a multi-level dynamic programming algorithm; and the allocation and use of on-chip memory are optimized, so that the limited on-chip memory resources are fully utilized when allocating memory for multiple on-chip network models, while the precision loss caused by over-quantization is reduced.
Drawings
In order to more clearly illustrate the embodiments of the present disclosure or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below. It is to be understood that the drawings in the following description are merely exemplary of the disclosure, and that other drawings may be derived from those drawings by one of ordinary skill in the art without undue inventive faculty.
Fig. 1 is a schematic flowchart illustrating a network accuracy quantifying method according to an exemplary embodiment of the disclosure;
FIG. 2 is a flow diagram illustrating a plurality of network progressive quantization according to an exemplary embodiment of the present disclosure;
FIG. 3 is a flow chart illustrating progressive quantization of multiple networks according to yet another exemplary embodiment of the present disclosure;
fig. 4 is a schematic diagram of a many-core chip multiple network core resource allocation according to an exemplary embodiment of the disclosure;
In the figure:
1. the 1st network; 2. the 2nd network; 3. the 3rd network; 4. the 4th network; 5. the 5th network; 6. the 6th network; 7. the 7th network.
Detailed Description
The technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the drawings in the embodiments of the present disclosure, and it is obvious that the described embodiments are only a part of the embodiments of the present disclosure, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments disclosed herein without making any creative effort, shall fall within the protection scope of the present disclosure.
It should be noted that if directional indications (such as up, down, left, right, front and back) are involved in the disclosed embodiments, they are only used to explain the relative positional relationship, motion, and the like of components in a specific posture (as shown in the drawings); if the specific posture changes, the directional indications change accordingly.
In addition, the terms used in the description of the present disclosure are for illustrative purposes only and are not intended to limit its scope. The terms "comprises" and/or "comprising" specify the presence of stated elements, steps, operations, and/or components, but do not preclude the presence or addition of one or more other elements, steps, operations, and/or components. The terms "first," "second," and the like may describe various elements but do not necessarily imply order or limit those elements; they are only used to distinguish one element from another. Unless otherwise specified, "a plurality" means two or more. These and other aspects will become apparent from the following drawings and description, from which one of ordinary skill in the art will readily recognize that alternative embodiments of the structures and methods illustrated may be employed without departing from the principles described in the present disclosure. The drawings are only for purposes of illustrating the described embodiments.
The method for quantizing network precision according to the embodiment of the present disclosure is applied to a many-core chip, and as shown in fig. 1, the method includes:
in step S1, determining a reference precision according to the total amount of core resources of the many-core chip and each network to be quantized, where the total amount of core resources required for quantization of each network to be quantized according to the reference precision is less than or equal to the total amount of core resources of the many-core chip;
in step S2, a target precision corresponding to each network to be quantized is determined according to the reference precision and the total amount of core resources of the many-core chip.
The networks to be quantized can be various neural network models that need quantization. The total amount of core resources of a many-core chip may refer to the number of cores of the many-core chip or to the core storage resources of the many-core chip. The reference precision and the target precision can be determined from various quantization precisions. For example, the quantization precision may include one or more of fp32 (a 32-bit data type), fp16 (a 16-bit data type), int8 (an 8-bit data type) and int4 (a 4-bit data type); when the quantization precision includes fp32, fp16, int8 and int4, the reference precision and the target precision corresponding to each network to be quantized can be selected from among fp32, fp16, int8 and int4. The target precisions corresponding to the networks to be quantized may all be the same, for example all equal to the reference precision, or they may not be completely the same.
In an artificial intelligence chip, storage resources are limited, and as neural networks consume more of them, precision quantization of neural network models becomes increasingly important. A many-core chip comprises a plurality of cores, which is a great advantage in neural network applications; with many cores available, choosing a quantization precision is no longer a simple single selection. When multiple neural networks coexist, their resources need to be allocated reasonably, so that the utilization rate of core resources is high and the precision loss caused by over-quantization is reduced. For the case where multiple networks coexist on a many-core chip, the method of the present disclosure performs multi-level quantization of the precisions of the multiple networks in a multi-level dynamic programming manner, based on awareness of the number of networks and of core resources, and according to the resources in the many-core chip and the specific resource requirements of the networks; the resources of each network are dynamically planned and allocated, achieving reasonable resource allocation and full utilization of on-chip resources while reducing the precision loss caused by over-quantization.
In an optional implementation manner, determining the reference precision according to the total amount of core resources of the many-core chip and each network to be quantized includes:
determining the total amount S1 of core resources required for quantizing each network to be quantized according to the 1st precision;
judging whether the total amount S1 of core resources is less than or equal to the total amount Z of core resources of the many-core chip;
if the total amount S1 of core resources is less than or equal to the total amount Z of core resources of the many-core chip, determining the 1st precision as the reference precision;
determining target precision corresponding to each network to be quantized according to the reference precision and the total amount of core resources of the many-core chip, wherein the determining of the target precision corresponding to each network to be quantized comprises the following steps:
and determining the 1 st precision as the target precision corresponding to each network to be quantized.
For example, when the 1st precision is the highest precision, each network to be quantized may be quantized according to the highest precision; when the total amount of core resources required for quantizing each network to be quantized according to the highest precision is less than or equal to the total amount of core resources of the many-core chip, core resources may be allocated directly to each network to be quantized according to the highest quantization precision, so that the core resources are utilized most fully and each network to be quantized retains high precision.
In an optional implementation manner, determining the total amount S1 of core resources required for quantizing each network to be quantized according to the 1st precision comprises:
calculating the number M[i][1] of core resources required when the ith network is quantized according to the 1st precision;
determining the total amount S1 of core resources required for quantizing each network to be quantized according to the 1st precision;
wherein S1 = M[1][1] + M[2][1] + ... + M[N][1], i.e., the sum of M[i][1] over all networks to be quantized,
the number of networks to be quantized is N, i represents the serial number of a network to be quantized, i is an integer greater than or equal to 1, and N is an integer greater than or equal to 1.
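The sum above can be computed directly; a minimal sketch (using 0-based indices, whereas the disclosure numbers networks and precisions from 1):

```python
def total_cores(M, j):
    """Total core resources S_j when every network is quantized at
    precision level j. M has one row per network; M[i][j] is the
    number of cores network i needs at precision level j."""
    return sum(row[j] for row in M)
```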
In an optional implementation manner, determining the reference precision according to the total amount of core resources of the many-core chip and each network to be quantized includes:
determining the total amount S1 of core resources required for quantizing each network to be quantized according to the 1st precision;
judging whether the total amount S1 of core resources is less than or equal to the total amount Z of core resources of the many-core chip;
if the total amount S1 of core resources is greater than the total amount Z of core resources of the many-core chip, determining the total amount S2 of core resources required for quantizing each network to be quantized according to the 2nd precision, and judging whether the total amount S2 of core resources is less than or equal to the total amount Z of core resources of the many-core chip, wherein the 2nd precision is lower than the 1st precision;
decreasing the quantization precision step by step in this way until the total amount Sj of core resources required for quantizing each network to be quantized according to the jth precision is less than or equal to the total amount Z of core resources of the many-core chip, and determining the jth precision as the reference precision, wherein j is an integer greater than or equal to 2.
When determining the total amount of core resources required for quantizing each network to be quantized according to a given quantization precision, the calculation Sj = M[1][j] + M[2][j] + ... + M[N][j] (the sum of M[i][j] over i = 1 to N) may be used, wherein the number of networks to be quantized is N, i represents the serial number of a network to be quantized, i is an integer greater than or equal to 1, and N is an integer greater than or equal to 1.
For example, the quantization precision includes fp32, fp16, int8, and int4. The 1st precision may represent fp32, the 2nd precision fp16, the 3rd precision int8, and the 4th precision int4. When the total amount S1 of core resources required for quantizing each network to be quantized according to the 1st precision is greater than the total amount Z of core resources of the many-core chip, the quantization precision is decreased step by step: for example, the total amount S2 of core resources required for quantizing each network to be quantized according to the 2nd precision (j=2) may be determined, the 2nd precision being lower than the 1st precision. If S2 is less than or equal to the total amount Z of core resources of the many-core chip, the 2nd precision (j=2) is determined as the reference precision.
If S2 is greater than the total amount Z of core resources of the many-core chip, the quantization precision continues to decrease step by step: the total amount S3 of core resources required for quantizing each network to be quantized according to the 3rd precision (j=3) is determined; if S3 is less than or equal to Z, the 3rd precision (j=3) is determined as the reference precision. If S3 is greater than Z, the precision is decreased again, and so on; for example, if the total amount S4 of core resources required for quantizing each network to be quantized according to the 4th precision is less than or equal to Z, the 4th precision (j=4) is determined as the reference precision.
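The stepwise search for the reference precision described above can be sketched as follows (a hypothetical helper; precision levels are ordered from highest to lowest along each row of M):

```python
def reference_precision(M, Z):
    """Return the first (highest) precision level whose total core
    demand fits within the chip's Z cores, or None if even the
    lowest precision does not fit."""
    n_levels = len(M[0])
    for j in range(n_levels):  # level 0 is the highest precision
        if sum(row[j] for row in M) <= Z:
            return j
    return None
```

With M = [[8, 4, 2], [8, 4, 2]] and Z = 10, level 0 needs 16 cores and fails, while level 1 needs 8 cores and becomes the reference precision.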
It should be understood that the reference precision may be determined in various manners, as long as the total amount of core resources required for quantizing each network to be quantized according to the reference precision is less than or equal to the total amount of core resources of the many-core chip, and the manner of determining the reference precision is not limited by the present disclosure.
In an optional implementation manner, determining a target precision corresponding to each network to be quantized according to the reference precision and the total amount of core resources of the many-core chip includes:
determining the remaining core resource amount Yj according to the total amount Sj of core resources required for quantizing each network to be quantized according to the reference precision j and the total amount Z of core resources of the many-core chip, wherein Yj = Z - Sj, and j is an integer greater than or equal to 2;
determining at least one core resource quantity difference W[i] = {M[i][1]-M[i][j], M[i][2]-M[i][j], ...} between quantizing each network to be quantized according to each precision and according to the jth precision, wherein i represents the serial number of a network to be quantized, and i is an integer greater than or equal to 1;
determining the target precision corresponding to each network to be quantized according to the remaining core resource amount Yj and the core resource quantity differences of each network to be quantized at each quantization level.
In another alternative embodiment, determining the target precision corresponding to each network to be quantized according to the remaining core resource amount Yj and the core resource quantity differences of each network to be quantized comprises:
for each network to be quantized, selecting one core resource quantity difference from the at least one core resource quantity difference W[i] = {M[i][1]-M[i][j], M[i][2]-M[i][j], ...}, such that the sum of the selected core resource quantity differences of the networks to be quantized is less than or equal to the remaining core resource amount Yj and, subject to this constraint, the sum of the core resource differences of the networks to be quantized is maximized;
and when the sum of the core resource differences of the networks to be quantized is maximized, determining the target precision corresponding to each network to be quantized.
For example, as shown in fig. 2, the multiple networks are quantized step by step from fp32 through int8 to int4, and the quantization precision may be selected from three types, namely fp32, int8, and int4, where the 1st precision is fp32, the 2nd precision is int8, and the 3rd precision is int4.
Step1, determining the total core resource S required by quantizing each network to be quantized according to fp32 precision1And judging the total amount S of the nuclear resources1Whether the total amount of the core resources Z of the many-core chip is less than or equal to;
if the total amount of the core resources S1The total amount Z of the core resources of the many-core chip is less than or equal to that of the many-core chip, the fp32 precision is determined as the target precision corresponding to each network to be quantized, and the core resources are quantitatively distributed to each network to be quantized according to the fp32 precision;
if the total amount of the core resources S1Performing Step2 when the total amount of the core resources Z is larger than the total amount of the core resources Z of the many-core chip;
Step2, determining the total amount S2 of core resources required for quantizing each network to be quantized according to int8 precision, and judging whether the total amount S2 of core resources is less than or equal to the total amount Z of core resources of the many-core chip;
if the total amount S2 of core resources is less than or equal to the total amount Z of core resources of the many-core chip, the target precision of each network to be quantized may be fp32 precision or int8 precision, provided that the total amount of core resources required after each network to be quantized is quantized according to its respective target precision is less than or equal to the total amount Z of core resources of the many-core chip;
if the total amount S2 of core resources is greater than the total amount Z of core resources of the many-core chip, performing Step3;
Step3, determining the total amount S3 of core resources required for quantizing each network to be quantized according to int4 precision, and judging whether the total amount S3 of core resources is less than or equal to the total amount Z of core resources of the many-core chip;
if the total amount S3 of core resources is less than or equal to the total amount Z of core resources of the many-core chip, the target precision of each network to be quantized may be selected from fp32 precision, int8 precision, or int4 precision, provided that the total amount of core resources required after each network to be quantized is quantized according to its respective target precision is less than or equal to the total amount Z of core resources of the many-core chip;
if the total amount S3 of core resources is greater than the total amount Z of core resources of the many-core chip, it is determined that all the networks cannot be deployed on the chip simultaneously.
In some optional embodiments, when it is determined in Step1 that the total amount S1 of core resources is greater than the total amount Z of core resources of the many-core chip, and it is determined in Step2 that the total amount S2 of core resources is less than or equal to the total amount Z of core resources of the many-core chip, int8 precision is determined as the reference precision, and the remaining core resource amount is Y2 = Z - S2. When computing the core resource quantity difference of each network to be quantized from fp32 precision down to int8 precision, the core resource quantity difference of each network to be quantized is W[i] = M[i][1] - M[i][2], and the target precision corresponding to each network to be quantized is determined according to the remaining core resource amount Y2 and the core resource quantity differences of each network to be quantized.
In some optional embodiments, when it is determined in Step1 that S1 is greater than Z, in Step2 that S2 is greater than Z, and in Step3 that S3 is less than or equal to Z, int4 precision is determined as the reference precision, and the remaining core resource amount is Y3 = Z - S3. When computing the core resource quantity differences of each network to be quantized from fp32 precision or int8 precision down to int4 precision, the core resource quantity difference of each network to be quantized may be W[i] = M[i][1] - M[i][3] or W[i] = M[i][2] - M[i][3], and the target precision corresponding to each network to be quantized is determined according to the remaining core resource amount Y3 = Z - S3 and the core resource quantity differences of each network to be quantized at each quantization level.
When determining the target precision of each network to be quantized, each network to be quantized may first be quantized according to the reference precision; the remaining core resource amount after this quantization is taken as the capacity of a knapsack, and the core resource quantity differences of each network to be quantized represent the values of knapsack items. A 0-1 knapsack dynamic programming algorithm maximizes the total value over all networks to be quantized, and the target precision corresponding to each network to be quantized is the one achieving this maximum. In this way, for multiple networks on a chip, maximizing the sum of the core resource differences of the networks serves as the optimal solution for the target precision of each network, so that the core resources of the chip are fully utilized, reasonable allocation of core resources is realized, and the precision loss caused by selecting a single quantization precision for all networks is reduced.
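The whole scheme can be sketched as follows, assuming per-network core counts M[i][j] are given (0-based levels, highest precision first; the patent's own formulation is 1-based). The upgrade step is a grouped 0-1 knapsack in which each network picks at most one higher precision and, since an item's "value" equals the extra cores it consumes, maximizing value maximizes core utilization:

```python
def plan_precisions(M, Z):
    """Return (levels, extra_cores_used): a target precision level per
    network and the leftover capacity consumed by upgrades, or None if
    even the lowest precision does not fit in Z cores."""
    n_levels = len(M[0])
    # 1) Reference precision: first level whose total demand fits.
    j = next((lv for lv in range(n_levels)
              if sum(row[lv] for row in M) <= Z), None)
    if j is None:
        return None
    capacity = Z - sum(row[j] for row in M)  # remaining cores Y_j
    # 2) Grouped 0-1 knapsack: dp maps cores-used -> chosen levels.
    dp = {0: [j] * len(M)}
    for i, row in enumerate(M):
        nxt = dict(dp)
        for used, levels in dp.items():
            for p in range(j):               # candidate upgrades for net i
                extra = row[p] - row[j]      # cores the upgrade costs
                if 0 <= extra and used + extra <= capacity:
                    chosen = levels[:]
                    chosen[i] = p
                    nxt.setdefault(used + extra, chosen)
        dp = nxt
    best = max(dp)                           # maximal core usage
    return dp[best], best
```

With M = [[4, 2], [6, 3]] (fp32 vs int8 core counts) and Z = 8, both networks fit at int8 (5 cores); the 3 leftover cores are exactly enough to upgrade the second network back to fp32.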
In an optional implementation manner, the target accuracies corresponding to the networks to be quantized are not completely the same.
For example, as shown in fig. 4, when the 1st network is quantized at fp32 precision, the 2nd, 4th, and 5th networks at int8 precision, and the 3rd, 6th, and 7th networks at int4 precision, the sum of the values of all the networks is the largest; that is, the target precision of the 1st network is determined to be fp32, the target precision of the 2nd, 4th, and 5th networks is int8, and the target precision of the 3rd, 6th, and 7th networks is int4.
In an optional implementation manner, the networks to be quantized include a first type of network to be quantized and a second type of network to be quantized, and the target precision corresponding to the second type of network to be quantized is a specified kth precision.
Determining the target precision corresponding to each network to be quantized according to the reference precision and the total amount of core resources of the many-core chip comprises:
determining the total amount Sj′ of core resources required for quantizing the first type of network to be quantized according to a reference precision j′, and the total amount Sk of core resources required for quantizing the second type of network to be quantized according to the specified precision k, wherein j′ is an integer greater than or equal to 1, and k is an integer greater than or equal to 1;
determining the remaining core resource amount Yj′ according to the total amount Sj′ of core resources, the total amount Sk of core resources, and the total amount Z of core resources of the many-core chip, wherein Yj′ = Z - Sj′ - Sk;
determining at least one core resource quantity difference W[i′] = {M[i′][1]-M[i′][j′], M[i′][2]-M[i′][j′], ...} between quantizing the first type of network to be quantized according to each precision and according to the j′th precision, wherein i′ represents the serial number of a network in the first type of network to be quantized, and i′ is an integer greater than or equal to 1;
determining the target precision corresponding to the first type of network to be quantized according to the remaining core resource amount Yj′ and the core resource quantity differences of the first type of network to be quantized at each quantization level.
In another alternative embodiment, determining the target precision corresponding to the first type of network to be quantized according to the remaining core resource amount Yj′ and the core resource quantity differences comprises:
for each network in the first type of network to be quantized, selecting one core resource quantity difference from W[i′] = {M[i′][1]-M[i′][j′], M[i′][2]-M[i′][j′], ...}, such that the sum of the selected core resource quantity differences of the first type of network to be quantized is less than or equal to the remaining core resource amount Yj′ and, subject to this constraint, the sum of the core resource differences of the first type of network to be quantized is maximized;
and when the sum of the core resource differences of the first type of network to be quantized is maximized, determining the target precision corresponding to the first type of network to be quantized.
When performing precision quantization on the networks to be quantized, quantization may be carried out as required; for example, one or several networks to be quantized are quantized according to a specified precision, and the corresponding target precision is then determined for the other networks to be quantized.
For example, as shown in fig. 3, the multiple networks are quantized step by step from fp32 through fp16 and int8 to int4, and the quantization precision may be selected from four types, namely fp32, fp16, int8, and int4, where j′=1 denotes fp32 precision, j′=2 denotes fp16 precision, j′=3 denotes int8 precision, and j′=4 denotes int4 precision:
Step1, determining the total amount S1 of core resources required for quantizing each network to be quantized according to fp32 precision, and judging whether the total amount S1 of core resources is less than or equal to the total amount Z of core resources of the many-core chip;
if the total amount S1 of core resources is less than or equal to the total amount Z of core resources of the many-core chip, determining fp32 precision as the target precision corresponding to each network to be quantized, and allocating core resources to each network to be quantized according to fp32 precision;
if the total amount S1 of core resources is greater than the total amount Z of core resources of the many-core chip, performing Step2;
Step2, determining the total amount S2 of core resources required for quantizing each network to be quantized according to fp16 precision, and judging whether the total amount S2 of core resources is less than or equal to the total amount Z of core resources of the many-core chip;
if the total amount S2 of core resources is less than or equal to the total amount Z of core resources of the many-core chip, the target precision of each network to be quantized may be fp32 precision or fp16 precision, provided that the total amount of core resources required after each network to be quantized is quantized according to its respective target precision is less than or equal to the total amount Z of core resources of the many-core chip;
if the total amount S2 of core resources is greater than the total amount Z of core resources of the many-core chip, performing Step3;
Step3, determining the total amount S3 of core resources required for quantizing each network to be quantized according to int8 precision, and judging whether the total amount S3 of core resources is less than or equal to the total amount Z of core resources of the many-core chip;
if the total amount S3 of core resources is less than or equal to the total amount Z of core resources of the many-core chip, the target precision of each network to be quantized may be selected from fp32 precision, fp16 precision, or int8 precision, provided that the total amount of core resources required after each network to be quantized is quantized according to its respective target precision is less than or equal to the total amount Z of core resources of the many-core chip;
if the total amount S3 of core resources is greater than the total amount Z of core resources of the many-core chip, performing Step4;
Step4, determining the total amount S4 of core resources required for quantizing each network to be quantized according to int4 precision, and judging whether the total amount S4 of core resources is less than or equal to the total amount Z of core resources of the many-core chip;
if the total amount S4 of core resources is less than or equal to the total amount Z of core resources of the many-core chip, the target precision of each network to be quantized may be selected from fp32 precision, fp16 precision, int8 precision, or int4 precision, provided that the total amount of core resources required after each network to be quantized is quantized according to its respective target precision is less than or equal to the total amount Z of core resources of the many-core chip;
if the total amount S4 of core resources is greater than the total amount Z of core resources of the many-core chip, it is determined that all the networks cannot be deployed on the chip simultaneously.
For example, when it is determined in Step1 that S1 is greater than Z, in Step2 that S2 is greater than Z, in Step3 that S3 is greater than Z, and in Step4 that S4 is less than or equal to Z, int4 precision is determined as the reference precision of the first type of network to be quantized. The total amount of core resources required for quantizing the first type of network to be quantized according to int4 precision is S4′, and the total amount of core resources required for quantizing the second type of network to be quantized according to the specified precision fp32 is Sk. The remaining core resource amount is then Y4′ = Z - S4′ - Sk. When computing the core resource quantity differences of the first type of network to be quantized from fp32, fp16, or int8 precision down to int4 precision, the core resource quantity difference of each network in the first type may be W[i′] = M[i′][1] - M[i′][4], W[i′] = M[i′][2] - M[i′][4], or W[i′] = M[i′][3] - M[i′][4], and the target precision corresponding to each network in the first type of network to be quantized is determined according to the remaining core resource amount Y4′ = Z - S4′ - Sk and the core resource quantity differences of the first type of network to be quantized at each quantization level.
When determining the target precision of each network to be quantized, the first type of network to be quantized may be quantized according to the reference precision and the second type of network to be quantized according to the specified precision; the remaining core resource amount after both types are quantized is taken as the capacity of a knapsack, and the core resource quantity differences of each network in the first type of network to be quantized represent the values of knapsack items. A 0-1 knapsack dynamic programming algorithm maximizes the total value over the first type of network to be quantized, and the target precision corresponding to the first type of network to be quantized is the one achieving this maximum.
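The capacity bookkeeping for this variant can be sketched as follows — a hypothetical helper computing the remaining capacity Y = Z - S_ref - S_pinned before the knapsack step runs over the free ("first type") networks:

```python
def remaining_after_pinned(M, Z, ref_level, pinned):
    """Remaining core capacity once pinned networks get their specified
    precision and every free network gets the reference precision.
    `pinned` maps network index -> fixed precision level (0-based).
    Returns None when the combination does not fit on the chip."""
    s_pinned = sum(M[i][lvl] for i, lvl in pinned.items())
    s_free = sum(M[i][ref_level] for i in range(len(M)) if i not in pinned)
    leftover = Z - s_free - s_pinned
    return leftover if leftover >= 0 else None
```

For three identical networks with core counts [8, 4, 2] per level, a chip of Z = 14 cores, network 0 pinned at level 0 (fp32), and reference level 2 (int4), the leftover is 14 - 8 - 4 = 2 cores available for upgrading the two free networks.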
The method of the present disclosure solves the problem of on-chip resource allocation for multi-network neural network models by means of quantization, solves the optimal selection of quantization precision with a multi-level dynamic programming algorithm, and optimizes the allocation and use of on-chip memory, so that limited chip memory resources are fully utilized during memory allocation for multi-network models while the precision loss caused by over-quantization is reduced.
The network precision quantization system according to the embodiment of the present disclosure is applied to a many-core chip, and the system includes:
the reference precision determining module is used for determining reference precision according to the total amount of the core resources of the many-core chip and the networks to be quantized, wherein the total amount of the core resources required by the networks to be quantized according to the reference precision is less than or equal to the total amount of the core resources of the many-core chip;
and the target precision determining module is used for determining the target precision corresponding to each network to be quantized according to the reference precision and the total amount of the core resources of the many-core chip.
In an optional embodiment, the reference accuracy determination module is further configured to:
determining the total amount S1 of core resources required for quantizing each network to be quantized according to the 1st precision;
judging whether the total amount S1 of core resources is less than or equal to the total amount Z of core resources of the many-core chip;
if the total amount S1 of core resources is less than or equal to the total amount Z of core resources of the many-core chip, determining the 1st precision as the reference precision;
determining target precision corresponding to each network to be quantized according to the reference precision and the total amount of core resources of the many-core chip, wherein the determining of the target precision corresponding to each network to be quantized comprises the following steps:
and determining the 1 st precision as the target precision corresponding to each network to be quantized.
It can be understood from the above that when each network to be quantized is quantized according to the highest precision and the total amount of required core resources is less than or equal to the total amount of core resources of the many-core chip, core resources may be allocated directly to each network to be quantized according to the highest quantization precision, so that the core resources are most fully utilized.
In an optional implementation manner, determining the total amount S1 of core resources required for quantizing each network to be quantized according to the 1st precision comprises:
calculating the number M[i][1] of core resources required when the ith network is quantized according to the 1st precision;
determining the total amount S1 of core resources required for quantizing each network to be quantized according to the 1st precision;
wherein S1 = M[1][1] + M[2][1] + ... + M[N][1], i.e., the sum of M[i][1] over all networks to be quantized,
the number of the networks is N, i represents the number of the networks to be quantized, i is an integer greater than or equal to 1, and N is an integer greater than or equal to 1.
In an optional implementation manner, determining the reference precision according to the total amount of core resources of the many-core chip and each network to be quantized includes:
determining the total amount S1 of core resources required for quantizing each network to be quantized according to the 1st precision;
judging whether the total amount S1 of core resources is less than or equal to the total amount Z of core resources of the many-core chip;
if the total amount S1 of core resources is greater than the total amount Z of core resources of the many-core chip, determining the total amount S2 of core resources required for quantizing each network to be quantized according to the 2nd precision, and judging whether the total amount S2 of core resources is less than or equal to the total amount Z of core resources of the many-core chip, wherein the 2nd precision is lower than the 1st precision;
decreasing the quantization precision step by step in this way until the total amount Sj of core resources required for quantizing each network to be quantized according to the jth precision is less than or equal to the total amount Z of core resources of the many-core chip, and determining the jth precision as the reference precision, wherein j is an integer greater than or equal to 2.
When determining the total amount of core resources required for quantizing each network to be quantized according to a given quantization precision, the calculation Sj = M[1][j] + M[2][j] + ... + M[N][j] (the sum of M[i][j] over i = 1 to N) may be used, wherein the number of networks to be quantized is N, i represents the serial number of a network to be quantized, i is an integer greater than or equal to 1, and N is an integer greater than or equal to 1.
In an alternative embodiment, the quantization precision includes one or more of fp32 (32-bit data type), fp16 (16-bit data type), int8 (8-bit data type), and int4 (4-bit data type). For example, the quantization precision of each network to be quantized is selected from fp32, fp16, int8, and int4, where j=1 represents fp32 precision, j=2 represents fp16 precision, j=3 represents int8 precision, and j=4 represents int4 precision. As another example, the quantization precision of each network to be quantized is selected from fp32, int8, and int4, where j=1 represents fp32 precision, j=2 represents int8 precision, and j=3 represents int4 precision.
In an optional embodiment, the target precision determining module is further configured to:
determine the amount of remaining core resources Yj according to the total amount of core resources Sj required to quantize each network to be quantized according to the reference precision j and the total amount of core resources Z of the many-core chip, wherein Yj = Z - Sj and j is an integer greater than or equal to 2;
determine at least one core resource quantity difference W[i] = {M[i][1] - M[i][j], M[i][2] - M[i][j], ...} between quantizing each network to be quantized according to each precision and according to the jth precision, wherein i represents the number of a network to be quantized and i is an integer greater than or equal to 1;
and determine the target precision corresponding to each network to be quantized according to the amount of remaining core resources Yj and the core resource quantity differences of each network to be quantized.
In an alternative embodiment, determining the target precision corresponding to each network to be quantized according to the amount of remaining core resources Yj and the core resource quantity differences of each network to be quantized includes:
for each network to be quantized, determining one core resource quantity difference from the at least one core resource quantity difference W[i] = {M[i][1] - M[i][j], M[i][2] - M[i][j], ...}, such that the sum of the selected core resource quantity differences of the networks to be quantized is less than or equal to the amount of remaining core resources Yj and is maximized;
and determining the target precision corresponding to each network to be quantized when the sum of the core resource quantity differences of the networks to be quantized is maximized.
When the target precision of each network to be quantized is determined, the system quantizes each network to be quantized according to the reference precision, treats the amount of core resources remaining after quantization as the capacity of a knapsack, and treats the core resource quantity differences of each network to be quantized as the values of knapsack items; a 0-1 knapsack dynamic programming algorithm then maximizes the total value over the networks to be quantized, and the target precision corresponding to each network to be quantized at the maximum total value can be determined. In this way, for multiple networks on one chip, maximizing the sum of the core resource quantity differences of the networks yields the optimal choice of target precision for each network, so that the core resources of the chip are fully utilized, the core resources are allocated reasonably, and the precision loss caused by selecting a single quantization precision for multiple networks is reduced.
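The knapsack selection described above can be sketched as a grouped 0-1 dynamic program (an illustrative sketch under assumptions, not the patented implementation: M[i][lvl] is the core count of network i at precision level lvl, level 0 is the highest precision, and each item's weight equals its value, since both are the extra cores consumed by a precision upgrade):

```python
def select_target_precisions(M, j_ref, Z):
    """Choose a target precision level for each network via a grouped
    0-1 knapsack dynamic program.

    M[i][lvl]: cores needed to quantize network i at precision level lvl
               (level 0 is the highest precision; j_ref is the reference
               precision found earlier, so sum of M[i][j_ref] <= Z).
    Returns the chosen precision level for each network.
    """
    N = len(M)
    Y = Z - sum(row[j_ref] for row in M)  # leftover cores = knapsack capacity
    dp = [0] * (Y + 1)                    # dp[c]: best total upgrade cost <= c
    pick = [[j_ref] * N for _ in range(Y + 1)]
    for i in range(N):                    # one knapsack "group" per network
        nxt, nxt_pick = dp[:], [p[:] for p in pick]
        for lvl in range(j_ref):          # candidate upgrades to higher precision
            w = M[i][lvl] - M[i][j_ref]   # extra cores; weight equals value
            if w <= 0 or w > Y:
                continue
            for c in range(w, Y + 1):
                # transitions read dp, the state before this network's group,
                # so each network upgrades at most once
                if dp[c - w] + w > nxt[c]:
                    nxt[c] = dp[c - w] + w
                    nxt_pick[c] = pick[c - w][:]
                    nxt_pick[c][i] = lvl
        dp, pick = nxt, nxt_pick
    best_c = max(range(Y + 1), key=dp.__getitem__)
    return pick[best_c]
```

Because weight equals value here, maximizing the total value is equivalent to spending as many of the leftover cores as possible on precision upgrades.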
In an optional implementation manner, the target accuracies corresponding to the networks to be quantized are not completely the same.
In an optional implementation, each network to be quantized belongs to a first type of network to be quantized or a second type of network to be quantized, the target precision corresponding to the second type of network to be quantized is a specified kth precision, and the target precision determining module determines the target precision corresponding to each network to be quantized according to the reference precision and the total amount of core resources of the many-core chip by:
determining the total amount of core resources Sj′ required to quantize the first type of network to be quantized according to a reference precision j′, and the total amount of core resources Sk required to quantize the second type of network to be quantized according to the specified kth precision, wherein j′ is an integer greater than or equal to 1 and k is an integer greater than or equal to 1;
determining the amount of remaining core resources Yj′ according to the total amount of core resources Sj′, the total amount of core resources Sk and the total amount of core resources Z of the many-core chip, wherein Yj′ = Z - Sj′ - Sk;
determining at least one core resource quantity difference W[i′] = {M[i′][1] - M[i′][j′], M[i′][2] - M[i′][j′], ...} between quantizing the first type of network to be quantized according to each precision and according to the j′th precision, wherein i′ represents the number of a network in the first type of network to be quantized and i′ is an integer greater than or equal to 1;
and determining the target precision corresponding to the first type of network to be quantized according to the amount of remaining core resources Yj′ and the core resource quantity differences of the first type of network to be quantized.
In an alternative embodiment, determining the target precision corresponding to the first type of network to be quantized according to the amount of remaining core resources Yj′ and the core resource quantity differences of the first type of network to be quantized includes:
for each network in the first type of network to be quantized, determining one core resource quantity difference from the at least one core resource quantity difference W[i′] = {M[i′][1] - M[i′][j′], M[i′][2] - M[i′][j′], ...}, such that the sum of the selected core resource quantity differences of the first type of network to be quantized is less than or equal to the amount of remaining core resources Yj′ and is maximized;
and determining the target precision corresponding to the first type of network to be quantized when the sum of the core resource quantity differences of the first type of network to be quantized is maximized.
When precision quantization is performed on the networks to be quantized, it can be performed as required; for example, one or more networks to be quantized are quantized according to a specified precision, and corresponding target precisions are then determined for the other networks to be quantized. When the target precision of each network to be quantized is determined, the first type of network to be quantized can be quantized according to the reference precision and the second type of network to be quantized according to the specified precision; the amount of core resources remaining after both types are quantized is treated as the capacity of a knapsack, the core resource quantity difference of each network in the first type of network to be quantized is treated as the value of a knapsack item, and a 0-1 knapsack dynamic programming algorithm maximizes the total value over the first type of network to be quantized, at which point the target precision corresponding to the first type of network to be quantized at the maximum total value can be determined.
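The capacity of the new knapsack in this variant, i.e. the cores left after quantizing the first type at the reference precision j′ and the second type at the specified precision k, can be sketched as follows (a hypothetical helper; M_first, M_second and the precision-level indexing are assumptions, not part of the patent):

```python
def remaining_capacity(M_first, M_second, j_ref, k, Z):
    """Leftover cores Y_j' = Z - S_j' - S_k after quantizing the first
    type of networks at reference level j_ref and the second type at
    the specified level k (per-network, per-level core counts assumed)."""
    S_jref = sum(row[j_ref] for row in M_first)  # S_j' for the first type
    S_k = sum(row[k] for row in M_second)        # S_k for the second type
    return Z - S_jref - S_k                      # knapsack capacity Y_j'
```

The same knapsack selection is then run over the first type of network only, with this value as the knapsack capacity.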
The disclosed system addresses resource allocation for multiple neural network models on one chip through quantization, solves the optimal selection of quantization precision with a multi-level dynamic programming algorithm, and optimizes the allocation and use of on-chip memory, so that limited chip memory resources are fully utilized when allocating memory for multiple network models while the precision loss caused by over-quantization is reduced.
The disclosure also relates to an electronic device comprising a server, a terminal and the like. The electronic device includes: at least one processor; a memory communicatively coupled to the at least one processor; and a communication component communicatively coupled to the storage medium, the communication component receiving and transmitting data under control of the processor; wherein the memory stores instructions executable by the at least one processor to implement the method for quantifying network accuracy in the above embodiments.
In an alternative embodiment, the memory, as a non-volatile computer-readable storage medium, is used to store non-volatile software programs, non-volatile computer-executable programs, and modules. The processor implements the method by running the non-volatile software programs, instructions, and modules stored in the memory, thereby executing the various functional applications and data processing of the device.
The memory may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store a list of options, etc. Further, the memory may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some embodiments, the memory optionally includes memory located remotely from the processor, and such remote memory may be connected to the external device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
One or more modules are stored in the memory and, when executed by the one or more processors, perform the method of quantifying network accuracy in any of the method embodiments described above.
The above-mentioned product can execute the method provided by the embodiment of the present application, and has corresponding functional modules and beneficial effects of the execution method, and the technical details not described in detail in the embodiment of the present application can be referred to the method for quantifying network accuracy provided by the embodiment of the present application.
The present disclosure also relates to a computer-readable storage medium for storing a computer-readable program for causing a computer to perform some or all of the above-described embodiments of a method for quantifying network accuracy.
That is, as can be understood by those skilled in the art, all or part of the steps of the methods in the embodiments described above may be implemented by a program instructing related hardware, where the program is stored in a storage medium and includes several instructions to enable a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
In the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the disclosure may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Furthermore, those of ordinary skill in the art will appreciate that although some embodiments described herein include some but not all of the features included in other embodiments, combinations of features of different embodiments are meant to be within the scope of the disclosure and to form different embodiments. For example, in the claims, any of the claimed embodiments may be used in any combination.
It will be understood by those skilled in the art that while the present disclosure has been described with reference to exemplary embodiments, various changes may be made and equivalents may be substituted for elements thereof without departing from the scope of the disclosure. In addition, many modifications may be made to adapt a particular situation or material to the teachings of the disclosure without departing from the essential scope thereof. Therefore, it is intended that the disclosure not be limited to the particular embodiment disclosed, but that the disclosure will include all embodiments falling within the scope of the appended claims.

Claims (10)

1. A quantification method of network precision is applied to a many-core chip, and comprises the following steps:
determining reference precision according to the total amount of the core resources of the many-core chip and each network to be quantized, wherein the total amount of the core resources required by each network to be quantized according to the reference precision is less than or equal to the total amount of the core resources of the many-core chip;
and determining the target precision corresponding to each network to be quantized according to the reference precision and the total amount of the core resources of the many-core chip.
2. The method of claim 1, wherein determining the reference precision according to the total amount of core resources of the many-core chip and each network to be quantized comprises:
determining the total amount of core resources S1 required to quantize each network to be quantized according to the 1st precision;
judging whether the total amount of core resources S1 is less than or equal to the total amount of core resources Z of the many-core chip;
if the total amount of core resources S1 is greater than the total amount of core resources Z of the many-core chip, determining the total amount of core resources S2 required to quantize each network to be quantized according to the 2nd precision, and judging whether the total amount of core resources S2 is less than or equal to the total amount of core resources Z of the many-core chip, wherein the 2nd precision is lower than the 1st precision;
repeating the above steps with gradually decreasing quantization precision until the total amount of core resources Sj required to quantize each network to be quantized according to the jth precision is less than or equal to the total amount of core resources Z of the many-core chip, and determining the jth precision as the reference precision, wherein j is an integer greater than or equal to 2.
3. The method of claim 1, wherein determining the target precision corresponding to each network to be quantized according to the reference precision and the total amount of core resources of the many-core chip comprises:
determining the amount of remaining core resources Yj according to the total amount of core resources Sj required to quantize each network to be quantized according to the reference precision j and the total amount of core resources Z of the many-core chip, wherein Yj = Z - Sj and j is an integer greater than or equal to 2;
determining at least one core resource quantity difference W[i] = {M[i][1] - M[i][j], M[i][2] - M[i][j], ...} between quantizing each network to be quantized according to each precision and according to the jth precision, wherein i represents the number of a network to be quantized and i is an integer greater than or equal to 1;
and determining the target precision corresponding to each network to be quantized according to the amount of remaining core resources Yj and the core resource quantity differences of each network to be quantized.
4. The method of claim 3, wherein determining the target precision corresponding to each network to be quantized according to the amount of remaining core resources Yj and the core resource quantity differences of each network to be quantized comprises:
for each network to be quantized, determining one core resource quantity difference from the at least one core resource quantity difference W[i] = {M[i][1] - M[i][j], M[i][2] - M[i][j], ...}, such that the sum of the selected core resource quantity differences of the networks to be quantized is less than or equal to the amount of remaining core resources Yj and is maximized;
and determining the target precision corresponding to each network to be quantized when the sum of the core resource quantity differences of the networks to be quantized is maximized.
5. The method according to claim 1, wherein each network to be quantized belongs to a first type of network to be quantized or a second type of network to be quantized, and the target precision corresponding to the second type of network to be quantized is a specified kth precision,
determining the target precision corresponding to each network to be quantized according to the reference precision and the total amount of core resources of the many-core chip comprising:
determining the total amount of core resources Sj′ required to quantize the first type of network to be quantized according to a reference precision j′, and the total amount of core resources Sk required to quantize the second type of network to be quantized according to the specified kth precision, wherein j′ is an integer greater than or equal to 1 and k is an integer greater than or equal to 1;
determining the amount of remaining core resources Yj′ according to the total amount of core resources Sj′, the total amount of core resources Sk and the total amount of core resources Z of the many-core chip, wherein Yj′ = Z - Sj′ - Sk;
determining at least one core resource quantity difference W[i′] = {M[i′][1] - M[i′][j′], M[i′][2] - M[i′][j′], ...} between quantizing the first type of network to be quantized according to each precision and according to the j′th precision, wherein i′ represents the number of a network in the first type of network to be quantized and i′ is an integer greater than or equal to 1;
and determining the target precision corresponding to the first type of network to be quantized according to the amount of remaining core resources Yj′ and the core resource quantity differences of the first type of network to be quantized.
6. The method of claim 5, wherein determining the target precision corresponding to the first type of network to be quantized according to the amount of remaining core resources Yj′ and the core resource quantity differences of the first type of network to be quantized comprises:
for each network in the first type of network to be quantized, determining one core resource quantity difference from the at least one core resource quantity difference W[i′] = {M[i′][1] - M[i′][j′], M[i′][2] - M[i′][j′], ...}, such that the sum of the selected core resource quantity differences of the first type of network to be quantized is less than or equal to the amount of remaining core resources Yj′ and is maximized;
and determining the target precision corresponding to the first type of network to be quantized when the sum of the core resource quantity differences of the first type of network to be quantized is maximized.
7. The method of claim 1, wherein the target accuracies corresponding to the networks to be quantized are not identical.
8. A system for quantifying network precision, the system being applied to a many-core chip, the system comprising:
the reference precision determining module is used for determining reference precision according to the total amount of the core resources of the many-core chip and the networks to be quantized, wherein the total amount of the core resources required by the networks to be quantized according to the reference precision is less than or equal to the total amount of the core resources of the many-core chip;
and the target precision determining module is used for determining the target precision corresponding to each network to be quantized according to the reference precision and the total amount of the core resources of the many-core chip.
9. An electronic device comprising a memory and a processor, wherein the memory is configured to store one or more computer instructions, wherein the one or more computer instructions are executed by the processor to implement the method of any one of claims 1-7.
10. A computer-readable storage medium, on which a computer program is stored, the computer program being executable by a processor for implementing the method according to any one of claims 1-7.
CN202010519846.1A 2020-06-09 2020-06-09 Network precision quantification method and system Pending CN113778655A (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN202010519846.1A CN113778655A (en) 2020-06-09 2020-06-09 Network precision quantification method and system
PCT/CN2021/099198 WO2021249440A1 (en) 2020-06-09 2021-06-09 Network accuracy quantification method, system, and apparatus, electronic device, and readable medium
US17/760,023 US11783168B2 (en) 2020-06-09 2021-06-09 Network accuracy quantification method and system, device, electronic device and readable medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010519846.1A CN113778655A (en) 2020-06-09 2020-06-09 Network precision quantification method and system

Publications (1)

Publication Number Publication Date
CN113778655A true CN113778655A (en) 2021-12-10

Family

ID=78834459

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010519846.1A Pending CN113778655A (en) 2020-06-09 2020-06-09 Network precision quantification method and system

Country Status (3)

Country Link
US (1) US11783168B2 (en)
CN (1) CN113778655A (en)
WO (1) WO2021249440A1 (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104301259A (en) * 2014-10-13 2015-01-21 东南大学 Resource allocation method applicable to multi-hop wireless mesh network
US20160321776A1 (en) * 2014-06-20 2016-11-03 Tencent Technology (Shenzhen) Company Limited Model Parallel Processing Method and Apparatus Based on Multiple Graphic Processing Units
KR101710087B1 (en) * 2016-04-29 2017-02-24 국방과학연구소 Service resource allocation approach Method and System based on a successive knapsack algorithm with variable profits
US20180046894A1 (en) * 2016-08-12 2018-02-15 DeePhi Technology Co., Ltd. Method for optimizing an artificial neural network (ann)
CN108564168A (en) * 2018-04-03 2018-09-21 中国科学院计算技术研究所 A kind of design method to supporting more precision convolutional neural networks processors
CN108874542A (en) * 2018-06-07 2018-11-23 桂林电子科技大学 Kubernetes method for optimizing scheduling neural network based
CN109902807A (en) * 2019-02-27 2019-06-18 电子科技大学 A kind of hot modeling method of many-core chip distribution formula based on Recognition with Recurrent Neural Network
CN110674936A (en) * 2019-09-24 2020-01-10 上海寒武纪信息科技有限公司 Neural network processing method and device, computer equipment and storage medium
CN111226233A (en) * 2017-10-24 2020-06-02 国际商业机器公司 Facilitating neural network efficiency

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105843679B (en) * 2016-03-18 2018-11-02 西北工业大学 Adaptive many-core resource regulating method
KR20170128703A (en) * 2016-05-13 2017-11-23 한국전자통신연구원 Many-core system and operating method thereof
CN110348562B (en) 2019-06-19 2021-10-15 北京迈格威科技有限公司 Neural network quantization strategy determination method, image identification method and device
US11551054B2 (en) * 2019-08-27 2023-01-10 International Business Machines Corporation System-aware selective quantization for performance optimized distributed deep learning

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160321776A1 (en) * 2014-06-20 2016-11-03 Tencent Technology (Shenzhen) Company Limited Model Parallel Processing Method and Apparatus Based on Multiple Graphic Processing Units
CN104301259A (en) * 2014-10-13 2015-01-21 东南大学 Resource allocation method applicable to multi-hop wireless mesh network
KR101710087B1 (en) * 2016-04-29 2017-02-24 국방과학연구소 Service resource allocation approach Method and System based on a successive knapsack algorithm with variable profits
US20180046894A1 (en) * 2016-08-12 2018-02-15 DeePhi Technology Co., Ltd. Method for optimizing an artificial neural network (ann)
CN111226233A (en) * 2017-10-24 2020-06-02 国际商业机器公司 Facilitating neural network efficiency
CN108564168A (en) * 2018-04-03 2018-09-21 中国科学院计算技术研究所 A kind of design method to supporting more precision convolutional neural networks processors
CN108874542A (en) * 2018-06-07 2018-11-23 桂林电子科技大学 Kubernetes method for optimizing scheduling neural network based
CN109902807A (en) * 2019-02-27 2019-06-18 电子科技大学 A kind of hot modeling method of many-core chip distribution formula based on Recognition with Recurrent Neural Network
CN110674936A (en) * 2019-09-24 2020-01-10 上海寒武纪信息科技有限公司 Neural network processing method and device, computer equipment and storage medium

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
KUAN WANG ET AL: "HAQ: Hardware-Aware Automated Quantization with Mixed Precision", 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 9 January 2020 (2020-01-09) *
MUSTAPHA NOURELFATH ET AL: "Quantized Hopfield Networks for Reliability Optimization", Reliability Engineering & System Safety, vol. 81, no. 2, 31 August 2003 (2003-08-31) *
FANG RONGQIANG ET AL: "Modeling Computational Features of Multi-layer Neural Network Algorithms" (多层神经网络算法的计算特征建模方法), Journal of Computer Research and Development (计算机研究与发展), no. 6, 30 June 2019 (2019-06-30) *
CAI RUICHU; ZHONG CHUNRONG; YU YANG; CHEN BINGFENG; LU YE; CHEN YAO: "Quantization and Compression Methods for Convolutional Neural Networks for 'Edge' Applications" (面向"边缘"应用的卷积神经网络量化与压缩方法), Journal of Computer Applications (计算机应用), no. 09, 23 April 2018 (2018-04-23) *

Also Published As

Publication number Publication date
US20230040375A1 (en) 2023-02-09
US11783168B2 (en) 2023-10-10
WO2021249440A1 (en) 2021-12-16

Similar Documents

Publication Publication Date Title
CN111444009B (en) Resource allocation method and device based on deep reinforcement learning
CN113055308B (en) Bandwidth scheduling method, traffic transmission method and related products
US20220083386A1 (en) Method and system for neural network execution distribution
CN103365720B (en) For dynamically adjusting the method and system of global Heap Allocation under multi-thread environment
CN109002358A (en) Mobile terminal software adaptive optimization dispatching method based on deeply study
CN110764885B (en) Method for splitting and unloading DNN tasks of multiple mobile devices
CN110231984B (en) Multi-workflow task allocation method and device, computer equipment and storage medium
CN114253735B (en) Task processing method and device and related equipment
CN110378529B (en) Data generation method and device, readable storage medium and electronic equipment
CN108173905A (en) A kind of resource allocation method, device and electronic equipment
CN115421930B (en) Task processing method, system, device, equipment and computer readable storage medium
CN111176840A (en) Distributed task allocation optimization method and device, storage medium and electronic device
CN113391824A (en) Computing offload method, electronic device, storage medium, and computer program product
CN112231117A (en) Cloud robot service selection method and system based on dynamic vector hybrid genetic algorithm
CN110167031B (en) Resource allocation method, equipment and storage medium for centralized base station
CN113986562A (en) Resource scheduling strategy generation method and device and terminal equipment
CN110826782B (en) Data processing method and device, readable storage medium and electronic equipment
CN109165729A (en) The dispatching method and system of neural network
CN117311998B (en) Large model deployment method and system
CN113778655A (en) Network precision quantification method and system
CN109746918B (en) Optimization method for delay of cloud robot system based on joint optimization
CN113673753A (en) Load regulation and control method and device for electric vehicle charging
CN116848508A (en) Scheduling tasks for computer execution based on reinforcement learning model
CN114584476A (en) Traffic prediction method, network training device and electronic equipment
CN110058941A (en) Task scheduling and managing method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination