WO2021147362A1 - Hardware environment-based data quantization method and apparatus, and readable storage medium - Google Patents

Hardware environment-based data quantization method and apparatus, and readable storage medium

Info

Publication number
WO2021147362A1
WO2021147362A1 (PCT/CN2020/117338, CN2020117338W)
Authority
WO
WIPO (PCT)
Prior art keywords
data
quantization
weight
feature map
hardware
Prior art date
Application number
PCT/CN2020/117338
Other languages
English (en)
French (fr)
Inventor
曹其春
赵雅倩
董刚
梁玲燕
尹文枫
Original Assignee
苏州浪潮智能科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 苏州浪潮智能科技有限公司
Priority to US17/794,110 (now US11748970B2)
Publication of WO2021147362A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 8/00: Arrangements for software engineering
    • G06F 8/40: Transformation of program code
    • G06F 8/41: Compilation
    • G06F 8/42: Syntactic analysis
    • G06F 8/427: Parsing
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/20: Image preprocessing
    • G06V 10/28: Quantising the image, e.g. histogram thresholding for discrimination between background and foreground patterns
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00: Machine learning
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/94: Hardware or software architectures specially adapted for image or video understanding
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 8/00: Arrangements for software engineering
    • G06F 8/10: Requirements analysis; Specification techniques

Definitions

  • This application relates to the field of artificial intelligence (AI) technology, and in particular to a data quantization method and apparatus based on a hardware environment, and a computer-readable storage medium.
  • This application provides a data quantization method and apparatus based on a hardware environment, and a computer-readable storage medium, which solve the problems of software package redundancy and dependency library conflicts caused by supporting multiple deep learning frameworks in the related art.
  • To solve the above technical problems, the embodiments of the present invention provide the following technical solutions:
  • One aspect of the embodiments of the present invention provides a data quantization method based on a hardware environment, including:
  • computing feature map data from the image data in an input data set through an intermediate computation graph process;
  • writing the quantization parameter and the quantized weight data into a bin file to generate quantized file data.
  • Optionally, before writing the quantization parameter and the quantized weight data into the bin file according to hardware requirements, the method further includes:
  • reordering the quantization parameter and the quantized weight data, so that the data format of the quantization parameter and the quantized weight data is a 64-channel parallel format.
  • Optionally, parsing to obtain the intermediate computation graph data and weight data of the current deep learning framework includes:
  • parsing the model file using the NNVM component in the NNVM compiler to obtain the intermediate computation graph data, and using the TVM component in the NNVM compiler to execute the operators of the intermediate computation graph and compute the weight data in tensor form.
  • Optionally, the weight quantization factor and the feature map quantization factor are merged according to a quantization-factor merging relation (given as an equation image in the original publication), in which:
  • y_w is the weight quantization factor;
  • y_f is the feature map quantization factor;
  • n is the quantization parameter.
  • Optionally, uniformly quantizing the weight data and the feature map data of each layer according to a preset linear quantization method, and computing the weight quantization factor and the feature map quantization factor, includes:
  • uniformly quantizing the limited data to the range -127 to +127 of int8 precision, and computing the weight quantization factor and the feature map quantization factor.
  • Optionally, computing the corresponding limit values includes:
  • computing the weight limit value of the weight data as x_w = max(|w|), where w is the weight data, so that the limited range of the weight data is (-x_w, +x_w); and computing the feature map limit value of the per-layer average feature map data as x_f = max(|F|), where F is the per-layer average feature map data, so that its limited range is (-x_f, +x_f).
  • Another aspect of the embodiments of the present invention provides a data quantization apparatus based on a hardware environment, including:
  • a framework data parsing module, configured to parse the model file under the current deep learning framework to obtain intermediate computation graph data and weight data that are independent of the hardware environment;
  • a feature map data computing module, configured to compute feature map data from the image data in the input data set through the intermediate computation graph process, based on the intermediate computation graph data and the weight data;
  • a linear quantization module, configured to uniformly quantize the weight data and the feature map data of each layer according to the preset linear quantization method, and compute the weight quantization factor and the feature map quantization factor;
  • a quantization parameter computing module, configured to merge the weight quantization factor and the feature map quantization factor to obtain a quantization parameter, where the quantization parameter is a parameter that enables the hardware to use shift instead of division;
  • a hardware-recognizable data output module, configured to write the quantization parameter and the quantized weight data into a bin file according to hardware requirements, and generate quantized file data.
  • Optionally, the apparatus further includes a reordering module, configured to reorder the quantization parameter and the quantized weight data so that the data format of the quantization parameter and the quantized weight data is a 64-channel parallel format.
  • An embodiment of the present invention also provides a data quantization apparatus based on a hardware environment, including a processor configured to implement the steps of the hardware environment-based data quantization method described in any preceding item when executing a computer program stored in a memory.
  • An embodiment of the present invention also provides a computer-readable storage medium storing a hardware environment-based data quantization program which, when executed by a processor, implements the steps of the hardware environment-based data quantization method described in any preceding item.
  • FIG. 1 is a schematic flowchart of a data quantization method based on a hardware environment according to an embodiment of the present invention;
  • FIG. 2 is a schematic flowchart of another data quantization method based on a hardware environment provided by an embodiment of the present invention;
  • FIG. 3 is a schematic diagram of the data after reordering according to an embodiment of the present invention;
  • FIG. 4 is a structural diagram of a specific implementation of a data quantization apparatus based on a hardware environment provided by an embodiment of the present invention;
  • FIG. 5 is a structural diagram of another specific implementation of a data quantization apparatus based on a hardware environment provided by an embodiment of the present invention;
  • FIG. 6 is a structural diagram of still another specific implementation of a data quantization apparatus based on a hardware environment provided by an embodiment of the present invention.
  • Fig. 1 is a schematic flowchart of a data quantization method based on a hardware environment provided by an embodiment of the present invention; the embodiment may include the following:
  • In this application, the deep learning framework can be any existing deep learning framework, and the model file under that framework, such as a pb file under the TensorFlow framework, is loaded. Any existing method can be used to parse out the intermediate computation graph and weight data, and this does not affect the implementation of this application.
  • For example, the NNVM component of the NNVM compiler can be used to convert model files from different frameworks into framework-independent intermediate computation graphs, and the TVM component can then be used to execute the operators of the intermediate graph, decoupling the operations of the computation graph from the hardware, so that this application can support various deep learning frameworks and run on different computer platforms.
  • The NNVM compiler comprises two components built on the TVM stack: an NNVM component that processes intermediate computation graphs and a TVM component that processes tensor operators.
  • The NNVM component (the computation graph intermediate representation stack) can be used to represent workloads from different frameworks as standardized computation graphs and then convert these high-level computation graphs into execution graphs, realizing the idea of expressing the intermediate computation graph in a framework-independent form.
  • The execution object of the TVM component (the tensor intermediate representation stack) is the operators in the computation graph, which it optimizes into the operators corresponding to the target back-end hardware; unlike the NNVM component, it provides a hardware-independent, domain-specific language to simplify operator execution at the tensor index level.
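As an illustration only, parsing a model into a framework-independent graph with the historical NNVM Python API might look like the sketch below. NNVM has since been superseded by TVM's Relay; the model file name and input shape are placeholders, and the calls shown reflect that era's documented API rather than this patent's code:

```python
import onnx
import nnvm
import nnvm.compiler

# Convert a model file (here an ONNX export) into a framework-independent
# intermediate computation graph (sym) plus weight tensors (params).
model = onnx.load("model.onnx")                 # placeholder file name
sym, params = nnvm.frontend.from_onnx(model)

# The TVM component then builds/executes the graph's operators for a
# chosen target; "llvm" targets the local CPU.
shape_dict = {"data": (1, 3, 224, 224)}         # placeholder input shape
graph, lib, params = nnvm.compiler.build(
    sym, target="llvm", shape=shape_dict, params=params)
```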
  • The input data set may be a training data set under the deep learning framework of S101; the total number of images in the input data set is not limited by this application. For example, it may be a data set containing 2000 images.
  • After the input data set is obtained, the image data in it can also be preprocessed. For example, the preprocessing can first perform layer handling, then uniformly convert the image data into float-type data, and finally apply an offset, where the offset value can be any value between 0 and 255.
  • Using the basic operations of the TVM framework, the output data of each layer of the computation graph can be computed for the input image data, that is, the feature map data of each layer is obtained; the computed feature map data of each layer can be stored in memory, the results accumulated, and the average of each layer's feature map data then computed, as sketched below.
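A sketch of that accumulation loop (our own illustration; run_all_layers is a hypothetical helper standing in for per-layer execution of the compiled intermediate graph):

```python
import numpy as np

def average_feature_maps(images, run_all_layers):
    """Accumulate every layer's output over the calibration images and
    return the per-layer average feature map data."""
    sums, count = None, 0
    for img in images:
        layer_outputs = run_all_layers(img)    # list of per-layer feature maps
        if sums is None:
            sums = [np.zeros_like(o) for o in layer_outputs]
        for s, o in zip(sums, layer_outputs):  # accumulate results in memory
            s += o
        count += 1
    return [s / count for s in sums]           # per-layer averages
```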
  • S103: Uniformly quantize the weight data and the feature map data of each layer according to the preset linear quantization method, and compute the weight quantization factor and the feature map quantization factor.
  • Any linear quantization method can be used to quantize the data; this application does not limit the choice.
  • For example, for an AI accelerator card, int8 precision is used in place of float precision: a per-layer linear quantization method is adopted, the distribution of each layer's feature map data and weight data is computed, and the data is limited to the range -X to +X.
  • The limited data is then uniformly quantized to the int8 range -127 to +127; during hardware inference the quantization parameters are merged into one and approximated as a parameter that allows the hardware to use shift instead of division.
  • The weight quantization factor and the feature map quantization factor are computed from the corresponding linear quantization method and the original data, where the original data refers to the weight data or each layer's feature map data.
  • The quantization parameter is a parameter that enables the hardware to use shift instead of division.
  • The distribution of each layer's output data and weight data of the computation graph is computed according to the linear quantization method, and a reasonable quantization parameter is derived.
  • The finally obtained quantization parameter enables the hardware to use shift instead of division when performing inference.
  • For example, the merged quantization factor may be approximated by a power of two, so that the quantization parameter serves as the shift amount for hardware inference; this can be applied to any kind of hardware, such as an FPGA.
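To make the shift-for-division idea concrete, the following sketch (an illustration with made-up factor values, not the patent's code) approximates the merged quantization factor by a power of two so the rescaling reduces to a right shift:

```python
import math

# Made-up example quantization factors for one layer:
y_w, y_f = 1.0 / 64, 1.0 / 128
combined = y_w * y_f              # merged factor used to rescale the accumulator

# Approximate the merged factor by a power of two: combined ~= 2**(-n),
# so n = round(-log2(y_w * y_f)) serves as the hardware shift amount.
n = round(-math.log2(combined))   # here exactly 13

acc = 123_456                     # example int32 accumulator value
rescaled_float = acc * combined   # what a float multiply/divide would give
rescaled_shift = acc >> n         # what the hardware does instead
assert abs(rescaled_float - rescaled_shift) < 1
```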
  • S105: Write the quantization parameter and the quantized weight data into a bin file according to hardware requirements, and generate the quantized file data.
  • In this way, the model file under the deep learning framework is converted into hardware-independent intermediate computation graph data and weight data, so that various deep learning frameworks are supported on different computer platforms;
  • a linear quantization strategy uniformly quantizes the feature map data and weight data of each layer, keeping the quantization parameters minimal, while merging the quantization parameters facilitates hardware inference;
  • all data is written into a bin file that the hardware can recognize, which solves the software redundancy and dependency library conflicts caused by supporting multiple deep learning frameworks in the related art;
  • this can effectively reduce the various interfaces developed to support multiple deep learning frameworks, simplify the workload and development difficulty of the host-side software, reduce hardware computing resources, accelerate the inference speed of the AI accelerator card, and lower energy consumption.
  • FIG. 2 is a schematic flowchart of another data quantization method based on a hardware environment provided by an embodiment of the present invention.
  • The embodiment of the present invention can be applied, for example, to int8-precision quantization for an AI accelerator card based on an FPGA (Field-Programmable Gate Array), and can specifically include the following:
  • The NNVM component in the NNVM compiler can be used to parse the model file to obtain the intermediate computation graph data; the TVM component in the NNVM compiler can be used to execute the operators of the intermediate computation graph and compute the weight data in tensor form, thereby obtaining data that is independent of the hardware and not limited by the hardware environment used.
  • S203: Compute the data distribution of the weight data and the per-layer average feature map data, and compute the corresponding limit values.
  • Specifically, the weight limit value can be computed as x_w = max(|w|), and the feature map limit value of the per-layer average feature map data can be computed as x_f = max(|F|).
  • S204: Limit the weight data and the per-layer average feature map data within the corresponding limited ranges.
  • The limited range in this embodiment is determined by the corresponding limit value: based on the limit values computed in S203, the limited range of the weight data can be (-x_w, +x_w), and the limited range of the per-layer average feature map data can be (-x_f, +x_f).
  • S205: Uniformly quantize the limited data to the range -127 to +127 of int8 precision, and compute the weight quantization factor and the feature map quantization factor.
  • S206: Merge the weight quantization factor and the feature map quantization factor according to the quantization-factor merging relation (given as an equation image in the original publication), where y_w is the weight quantization factor, y_f is the feature map quantization factor, and n is the quantization parameter.
  • S207: Reorder the quantization parameter and the quantized weight data, so that the data format of the quantization parameter and the quantized weight data is a 64-channel parallel format.
  • For an AI accelerator card developed on an FPGA, to maximize the use of hardware resources and facilitate the hardware's 64-channel parallel computing operations, the quantization parameter and the quantized weight data must satisfy the hardware's 64-channel parallel strategy; the data can therefore be reordered to generate the binary bin file shown in FIG. 3. In this way, when data is fed to the hardware for inference, no data format conversion is needed, and the data can be evenly distributed across the 64 parallel channels for computation, reducing the hardware resources spent on data conversion; a sketch of such a reordering follows.
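The exact layout is hardware-specific; purely as a hedged illustration (the block layout below is an assumption for demonstration, not the layout of FIG. 3), quantized weights can be padded and regrouped so the output-channel axis splits into 64-wide blocks before being dumped to a binary bin file:

```python
import numpy as np

def reorder_64_channel(weights: np.ndarray) -> np.ndarray:
    """Pad the output-channel axis to a multiple of 64 and group it into
    64-wide blocks, one block per set of parallel hardware channels."""
    c_out = weights.shape[0]
    pad = (-c_out) % 64                        # channels needed to reach a multiple of 64
    padded = np.pad(weights, [(0, pad)] + [(0, 0)] * (weights.ndim - 1))
    return padded.reshape(-1, 64, *weights.shape[1:])

w_q = np.random.randint(-127, 128, size=(100, 3, 3, 3), dtype=np.int8)
reorder_64_channel(w_q).tofile("weights.bin")  # binary file the hardware loads as-is
```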
  • The implementation of the present invention solves the problems of software package redundancy and dependency library conflicts caused by supporting multiple deep learning frameworks in the related art, and can effectively streamline the workload and development difficulty of the host-side software, reduce hardware computing resources, accelerate the inference speed of the AI accelerator card, and lower energy consumption.
  • The embodiment of the present invention also provides a corresponding apparatus for the hardware environment-based data quantization method, which further makes the method more practical.
  • The apparatus can be described from the perspective of functional modules and from the perspective of hardware.
  • The hardware environment-based data quantization apparatus provided by the embodiments of the present invention is described below.
  • The hardware environment-based data quantization apparatus described below and the hardware environment-based data quantization method described above may be cross-referenced.
  • FIG. 4 is a structural diagram of a specific implementation of the hardware environment-based data quantization apparatus provided by an embodiment of the present invention.
  • The apparatus may include:
  • a framework data parsing module 401, configured to parse the model file under the current deep learning framework to obtain intermediate computation graph data and weight data that are independent of the hardware environment;
  • a feature map data computing module 402, configured to compute feature map data from the image data in the input data set through the intermediate computation graph process, based on the intermediate computation graph data and the weight data;
  • a linear quantization module 403, configured to uniformly quantize the weight data and the feature map data of each layer according to a preset linear quantization method, and compute the weight quantization factor and the feature map quantization factor;
  • a quantization parameter computing module 404, configured to merge the weight quantization factor and the feature map quantization factor to obtain a quantization parameter, where the quantization parameter is a parameter that enables the hardware to use shift instead of division;
  • a hardware-recognizable data output module 405, configured to write the quantization parameter and the quantized weight data into a bin file according to hardware requirements, and generate the quantized file data.
  • Optionally, the apparatus may further include a reordering module 406, configured, for example, to reorder the quantization parameter and the quantized weight data so that the data format of the quantization parameter and the quantized weight data is a 64-channel parallel format.
  • In other implementations, the framework data parsing module 401 may be specifically configured to parse the model file using the NNVM component in the NNVM compiler to obtain the intermediate computation graph data, and to execute the operators of the intermediate computation graph using the TVM component in the NNVM compiler and compute the weight data in tensor form.
  • In some other implementations, the linear quantization module 403 may include:
  • an average computing sub-module, configured to compute the average of each layer's feature map data as the per-layer average feature map data;
  • a limit value computing sub-module, configured to compute the data distribution of the weight data and the per-layer average feature map data, and compute the corresponding limit values;
  • a data limiting sub-module, configured to limit the weight data and the per-layer average feature map data within the corresponding limited ranges, the limited ranges being determined by the corresponding limit values;
  • a quantization sub-module, configured to uniformly quantize the limited data to the range -127 to +127 of int8 precision, and compute the weight quantization factor and the feature map quantization factor.
  • The functions of the functional modules of the hardware environment-based data quantization apparatus described in this embodiment can be specifically implemented according to the method in the above method embodiment.
  • The implementation of the present invention solves the problems of software package redundancy and dependency library conflicts caused by supporting multiple deep learning frameworks in the related art, and can effectively streamline the workload and development difficulty of the host-side software, reduce hardware computing resources, accelerate the inference speed of the AI accelerator card, and lower energy consumption.
  • Fig. 6 is a structural diagram of another hardware environment-based data quantization apparatus provided by an embodiment of the application. As shown in Fig. 6, the apparatus includes a memory 60 for storing a computer program;
  • and a processor 61, configured to implement the steps of the hardware environment-based data quantization method mentioned in any of the above embodiments when executing the computer program, where the computer program can, for example, be written in the Python language.
  • The processor 61 may include one or more processing cores, such as a 4-core processor or an 8-core processor.
  • The processor 61 may be implemented in at least one hardware form among DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array), and PLA (Programmable Logic Array).
  • The processor 61 may also include a main processor and a coprocessor: the main processor is the processor used to process data in the awake state, also called the CPU (Central Processing Unit); the coprocessor is a low-power processor used to process data in the standby state.
  • In some embodiments, the processor 61 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content to be displayed on the display screen.
  • In some embodiments, the processor 61 may further include an AI (Artificial Intelligence) processor, which is used to handle computing operations related to machine learning.
  • The memory 60 may include one or more computer-readable storage media, which may be non-transitory.
  • The memory 60 may also include high-speed random access memory and non-volatile memory, such as one or more magnetic disk storage devices or flash memory storage devices.
  • In this embodiment, the memory 60 is used to store at least the following computer program 601, which, after being loaded and executed by the processor 61, implements the relevant steps of the hardware environment-based data quantization method disclosed in any of the foregoing embodiments.
  • The resources stored in the memory 60 may also include an operating system 602 and data 603, and the storage may be short-term or permanent.
  • The operating system 602 may include Windows, Unix, Linux, and so on.
  • The data 603 may include, but is not limited to, data corresponding to test results, and the like.
  • In some embodiments, the hardware environment-based data quantization apparatus may further include a display screen 62, an input/output interface 63, a communication interface 64, a power supply 65, and a communication bus 66.
  • The structure shown in FIG. 6 does not constitute a limitation on the hardware environment-based data quantization apparatus, which may include more or fewer components than shown, for example a sensor 67.
  • The functions of the functional modules of the hardware environment-based data quantization apparatus described in this embodiment can be specifically implemented according to the method in the above method embodiment.
  • The implementation of the present invention solves the problems of software package redundancy and dependency library conflicts caused by supporting multiple deep learning frameworks in the related art, and can effectively streamline the workload and development difficulty of the host-side software, reduce hardware computing resources, accelerate the inference speed of the AI accelerator card, and lower energy consumption.
  • If the hardware environment-based data quantization method in the above embodiments is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium.
  • Based on this understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and executes all or part of the steps of the methods in the embodiments of this application.
  • The aforementioned storage media include various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), an electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, a magnetic disk, or an optical disk.
  • On this basis, an embodiment of the present invention also provides a computer-readable storage medium storing a data quantization program based on a hardware environment which, when executed by a processor, implements the steps of the hardware environment-based data quantization method described in any of the above embodiments.
  • The functions of the functional modules of the computer-readable storage medium in this embodiment can be specifically implemented according to the method in the above method embodiment; for the specific implementation process, reference may be made to the related description of the above method embodiment, which will not be repeated here.
  • The implementation of the present invention solves the problems of software package redundancy and dependency library conflicts caused by supporting multiple deep learning frameworks in the related art, and can effectively streamline the workload and development difficulty of the host-side software, reduce hardware computing resources, accelerate the inference speed of the AI accelerator card, and lower energy consumption.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

A data quantization method and apparatus based on a hardware environment, and a computer-readable storage medium. The method includes: parsing the model file under the current deep learning framework to obtain intermediate computation graph data and weight data that are independent of the hardware environment, and computing feature map data from the image data in an input data set through the intermediate computation graph process; uniformly quantizing the weight data and each layer's feature map data according to a preset linear quantization method, and computing a weight quantization factor and a feature map quantization factor (S103); merging the weight quantization factor and the feature map quantization factor to obtain a quantization parameter that enables the hardware to use shift instead of division; and finally, writing the quantization parameter and the quantized weight data into a bin file according to hardware requirements to generate quantized file data (S105), thereby solving the problems of quantization software package redundancy and dependency library conflicts caused by supporting multiple deep learning frameworks in the related art.

Description

Hardware environment-based data quantization method and apparatus, and readable storage medium
This application claims priority to a Chinese patent application filed with the China National Intellectual Property Administration on January 21, 2020, with application number 202010071063.1 and entitled "Hardware environment-based data quantization method and apparatus, and readable storage medium", the entire contents of which are incorporated herein by reference.
TECHNICAL FIELD
This application relates to the field of artificial intelligence technology, and in particular to a data quantization method and apparatus based on a hardware environment, and a computer-readable storage medium.
BACKGROUND
With the development of artificial intelligence in fields such as agriculture, finance, security, healthcare, and manufacturing, users have higher requirements for the computing speed, accuracy, and power consumption of products based on artificial intelligence technology. Major hardware vendors are developing accelerator cards dedicated to artificial intelligence computation, together with matching quantization schemes, to speed the everyday adoption of artificial intelligence algorithms.
The large scale and parallelism of AI (Artificial Intelligence) accelerator cards make their development very challenging; at the same time, the quantization scheme must allow low-precision arithmetic to achieve algorithm accuracy similar to that of high-precision arithmetic. To map high-precision data to low precision and reduce hardware resource overhead, the high-precision data must be quantized in advance to generate low-precision weight data and quantization parameter files; the quantization toolkit on the software side is developed to meet these requirements.
However, as the number of deep learning frameworks grows, it becomes harder for AI accelerator cards to accommodate models from the various frameworks. To be compatible with all of them, an ordinary quantization toolkit requires multiple deep learning framework packages to be installed in advance, which easily causes redundancy in the host-side software and conflicts among the various dependency libraries.
In view of this, how to solve the problems of software package redundancy and dependency library conflicts caused by supporting multiple deep learning frameworks is a technical problem to be solved by those skilled in the art.
SUMMARY
This application provides a data quantization method and apparatus based on a hardware environment, and a computer-readable storage medium, which solve the problems of software package redundancy and dependency library conflicts caused by supporting multiple deep learning frameworks in the related art.
To solve the above technical problems, the embodiments of the present invention provide the following technical solutions:
One aspect of the embodiments of the present invention provides a data quantization method based on a hardware environment, including:
parsing the model file under the current deep learning framework to obtain intermediate computation graph data and weight data that are independent of the hardware environment;
computing feature map data from the image data in an input data set through an intermediate computation graph process, based on the intermediate computation graph data and the weight data;
uniformly quantizing the weight data and the feature map data of each layer according to a preset linear quantization method, and computing a weight quantization factor and a feature map quantization factor;
merging the weight quantization factor and the feature map quantization factor to obtain a quantization parameter, the quantization parameter being a parameter that enables the hardware to use shift instead of division;
writing the quantization parameter and the quantized weight data into a bin file according to hardware requirements, and generating quantized file data.
Optionally, before writing the quantization parameter and the quantized weight data into the bin file according to hardware requirements, the method further includes:
reordering the quantization parameter and the quantized weight data, so that the data format of the quantization parameter and the quantized weight data is a 64-channel parallel format.
Optionally, parsing to obtain the intermediate computation graph data and weight data of the current deep learning framework includes:
parsing the model file using the NNVM component in the NNVM compiler to obtain the intermediate computation graph data;
executing the operators of the intermediate computation graph using the TVM component in the NNVM compiler and computing the weight data in tensor form.
Optionally, merging the weight quantization factor and the feature map quantization factor is:
merging the weight quantization factor and the feature map quantization factor according to a quantization-factor merging relation, the quantization-factor merging relation being:
[quantization-factor merging relation, given as an equation image (PCTCN2020117338-appb-000001) in the original publication]
where y_w is the weight quantization factor, y_f is the feature map quantization factor, and n is the quantization parameter.
Optionally, uniformly quantizing the weight data and the feature map data of each layer according to the preset linear quantization method, and computing the weight quantization factor and the feature map quantization factor, includes:
computing the average of each layer's feature map data as the per-layer average feature map data;
computing the data distribution of the weight data and the per-layer average feature map data, and computing the corresponding limit values;
limiting the weight data and the per-layer average feature map data within corresponding limited ranges, the limited ranges being determined by the corresponding limit values;
uniformly quantizing the limited data to the range -127 to +127 of int8 precision, and computing the weight quantization factor and the feature map quantization factor.
Optionally, computing the corresponding limit values includes:
the weight limit value of the weight data is computed according to the weight limit value relation x_w = max(|w|), where x_w is the weight limit value and w is the weight data; correspondingly, the limited range of the weight data is (-x_w, +x_w);
the feature map limit value of the per-layer average feature map data is computed according to the feature map limit value relation x_f = max(|F|), where x_f is the feature map limit value and F is the per-layer average feature map data; correspondingly, the limited range of the per-layer average feature map data is (-x_f, +x_f).
Another aspect of the embodiments of the present invention provides a data quantization apparatus based on a hardware environment, including:
a framework data parsing module, configured to parse the model file under the current deep learning framework to obtain intermediate computation graph data and weight data that are independent of the hardware environment;
a feature map data computing module, configured to compute feature map data from the image data in an input data set through the intermediate computation graph process, based on the intermediate computation graph data and the weight data;
a linear quantization module, configured to uniformly quantize the weight data and the feature map data of each layer according to a preset linear quantization method, and compute a weight quantization factor and a feature map quantization factor;
a quantization parameter computing module, configured to merge the weight quantization factor and the feature map quantization factor to obtain a quantization parameter, the quantization parameter being a parameter that enables the hardware to use shift instead of division;
a hardware-recognizable data output module, configured to write the quantization parameter and the quantized weight data into a bin file according to hardware requirements, and generate quantized file data.
Optionally, the apparatus further includes a reordering module, configured to reorder the quantization parameter and the quantized weight data so that the data format of the quantization parameter and the quantized weight data is a 64-channel parallel format.
An embodiment of the present invention also provides a data quantization apparatus based on a hardware environment, including a processor, the processor being configured to implement the steps of the data quantization method based on a hardware environment according to any preceding item when executing a computer program stored in a memory.
Finally, an embodiment of the present invention also provides a computer-readable storage medium storing a data quantization program based on a hardware environment, the program, when executed by a processor, implementing the steps of the data quantization method based on a hardware environment according to any preceding item.
The advantage of the technical solution provided by this application is that the model file under a deep learning framework is converted into hardware-independent intermediate computation graph data and weight data, so that various deep learning frameworks can run on different computer platforms; a linear quantization strategy uniformly quantizes each layer's feature map data and weight data, keeping the quantization parameters minimal, while merging the quantization parameters facilitates hardware inference; all data is written into a bin file that the hardware can recognize. This solves the software redundancy and dependency library conflict problems brought about by supporting multiple deep learning frameworks in the related art, can effectively reduce the various interfaces developed to support multiple deep learning frameworks, and streamlines the workload and development difficulty of the host-side software; it can also reduce hardware computing resources, accelerate the inference speed of the AI accelerator card, and lower energy consumption.
In addition, the embodiments of the present invention also provide a corresponding implementation apparatus and computer-readable storage medium for the data quantization method based on a hardware environment, further making the method more practical; the apparatus and the computer-readable storage medium have corresponding advantages.
It should be understood that the above general description and the following detailed description are merely exemplary and do not limit the present disclosure.
BRIEF DESCRIPTION OF THE DRAWINGS
To explain the technical solutions in the embodiments of the present invention or the related art more clearly, the drawings needed in the description of the embodiments or the related art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present invention; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
FIG. 1 is a schematic flowchart of a data quantization method based on a hardware environment provided by an embodiment of the present invention;
FIG. 2 is a schematic flowchart of another data quantization method based on a hardware environment provided by an embodiment of the present invention;
FIG. 3 is a schematic diagram of the data after reordering provided by an embodiment of the present invention;
FIG. 4 is a structural diagram of a specific implementation of a data quantization apparatus based on a hardware environment provided by an embodiment of the present invention;
FIG. 5 is a structural diagram of another specific implementation of a data quantization apparatus based on a hardware environment provided by an embodiment of the present invention;
FIG. 6 is a structural diagram of yet another specific implementation of a data quantization apparatus based on a hardware environment provided by an embodiment of the present invention.
DETAILED DESCRIPTION
To enable those skilled in the art to better understand the solution of the present invention, the present invention is further described in detail below with reference to the drawings and specific embodiments. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of the present invention.
The terms "first", "second", "third", "fourth", and the like in the specification and claims of this application and in the above drawings are used to distinguish different objects rather than to describe a specific order. Furthermore, the terms "including" and "having", and any variations thereof, are intended to cover a non-exclusive inclusion; for example, a process, method, system, product, or device comprising a series of steps or units is not limited to the listed steps or units, but may include steps or units that are not listed.
Having introduced the technical solutions of the embodiments of the present invention, the various non-limiting implementations of this application are described in detail below.
Referring first to FIG. 1, FIG. 1 is a schematic flowchart of a data quantization method based on a hardware environment provided by an embodiment of the present invention; the embodiment of the present invention may include the following:
S101: Parse the model file under the current deep learning framework to obtain intermediate computation graph data and weight data that are independent of the hardware environment.
In this application, the deep learning framework may be any existing deep learning framework, and the model file under that framework, such as a pb file under the TensorFlow framework, is loaded. Any existing method can be used to parse out the intermediate computation graph and weight data without affecting the implementation of this application. For example, the NNVM component of the NNVM compiler can be used to convert model files of different frameworks into framework-independent intermediate computation graphs, and the TVM component can then be used to execute the operators of the intermediate graph, decoupling the operations of the computation graph from the hardware, so that this application can support various deep learning frameworks and run on different computer platforms. The NNVM compiler comprises two components built on the TVM stack: an NNVM component that processes intermediate computation graphs and a TVM component that processes tensor operators. The NNVM component (the computation graph intermediate representation stack) can be used to represent workloads from different frameworks as standardized computation graphs and then convert these high-level computation graphs into execution graphs, embodying the idea of expressing the intermediate computation graph in a framework-independent form. The execution object of the TVM component (the tensor intermediate representation stack) is the operators in the computation graph, which it optimizes into the operators corresponding to the target back-end hardware; unlike the NNVM component, it provides a hardware-independent, domain-specific language to simplify operator execution at the tensor index level.
S102: Based on the intermediate computation graph data and the weight data, compute feature map data from the image data in the input data set through the intermediate computation graph process.
The input data set may be a training data set under the deep learning framework of S101, and the total number of images contained in the input data set is not limited by this application; for example, it may be a data set containing 2000 images. After the input data set is obtained, to facilitate subsequent image processing, the image data in the input data set may also be preprocessed: for example, the preprocessing may first perform layer handling, then uniformly convert the image data into float-type data, and finally apply an offset, where the offset value may be any value between 0 and 255 (see the sketch below). Using the basic operations of the TVM framework, the output data of each layer of the computation graph can be computed for the input image data, that is, the feature map data of each layer is obtained; the computed feature map data of each layer can be stored in memory, the results accumulated, and the average of each layer's feature map data then computed.
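A sketch of such preprocessing (our own illustration; interpreting the "layer handling" step as channel-layout handling is an assumption, and 128 is one arbitrary choice from the stated 0-255 offset range):

```python
import numpy as np

def preprocess(img: np.ndarray, offset: float = 128.0) -> np.ndarray:
    """Prepare one HWC uint8 image for the calibration pass."""
    x = np.transpose(img, (2, 0, 1))   # layer/channel handling: HWC -> CHW (assumed)
    x = x.astype(np.float32)           # uniformly convert to float-type data
    return x - offset                  # offset step; any value in 0..255 works
```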
S103: Uniformly quantize the weight data and the feature map data of each layer according to the preset linear quantization method, and compute the weight quantization factor and the feature map quantization factor.
In this application, any linear quantization method may be used to quantize the data, and this application does not limit the choice. For example, for an AI accelerator card, int8 precision is used in place of float precision: a per-layer linear quantization method is adopted, the distribution of each layer's feature map data and weight data is computed, the data is limited to the range -X to +X and then uniformly quantized to the int8 range -127 to +127; during hardware inference the quantization parameters are merged into one and approximated as a parameter that allows the hardware to use shift instead of division.
The weight quantization factor and the feature map quantization factor are computed from the corresponding linear quantization method and the original data, where the original data refers to the weight data or each layer's feature map data.
S104: Merge the weight quantization factor and the feature map quantization factor to obtain a quantization parameter, the quantization parameter being a parameter that enables the hardware to use shift instead of division.
In the embodiment of the present invention, in S103 the distribution of each layer's output data and weight data of the computation graph is computed according to the linear quantization method, and a reasonable quantization parameter is derived; the finally obtained quantization parameter enables the hardware to use shift instead of division when performing inference. The merged quantization factor may, for example, be approximated by a power of two, so that the quantization parameter serves as the shift amount for hardware inference. This can be applied to any kind of hardware, such as an FPGA.
S105: Write the quantization parameter and the quantized weight data into a bin file according to hardware requirements, and generate the quantized file data.
It can be understood that this application is based on the hardware environment; so that the hardware can recognize these data and use them during inference, the quantization parameter and the quantized weight data can be written, according to hardware requirements, into a bin file that the hardware can recognize.
In the technical solution provided by the embodiment of the present invention, the model file under a deep learning framework is converted into hardware-independent intermediate computation graph data and weight data, so that various deep learning frameworks can run on different computer platforms; a linear quantization strategy uniformly quantizes each layer's feature map data and weight data, keeping the quantization parameters minimal, while merging the quantization parameters facilitates hardware inference; all data is written into a bin file that the hardware can recognize. This solves the software redundancy and dependency library conflict problems brought about by supporting multiple deep learning frameworks in the related art, can effectively reduce the various interfaces developed to support multiple deep learning frameworks, streamlines the workload and development difficulty of the host-side software, reduces hardware computing resources, accelerates the inference speed of the AI accelerator card, and lowers energy consumption.
In addition, this application provides another embodiment. Referring to FIG. 2, FIG. 2 is a schematic flowchart of another data quantization method based on a hardware environment provided by an embodiment of the present invention. The embodiment of the present invention may be applied, for example, to int8-precision quantization for an AI accelerator card based on an FPGA (Field-Programmable Gate Array), and may specifically include the following:
S201: Parse the model file under the current deep learning framework using the NNVM compiler to obtain intermediate computation graph data and weight data that are independent of the hardware environment.
In this step, the NNVM component in the NNVM compiler can be used to parse the model file to obtain the intermediate computation graph data, and the TVM component in the NNVM compiler can be used to execute the operators of the intermediate computation graph and compute the weight data in tensor form, thereby obtaining data that has nothing to do with the hardware and is not limited by the hardware environment used.
S202: Based on the intermediate computation graph data and the weight data, compute feature map data from the image data in the input data set through the intermediate computation graph process, and compute the average of each layer's feature map data as the per-layer average feature map data.
S203: Compute the data distribution of the weight data and the per-layer average feature map data, and compute the corresponding limit values.
Specifically, the weight limit value of the weight data can be computed according to the weight limit value relation x_w = max(|w|), where x_w is the weight limit value and w is the weight data. The feature map limit value of the per-layer average feature map data can be computed according to the feature map limit value relation x_f = max(|F|), where x_f is the feature map limit value and F is the per-layer average feature map data.
S204: Limit the weight data and the per-layer average feature map data within the corresponding limited ranges.
The limited range in the embodiment of the present invention is determined by the corresponding limit value. Based on the limit values computed in S203, the limited range of the weight data can be (-x_w, +x_w), and the limited range of the per-layer average feature map data can be (-x_f, +x_f).
S205: Uniformly quantize the limited data to the range -127 to +127 of int8 precision, and compute the weight quantization factor and the feature map quantization factor.
After S203 and S204, the quantization factors are obtained as y_w = x'_w / 127 and y_f = x'_f / 127, where x'_w and x'_f denote the quantized weight data and feature map data, y_w is the weight quantization factor, and y_f is the feature map quantization factor; the two intermediate equations producing x'_w and x'_f are given as equation images (PCTCN2020117338-appb-000002 and appb-000003) in the original publication.
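As a concrete hypothetical illustration, under the common reading of this scheme in which a factor is a limit value divided by 127: if a layer's weight limit value is x_w = 2.54, then y_w = 2.54 / 127 = 0.02, and a clipped weight of 1.27 is stored as the int8 value round(1.27 / 0.02) = 64.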
S206: Merge the weight quantization factor and the feature map quantization factor according to the quantization-factor merging relation:
[quantization-factor merging relation, given as an equation image (PCTCN2020117338-appb-000004) in the original publication]
where y_w is the weight quantization factor, y_f is the feature map quantization factor, and n is the quantization parameter.
S207: Reorder the quantization parameter and the quantized weight data, so that the data format of the quantization parameter and the quantized weight data is a 64-channel parallel format.
For an AI accelerator card developed on an FPGA, to maximize the use of hardware resources and facilitate the hardware's 64-channel parallel computing operations, the quantization parameter and the quantized weight data must satisfy the hardware's 64-channel parallel strategy; the data can therefore be reordered to generate the binary bin file shown in FIG. 3. In this way, when the data is fed to the hardware for inference, no data format conversion is needed, and the data can be evenly distributed across the 64 parallel channels for computation, reducing the hardware resources spent on data conversion.
S208: Write the reordered quantization parameter and the quantized weight data into a bin file according to hardware requirements, and generate the quantized file data.
For the implementation steps that this embodiment shares with the above embodiment, reference may be made to the description of the above embodiment, which will not be repeated here.
From the above, the implementation of the present invention solves the problems of software package redundancy and dependency library conflicts caused by supporting multiple deep learning frameworks in the related art, can effectively streamline the workload and development difficulty of the host-side software, reduce hardware computing resources, accelerate the inference speed of the AI accelerator card, and lower energy consumption.
It should be noted that there is no strict order of execution among the steps in this application: as long as the logical order is respected, these steps may be executed simultaneously or in a certain preset order; FIG. 1 and FIG. 2 are only illustrative and do not imply that only such an execution order is possible.
The embodiment of the present invention also provides a corresponding apparatus for the data quantization method based on a hardware environment, further making the method more practical. The apparatus can be described from the perspective of functional modules and from the perspective of hardware. The data quantization apparatus based on a hardware environment provided by the embodiments of the present invention is introduced below; the data quantization apparatus described below and the data quantization method described above may be cross-referenced.
From the perspective of functional modules, referring to FIG. 4, FIG. 4 is a structural diagram of the data quantization apparatus based on a hardware environment provided by an embodiment of the present invention in a specific implementation; the apparatus may include:
a framework data parsing module 401, configured to parse the model file under the current deep learning framework to obtain intermediate computation graph data and weight data that are independent of the hardware environment;
a feature map data computing module 402, configured to compute feature map data from the image data in the input data set through the intermediate computation graph process, based on the intermediate computation graph data and the weight data;
a linear quantization module 403, configured to uniformly quantize the weight data and each layer's feature map data according to the preset linear quantization method, and compute the weight quantization factor and the feature map quantization factor;
a quantization parameter computing module 404, configured to merge the weight quantization factor and the feature map quantization factor to obtain a quantization parameter, the quantization parameter being a parameter that enables the hardware to use shift instead of division;
a hardware-recognizable data output module 405, configured to write the quantization parameter and the quantized weight data into a bin file according to hardware requirements, and generate the quantized file data.
Optionally, in some implementations of this embodiment, referring to FIG. 5, the apparatus may further include, for example, a reordering module 406, configured to reorder the quantization parameter and the quantized weight data so that the data format of the quantization parameter and the quantized weight data is a 64-channel parallel format.
In other implementations of this embodiment, the framework data parsing module 401 may be specifically configured to parse the model file using the NNVM component in the NNVM compiler to obtain the intermediate computation graph data, and to execute the operators of the intermediate computation graph using the TVM component in the NNVM compiler and compute the weight data in tensor form.
In still other implementations of this embodiment, the linear quantization module 403 may include:
an average computing sub-module, configured to compute the average of each layer's feature map data as the per-layer average feature map data;
a limit value computing sub-module, configured to compute the data distribution of the weight data and the per-layer average feature map data, and compute the corresponding limit values;
a data limiting sub-module, configured to limit the weight data and the per-layer average feature map data within the corresponding limited ranges, the limited ranges being determined by the corresponding limit values;
a quantization sub-module, configured to uniformly quantize the limited data to the range -127 to +127 of int8 precision, and compute the weight quantization factor and the feature map quantization factor.
The functions of the functional modules of the data quantization apparatus based on a hardware environment described in the embodiment of the present invention can be specifically implemented according to the method in the above method embodiment; for the specific implementation process, reference may be made to the related description of the above method embodiment, which will not be repeated here.
From the above, the implementation of the present invention solves the problems of software package redundancy and dependency library conflicts caused by supporting multiple deep learning frameworks in the related art, can effectively streamline the workload and development difficulty of the host-side software, reduce hardware computing resources, accelerate the inference speed of the AI accelerator card, and lower energy consumption.
The data quantization apparatus based on a hardware environment mentioned above is described from the perspective of functional modules; further, this application also provides a data quantization apparatus based on a hardware environment described from the perspective of hardware. FIG. 6 is a structural diagram of another data quantization apparatus based on a hardware environment provided by an embodiment of this application. As shown in FIG. 6, the apparatus includes a memory 60 for storing a computer program;
and a processor 61, configured to implement the steps of the data quantization method based on a hardware environment mentioned in any of the above embodiments when executing the computer program, where the computer program can, for example, be written in the Python language.
The processor 61 may include one or more processing cores, such as a 4-core processor or an 8-core processor. The processor 61 may be implemented in at least one hardware form among DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array), and PLA (Programmable Logic Array). The processor 61 may also include a main processor and a coprocessor: the main processor is the processor for processing data in the awake state, also called the CPU (Central Processing Unit); the coprocessor is a low-power processor for processing data in the standby state. In some embodiments, the processor 61 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content to be displayed on the display screen. In some embodiments, the processor 61 may further include an AI (Artificial Intelligence) processor for handling computing operations related to machine learning.
The memory 60 may include one or more computer-readable storage media, which may be non-transitory. The memory 60 may also include high-speed random access memory and non-volatile memory, such as one or more magnetic disk storage devices or flash memory storage devices. In this embodiment, the memory 60 is at least used to store the following computer program 601 which, after being loaded and executed by the processor 61, can implement the relevant steps of the data quantization method based on a hardware environment disclosed in any of the foregoing embodiments. In addition, the resources stored in the memory 60 may also include an operating system 602 and data 603, and the storage may be short-term or permanent. The operating system 602 may include Windows, Unix, Linux, and so on. The data 603 may include, but is not limited to, data corresponding to test results, and the like.
In some embodiments, the data quantization apparatus based on a hardware environment may further include a display screen 62, an input/output interface 63, a communication interface 64, a power supply 65, and a communication bus 66.
Those skilled in the art will understand that the structure shown in FIG. 6 does not constitute a limitation on the data quantization apparatus based on a hardware environment, which may include more or fewer components than shown, for example a sensor 67.
The functions of the functional modules of the data quantization apparatus based on a hardware environment described in the embodiment of the present invention can be specifically implemented according to the method in the above method embodiment; for the specific implementation process, reference may be made to the related description of the above method embodiment, which will not be repeated here.
From the above, the implementation of the present invention solves the problems of software package redundancy and dependency library conflicts caused by supporting multiple deep learning frameworks in the related art, can effectively streamline the workload and development difficulty of the host-side software, reduce hardware computing resources, accelerate the inference speed of the AI accelerator card, and lower energy consumption.
It can be understood that, if the data quantization method based on a hardware environment in the above embodiments is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and executes all or part of the steps of the methods of the embodiments of this application. The aforementioned storage media include various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), an electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, a magnetic disk, or an optical disk.
On this basis, an embodiment of the present invention also provides a computer-readable storage medium storing a data quantization program based on a hardware environment; when the data quantization program based on a hardware environment is executed by a processor, the steps of the data quantization method based on a hardware environment according to any one of the above embodiments are implemented.
The functions of the functional modules of the computer-readable storage medium described in the embodiment of the present invention can be specifically implemented according to the method in the above method embodiment; for the specific implementation process, reference may be made to the related description of the above method embodiment, which will not be repeated here.
From the above, the implementation of the present invention solves the problems of software package redundancy and dependency library conflicts caused by supporting multiple deep learning frameworks in the related art, can effectively streamline the workload and development difficulty of the host-side software, reduce hardware computing resources, accelerate the inference speed of the AI accelerator card, and lower energy consumption.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for the same or similar parts of the embodiments, reference may be made to one another. For the apparatus disclosed in the embodiments, since it corresponds to the method disclosed in the embodiments, the description is relatively brief; for relevant details, refer to the description of the method part.
Those skilled in the art will further appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above generally in terms of function. Whether these functions are executed in hardware or software depends on the specific application and design constraints of the technical solution. Skilled artisans may use different methods for each particular application to implement the described functions, but such implementations should not be considered beyond the scope of the present invention.
The data quantization method and apparatus based on a hardware environment and the computer-readable storage medium provided by this application have been introduced in detail above. Specific examples are used herein to explain the principles and implementations of the present invention; the description of the above embodiments is only intended to help understand the method of the present invention and its core idea. It should be pointed out that, for those of ordinary skill in the art, several improvements and modifications can be made to this application without departing from the principles of the present invention, and these improvements and modifications also fall within the protection scope of the claims of this application.

Claims (10)

  • 1. A data quantization method based on a hardware environment, characterized by comprising:
    parsing the model file under the current deep learning framework to obtain intermediate computation graph data and weight data that are independent of the hardware environment;
    computing feature map data from the image data in an input data set through an intermediate computation graph process, based on the intermediate computation graph data and the weight data;
    uniformly quantizing the weight data and the feature map data of each layer according to a preset linear quantization method, and computing a weight quantization factor and a feature map quantization factor;
    merging the weight quantization factor and the feature map quantization factor to obtain a quantization parameter, the quantization parameter being a parameter that enables the hardware to use shift instead of division;
    writing the quantization parameter and the quantized weight data into a bin file according to hardware requirements, and generating quantized file data.
  • 2. The data quantization method based on a hardware environment according to claim 1, characterized in that, before writing the quantization parameter and the quantized weight data into the bin file according to hardware requirements, the method further comprises:
    reordering the quantization parameter and the quantized weight data, so that the data format of the quantization parameter and the quantized weight data is a 64-channel parallel format.
  • 3. The data quantization method based on a hardware environment according to claim 2, characterized in that parsing to obtain the intermediate computation graph data and weight data of the current deep learning framework comprises:
    parsing the model file using the NNVM component in the NNVM compiler to obtain the intermediate computation graph data;
    executing the operators of the intermediate computation graph using the TVM component in the NNVM compiler and computing the weight data in tensor form.
  • 4. The data quantization method based on a hardware environment according to claim 3, characterized in that merging the weight quantization factor and the feature map quantization factor is:
    merging the weight quantization factor and the feature map quantization factor according to a quantization-factor merging relation, the quantization-factor merging relation being:
    [quantization-factor merging relation, given as an equation image (PCTCN2020117338-appb-100001) in the original publication]
    where y_w is the weight quantization factor, y_f is the feature map quantization factor, and n is the quantization parameter.
  • 5. The data quantization method based on a hardware environment according to any one of claims 1 to 4, characterized in that uniformly quantizing the weight data and the feature map data of each layer according to the preset linear quantization method, and computing the weight quantization factor and the feature map quantization factor, comprises:
    computing the average of each layer's feature map data as the per-layer average feature map data;
    computing the data distribution of the weight data and the per-layer average feature map data, and computing the corresponding limit values;
    limiting the weight data and the per-layer average feature map data within corresponding limited ranges, the limited ranges being determined by the corresponding limit values;
    uniformly quantizing the limited data to the range -127 to +127 of int8 precision, and computing the weight quantization factor and the feature map quantization factor.
  • 6. The data quantization method based on a hardware environment according to claim 5, characterized in that computing the corresponding limit values comprises:
    the weight limit value of the weight data is computed according to the weight limit value relation x_w = max(|w|), where x_w is the weight limit value and w is the weight data; correspondingly, the limited range of the weight data is (-x_w, +x_w);
    the feature map limit value of the per-layer average feature map data is computed according to the feature map limit value relation x_f = max(|F|), where x_f is the feature map limit value and F is the per-layer average feature map data; correspondingly, the limited range of the per-layer average feature map data is (-x_f, +x_f).
  • 7. A data quantization apparatus based on a hardware environment, characterized by comprising:
    a framework data parsing module, configured to parse the model file under the current deep learning framework to obtain intermediate computation graph data and weight data that are independent of the hardware environment;
    a feature map data computing module, configured to compute feature map data from the image data in an input data set through the intermediate computation graph process, based on the intermediate computation graph data and the weight data;
    a linear quantization module, configured to uniformly quantize the weight data and the feature map data of each layer according to a preset linear quantization method, and compute a weight quantization factor and a feature map quantization factor;
    a quantization parameter computing module, configured to merge the weight quantization factor and the feature map quantization factor to obtain a quantization parameter, the quantization parameter being a parameter that enables the hardware to use shift instead of division;
    a hardware-recognizable data output module, configured to write the quantization parameter and the quantized weight data into a bin file according to hardware requirements, and generate quantized file data.
  • 8. The data quantization apparatus based on a hardware environment according to claim 7, characterized by further comprising a reordering module, configured to reorder the quantization parameter and the quantized weight data so that the data format of the quantization parameter and the quantized weight data is a 64-channel parallel format.
  • 9. A data quantization apparatus based on a hardware environment, characterized by comprising a processor, the processor being configured to implement the steps of the data quantization method based on a hardware environment according to any one of claims 1 to 6 when executing a computer program stored in a memory.
  • 10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a data quantization program based on a hardware environment, and the data quantization program based on a hardware environment, when executed by a processor, implements the steps of the data quantization method based on a hardware environment according to any one of claims 1 to 6.
PCT/CN2020/117338 2020-01-21 2020-11-16 Hardware environment-based data quantization method and apparatus, and readable storage medium WO2021147362A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/794,110 US11748970B2 (en) 2020-01-21 2020-11-16 Hardware environment-based data quantization method and apparatus, and readable storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010071063.1A 2020-01-21 2020-01-21 Hardware environment-based data quantization method and apparatus, and readable storage medium
CN202010071063.1 2020-01-21

Publications (1)

Publication Number Publication Date
WO2021147362A1 true WO2021147362A1 (zh) 2021-07-29

Family

ID=70871303

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/117338 WO2021147362A1 (zh) 2020-01-21 2020-11-16 基于硬件环境的数据量化方法、装置及可读存储介质

Country Status (3)

Country Link
US (1) US11748970B2 (zh)
CN (1) CN111240640B (zh)
WO (1) WO2021147362A1 (zh)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113885889A (zh) * 2021-09-18 2022-01-04 苏州浪潮智能科技有限公司 Method, system, storage medium, and device for quantization model deployment
CN114444658A (zh) * 2021-12-31 2022-05-06 苏州浪潮智能科技有限公司 Deep learning model inference method, system, device, and computer medium
CN116257218A (zh) * 2023-01-13 2023-06-13 华中科技大学 Interface design method and integration system for statistical analysis software and nuclear energy programs

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111240640B (zh) * 2020-01-21 2022-05-10 苏州浪潮智能科技有限公司 Hardware environment-based data quantization method and apparatus, and readable storage medium
CN111857723B (zh) * 2020-06-29 2022-06-17 浪潮电子信息产业股份有限公司 Parameter compilation method and apparatus, and computer-readable storage medium
CN113052258B (zh) * 2021-04-13 2024-05-31 南京大学 Convolution method, model, and computer device based on intermediate-layer feature map compression
CN116560666B (zh) * 2023-07-10 2023-09-22 上海燧原科技有限公司 AI front-end unified computing method, apparatus, and medium based on multi-level code generation

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101854535A (zh) * 2009-03-31 2010-10-06 郑州大学 Quantization method for an embedded video encoder
US20110188769A1 (en) * 2008-06-27 2011-08-04 Takaaki Fuchie Image processing apparatus and image processing method
CN106485316A (zh) * 2016-10-31 2017-03-08 北京百度网讯科技有限公司 Neural network model compression method and apparatus
CN110363297A (zh) * 2019-07-05 2019-10-22 上海商汤临港智能科技有限公司 Neural network training and image processing method, apparatus, device, and medium
CN110610237A (zh) * 2019-09-17 2019-12-24 普联技术有限公司 Model quantization training method, apparatus, and storage medium
CN111240640A (zh) * 2020-01-21 2020-06-05 苏州浪潮智能科技有限公司 Hardware environment-based data quantization method and apparatus, and readable storage medium

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11270187B2 (en) * 2017-11-07 2022-03-08 Samsung Electronics Co., Ltd Method and apparatus for learning low-precision neural network that combines weight quantization and activation quantization
CN108920177A (zh) * 2018-06-28 2018-11-30 郑州云海信息技术有限公司 Method for mapping a deep learning model configuration file to an FPGA configuration file
CN109343978B (zh) * 2018-09-27 2020-10-20 苏州浪潮智能科技有限公司 Data exchange method and apparatus for a deep learning distributed framework
CN109460827A (zh) * 2018-11-01 2019-03-12 郑州云海信息技术有限公司 Method and system for building and optimizing a deep learning environment
CN109409531A (zh) * 2018-11-01 2019-03-01 广州品唯软件有限公司 Machine learning method, apparatus, and device based on serialized files
EP3899811A4 (en) * 2018-12-18 2022-09-28 Movidius Ltd. NERVE NETWORK COMPRESSION
US20200364552A1 (en) * 2019-05-13 2020-11-19 Baidu Usa Llc Quantization method of improving the model inference accuracy
CN110390383B (zh) * 2019-06-25 2021-04-06 东南大学 Deep neural network hardware accelerator based on power-exponent quantization
US20190391796A1 (en) * 2019-06-28 2019-12-26 Intel Corporation Control of scheduling dependencies by a neural network compiler
KR20210083935A (ko) * 2019-12-27 2021-07-07 삼성전자주식회사 뉴럴 네트워크의 파라미터들을 양자화하는 방법 및 장치

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110188769A1 (en) * 2008-06-27 2011-08-04 Takaaki Fuchie Image processing apparatus and image processing method
CN101854535A (zh) * 2009-03-31 2010-10-06 郑州大学 Quantization method for an embedded video encoder
CN106485316A (zh) * 2016-10-31 2017-03-08 北京百度网讯科技有限公司 Neural network model compression method and apparatus
CN110363297A (zh) * 2019-07-05 2019-10-22 上海商汤临港智能科技有限公司 Neural network training and image processing method, apparatus, device, and medium
CN110610237A (zh) * 2019-09-17 2019-12-24 普联技术有限公司 Model quantization training method, apparatus, and storage medium
CN111240640A (zh) * 2020-01-21 2020-06-05 苏州浪潮智能科技有限公司 Hardware environment-based data quantization method and apparatus, and readable storage medium

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113885889A (zh) * 2021-09-18 2022-01-04 苏州浪潮智能科技有限公司 Method, system, storage medium, and device for quantization model deployment
CN113885889B (zh) * 2021-09-18 2024-01-19 苏州浪潮智能科技有限公司 Method, system, storage medium, and device for quantization model deployment
CN114444658A (zh) * 2021-12-31 2022-05-06 苏州浪潮智能科技有限公司 Deep learning model inference method, system, device, and computer medium
CN116257218A (zh) * 2023-01-13 2023-06-13 华中科技大学 Interface design method and integration system for statistical analysis software and nuclear energy programs
CN116257218B (zh) * 2023-01-13 2024-02-02 华中科技大学 Interface design method and integration system for statistical analysis software and nuclear energy programs

Also Published As

Publication number Publication date
CN111240640B (zh) 2022-05-10
US20230055313A1 (en) 2023-02-23
CN111240640A (zh) 2020-06-05
US11748970B2 (en) 2023-09-05

Similar Documents

Publication Publication Date Title
WO2021147362A1 (zh) Hardware environment-based data quantization method and apparatus, and readable storage medium
US11243816B2 (en) Program execution on heterogeneous platform
US20170061279A1 (en) Updating an artificial neural network using flexible fixed point representation
CN110633153A (zh) Method for splitting a neural network model using a multi-core processor, and related products
US20160371081A1 (en) Dynamic computational acceleration using a heterogeneous hardware infrastructure
CN110826708B (zh) Method for splitting a neural network model using a multi-core processor, and related products
CN112149792A (zh) Method and apparatus for optimizing the execution of machine learning models
WO2021000971A1 (zh) Method and apparatus for generating operation data, and related products
Dai et al. Reveal training performance mystery between TensorFlow and PyTorch in the single GPU environment
US11275561B2 (en) Mixed precision floating-point multiply-add operation
CN113570033B (zh) Neural network processing unit, neural network processing method, and apparatus therefor
CN112764893B (zh) Data processing method and data processing system
US20120284701A1 (en) Efficient conditional flow control compilation
Li et al. A Novel Memory‐Scheduling Strategy for Large Convolutional Neural Network on Memory‐Limited Devices
Matveev Opencv graph api
CN113885941A (zh) Singular value decomposition operation implementation method, apparatus, and related device
Angerd et al. A framework for automated and controlled floating-point accuracy reduction in graphics applications on GPUs
CN115827225A (zh) Allocation method for heterogeneous computing, model training method, apparatus, chip, device, and medium
US11947941B2 (en) Dynamic computation offloading to graphics processing unit
CN114219091A (zh) Method, apparatus, device, and storage medium for accelerating network model inference
CN114298329A (zh) Model training method, apparatus, device, and storage medium
CN113570034B (zh) Processing apparatus, neural network processing method, and apparatus therefor
CN114020476B (zh) Job processing method, device, and medium
US8997123B2 (en) Runtime modification of property names in advanced configuration and power interface (ACPI) tables
CN114896114B (zh) Scoreboard implementation method, apparatus, scoreboard, electronic device, and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20915044

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20915044

Country of ref document: EP

Kind code of ref document: A1
