CN111857723B - Parameter compiling method and device and computer readable storage medium - Google Patents
- Publication number
- CN111857723B (application CN202010604992.4A)
- Authority
- CN
- China
- Prior art keywords
- parameter
- parameters
- subunit
- intermediate parameter
- network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformation of program code
- G06F8/41—Compilation
Abstract
The embodiment of the invention discloses a parameter compiling method, apparatus, and computer-readable storage medium. Network parameters contained in each model file are extracted; each network parameter is converted into a corresponding intermediate parameter according to a preset parameter specification; a corresponding memory address is allocated to each intermediate parameter according to the size information, weight information, and context operation serial number corresponding to that intermediate parameter; and each intermediate parameter and its memory address are stored to a preset storage space in a set manner. By converting the network parameters of each model file, model files from different frameworks can be converted into uniform, hardware-friendly intermediate parameters, decoupling the various operations on the network parameters from the hardware and thereby resolving problems such as software code redundancy and dependency-library conflicts caused by supporting multiple frameworks. Because data is written to the hardware during the FPGA preprocessing stage, no communication between the host and the FPGA is required during execution, and no host-FPGA communication pressure exists.
Description
Technical Field
The present invention relates to the field of artificial intelligence technology, and in particular, to a parameter compiling method, apparatus, and computer-readable storage medium.
Background
With the spread of artificial intelligence into fields such as agriculture, finance, security, health care, and manufacturing, there is a growing demand for algorithms that run faster and more accurately while consuming less power.
Across the wide variety of deep learning frameworks, an algorithm developer may use several frameworks for development. The workload in each framework is represented and executed in its own unique manner, so even a simple Convolution operation may need to be defined in a different way for each framework. Supporting multiple frameworks therefore requires the hardware-facing software to be adapted to each different operation, which can lead to a bloated software design and reduce the efficiency of algorithm execution.
It can be seen that how to reduce the software code redundancy brought about by the various frameworks is a problem to be solved by those skilled in the art.
Disclosure of Invention
Embodiments of the present invention provide a parameter compiling method, apparatus, and computer-readable storage medium, which can reduce software code redundancy caused by various frameworks.
To solve the above technical problem, an embodiment of the present invention provides a parameter compiling method, including:
extracting network parameters contained in each model file;
converting each network parameter into a corresponding intermediate parameter according to a preset parameter specification; the parameter specification comprises size information, weight information and a context operation serial number;
allocating a corresponding memory address to each intermediate parameter according to the size information, the weight information, and the context operation serial number corresponding to that intermediate parameter;
and storing each intermediate parameter and the corresponding memory address thereof to a preset storage space according to a set mode.
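The four claimed steps can be read as a single pipeline. The sketch below is illustrative only: the stage callables (`extract`, `convert`, `allocate`, `store`) are hypothetical placeholders for the four steps, not names taken from the patent.

```python
def compile_parameters(model_files, spec, extract, convert, allocate, store):
    """Run the four claimed steps, with each stage injected as a callable."""
    # Step 1: extract network parameters from each model file.
    params = [p for f in model_files for p in extract(f)]
    # Step 2: convert each network parameter into an intermediate parameter.
    inters = [convert(p, spec) for p in params]
    # Step 3: allocate a memory address to each intermediate parameter.
    addrs = allocate(inters)
    # Step 4: store the intermediate parameters and their addresses.
    return store(inters, addrs)
```

The stages are injected so each can be swapped per framework without touching the pipeline itself.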
Optionally, the extracting of the network parameters included in each model file comprises:
identifying the framework type corresponding to each model file according to the loaded model parameters;
and parsing the corresponding network parameters from each model file according to the network structure corresponding to each framework type.
Optionally, the allocating a corresponding memory address to each intermediate parameter according to the size information, the weight information, and the context operation serial number corresponding to each intermediate parameter includes:
sorting each intermediate parameter according to the context operation serial number corresponding to each intermediate parameter;
calculating the input/output size and the weight size of each intermediate parameter according to the size information and the weight information corresponding to each intermediate parameter;
and setting a corresponding memory address for each sorted intermediate parameter according to the input/output size and the weight size of each intermediate parameter.
Optionally, the storing each intermediate parameter and the memory address corresponding to the intermediate parameter in a preset storage space according to a set manner includes:
querying a pre-established binary instruction set to obtain binary instruction streams corresponding to the intermediate parameters and the memory addresses; wherein each intermediate parameter has a hierarchy to which it belongs;
and sequentially writing the binary instruction streams corresponding to each layer into the bin file.
Optionally, after sequentially writing the binary instruction streams corresponding to the respective levels into the bin file, the method further includes:
and when a parameter calling instruction is acquired, calling a binary instruction stream matched with the hierarchy identifier carried by the parameter calling instruction from the bin file.
The embodiment of the invention also provides a parameter compiling device which comprises an extracting unit, a converting unit, a distributing unit and a storing unit;
the extraction unit is used for extracting the network parameters contained in each model file;
the conversion unit is used for converting each network parameter into a corresponding intermediate parameter according to a preset parameter specification; the parameter specification comprises size information, weight information and a context operation serial number;
the distribution unit is used for distributing corresponding memory addresses for the intermediate parameters according to the size information, the weight information and the context operation serial number corresponding to the intermediate parameters;
and the storage unit is used for storing each intermediate parameter and the corresponding memory address thereof to a preset storage space according to a set mode.
Optionally, the extraction unit comprises an identification subunit and a parsing subunit;
the identification subunit is used for identifying the framework type corresponding to each model file according to the loaded model parameters;
and the parsing subunit is used for parsing the corresponding network parameters from each model file according to the network structure corresponding to each framework type.
Optionally, the allocation unit includes a sorting subunit, a calculating subunit, and a setting subunit;
the sorting subunit is configured to sort each intermediate parameter according to the context operation serial number corresponding to each intermediate parameter;
the calculation subunit is configured to calculate an input/output size and a weight size of each intermediate parameter according to the size information and the weight information corresponding to each intermediate parameter;
and the setting subunit is used for setting corresponding memory addresses for each sorted intermediate parameter according to the input and output size and the weight size of each intermediate parameter.
Optionally, the storage unit includes an inquiry subunit and a write subunit;
the query subunit is configured to query a pre-established binary instruction set to obtain binary instruction streams corresponding to the intermediate parameters and the memory addresses; wherein each intermediate parameter has a hierarchy to which it belongs;
and the writing subunit is used for sequentially writing the binary instruction streams corresponding to each level into the bin file.
Optionally, the system further comprises a calling unit;
and the calling unit is used for calling a binary instruction stream matched with the hierarchy identifier carried by the parameter calling instruction from the bin file when the parameter calling instruction is acquired.
An embodiment of the present invention further provides a parameter compiling apparatus, including:
a memory for storing a computer program;
a processor for executing the computer program to implement the steps of the parameter compilation method as described in any one of the above.
An embodiment of the present invention further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the computer program implements the steps of the parameter compiling method according to any one of the above items.
According to the above technical scheme, the network parameters contained in each model file are extracted, and each network parameter is converted into a corresponding intermediate parameter according to a preset parameter specification, the parameter specification comprising size information, weight information, and a context operation serial number. By converting the network parameters of each model file, model files from different frameworks can be converted into uniform, hardware-friendly intermediate parameters, decoupling the various operations on the network parameters from the hardware; the converted intermediate parameters allow multiple deep learning frameworks to run their various operations on a self-designed FPGA. To facilitate subsequent calling, a corresponding memory address is allocated to each intermediate parameter according to its size information, weight information, and context operation serial number, and each intermediate parameter and its memory address are stored to a preset storage space in a set manner. This scheme resolves problems such as software code redundancy and dependency-library conflicts caused by supporting multiple frameworks. Moreover, in contrast to the pipelined operation of a traditional deep learning compiler, data is written to the hardware during the FPGA preprocessing stage, so no host-FPGA communication is required during execution and no host-FPGA communication pressure exists.
Drawings
In order to illustrate the embodiments of the present invention more clearly, the drawings required by the embodiments are briefly described below. The drawings in the following description show only some embodiments of the present invention; other drawings may be obtained from them by those skilled in the art without inventive effort.
Fig. 1 is a flowchart of a parameter compiling method according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of a parameter compiling apparatus according to an embodiment of the present invention;
fig. 3 is a schematic diagram of a hardware structure of a parameter compiling apparatus according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without any creative work belong to the protection scope of the present invention.
In order that those skilled in the art will better understand the disclosure, the invention will be described in further detail with reference to the accompanying drawings and specific embodiments.
Next, a parameter compiling method provided by the embodiment of the invention is described in detail. Fig. 1 is a flowchart of a parameter compiling method according to an embodiment of the present invention, where the method includes:
s101: and extracting the network parameters contained in each model file.
In the embodiment of the invention, the framework type corresponding to each model file can be identified according to the loaded model parameters.
The model parameters may include the name of the model, address information of the model, and the like.
Common framework types include TensorFlow, Caffe, and MXNet. Because the model file under each framework type has a distinctive name, in the embodiment of the invention the model file can be obtained according to the address information of the loaded model, and its framework type can be determined from the name of the loaded model.
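As a rough illustration of this identification step, a loader might map model-file extensions to framework types. The extension-to-framework table below is an assumption for illustration; the patent does not specify the exact naming convention used.

```python
import os

# Typical model-file extensions per framework (illustrative mapping only).
FRAMEWORK_EXTENSIONS = {
    ".pb": "tensorflow",
    ".caffemodel": "caffe",
    ".params": "mxnet",
}

def identify_framework(model_path: str) -> str:
    """Infer the framework type from the loaded model file's name."""
    _, ext = os.path.splitext(model_path)
    try:
        return FRAMEWORK_EXTENSIONS[ext]
    except KeyError:
        raise ValueError(f"unrecognized model file extension: {ext!r}")
```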
In a concrete implementation, in order to extract the network parameters more accurately and effectively, the corresponding network parameters can be parsed from each model file according to the network structure corresponding to each framework type.
S102: and converting each network parameter into a corresponding intermediate parameter according to a preset parameter specification.
The parameter specification may include size information, weight information, and a context operation number.
Network parameters are the parameters required to implement the various operations under a given network framework. In the embodiment of the present invention, the network parameters under different frameworks can be converted into intermediate parameters of a unified form according to the preset parameter specification, so that parameters from different framework types become framework-agnostic parameters better suited to the hardware end, i.e., a Field-Programmable Gate Array (FPGA).
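A minimal sketch of what such a unified intermediate parameter might look like, assuming the three items named by the parameter specification (size information, weight information, context operation serial number) plus a hierarchy tag. The concrete field names and layout are illustrative, not taken from the patent.

```python
from dataclasses import dataclass

@dataclass
class IntermediateParam:
    op_seq: int           # context operation serial number
    in_shape: tuple       # input feature-map size, e.g. (H, W, C)
    out_shape: tuple      # output feature-map size
    weight_shape: tuple   # weight data size and I/O channel counts
    layer_type: str       # hierarchy: conv / activation / pool / fc

def convert(network_param: dict) -> IntermediateParam:
    """Map a framework-specific parameter dict onto the unified form."""
    return IntermediateParam(
        op_seq=network_param["seq"],
        in_shape=tuple(network_param["input"]),
        out_shape=tuple(network_param["output"]),
        weight_shape=tuple(network_param["weights"]),
        layer_type=network_param["type"],
    )
```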
S103: and distributing corresponding memory addresses for the intermediate parameters according to the size information, the weight information and the context operation serial number corresponding to the intermediate parameters.
The context operation serial number reflects the execution order of the network parameters. The size information may include the input feature-map size and the output feature-map size. The weight information may include the size of the weight data and the numbers of input and output channels.
In the embodiment of the invention, the intermediate parameters can be sorted according to their corresponding context operation serial numbers; the input/output size and weight size of each intermediate parameter can be calculated from its size information and weight information; and corresponding memory addresses can then be set for the sorted intermediate parameters according to their input/output sizes and weight sizes.
Each intermediate parameter has its own hierarchy. Wherein, the hierarchy can include a convolutional layer, an active layer, a pooling layer, a fully connected layer, and the like.
The hierarchy to which each intermediate parameter belongs can be distinguished according to the context operation serial number. When setting memory addresses for the intermediate parameters, the hierarchy to which they belong can be taken as the processing unit, and a memory address set for each hierarchy. The memory space occupied by a hierarchy can be determined from the input/output sizes and weight sizes of the intermediate parameters it contains, and memory addresses can then be set for the hierarchies in sequence, according to their arrangement order and the memory space each needs to occupy.
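The address-setting logic described above can be sketched as follows, under two illustrative assumptions not stated in the patent: each layer's footprint is the sum of its input, output, and weight element counts, and each element occupies 4 bytes.

```python
def allocate_addresses(params, base=0x0, elem_bytes=4):
    """Sort layers by context operation serial number and assign contiguous
    base addresses; returns {op_seq: (address, size_in_bytes)}."""
    addresses = {}
    cursor = base
    for p in sorted(params, key=lambda p: p["seq"]):
        # Footprint = input + output + weight element counts.
        elems = p["in_elems"] + p["out_elems"] + p["weight_elems"]
        size = elems * elem_bytes
        addresses[p["seq"]] = (cursor, size)
        cursor += size  # next layer starts right after this one
    return addresses
```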
S104: and storing each intermediate parameter and the corresponding memory address thereof to a preset storage space according to a set mode.
In the embodiment of the invention, in order to facilitate the hardware end to rapidly identify the intermediate parameters, each intermediate parameter and the corresponding memory address thereof can be converted into a binary code form for storage.
In particular implementations, a binary instruction set may be pre-established. The binary instruction set contains binary codes corresponding to different parameters.
After the intermediate parameters and the memory addresses corresponding to the intermediate parameters are obtained, a pre-established binary instruction set can be queried to obtain a binary instruction stream corresponding to each intermediate parameter and memory address.
Considering that each intermediate parameter has its corresponding hierarchy, in practical application, the binary instruction streams corresponding to the respective hierarchies may be written into the bin file in sequence. In practical application, when a parameter calling instruction is acquired, a binary instruction stream matched with the hierarchy identifier carried by the parameter calling instruction can be called from the bin file.
Converting each operation parameter into a binary instruction stream laid out according to the binary instruction set allows the hardware end to identify the parameters quickly and conveniently, reduces the data conversion that the hardware end must perform, and effectively improves the efficiency of algorithm execution.
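As an illustration of writing the binary instruction streams to a bin file, the sketch below packs each (hierarchy, sequence number, address, size) record into a fixed-width little-endian layout. The 11-byte record format and the per-hierarchy opcodes are assumptions for illustration, not the patent's actual binary instruction set.

```python
import struct

# Hypothetical opcodes per hierarchy level (illustrative values).
LEVEL_OPCODE = {"conv": 0x01, "activation": 0x02, "pool": 0x03, "fc": 0x04}

def write_bin(records, path):
    """Append one fixed-width record per (level, op_seq, address, size)
    tuple to the bin file, in the given (already sorted) order."""
    with open(path, "wb") as f:
        for level, seq, addr, size in records:
            # <BHII: 1-byte opcode, 2-byte sequence number,
            # 4-byte address, 4-byte size -> 11 bytes, little-endian.
            f.write(struct.pack("<BHII", LEVEL_OPCODE[level], seq, addr, size))
```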
According to the above technical scheme, the network parameters contained in each model file are extracted, and each network parameter is converted into a corresponding intermediate parameter according to a preset parameter specification, the parameter specification comprising size information, weight information, and a context operation serial number. By converting the network parameters of each model file, model files from different frameworks can be converted into uniform, hardware-friendly intermediate parameters, decoupling the various operations on the network parameters from the hardware; the converted intermediate parameters allow multiple deep learning frameworks to run their various operations on a self-designed FPGA. To facilitate subsequent calling, a corresponding memory address is allocated to each intermediate parameter according to its size information, weight information, and context operation serial number, and each intermediate parameter and its memory address are stored to a preset storage space in a set manner. This scheme resolves problems such as software code redundancy and dependency-library conflicts caused by supporting multiple frameworks. Moreover, in contrast to the pipelined operation of a traditional deep learning compiler, data is written to the hardware during the FPGA preprocessing stage, so no host-FPGA communication is required during execution and no host-FPGA communication pressure exists.
Fig. 2 is a schematic structural diagram of a parameter compiling apparatus according to an embodiment of the present invention, including an extracting unit 21, a converting unit 22, an allocating unit 23, and a storing unit 24;
an extracting unit 21, configured to extract network parameters included in each model file;
a conversion unit 22, configured to convert each network parameter into a corresponding intermediate parameter according to a preset parameter specification; the parameter specification comprises size information, weight information and a context operation serial number;
the allocating unit 23 is configured to allocate a corresponding memory address to each intermediate parameter according to the size information, the weight information, and the context operation serial number corresponding to each intermediate parameter;
the storage unit 24 is configured to store each intermediate parameter and the corresponding memory address thereof in a preset storage space according to a set manner.
Optionally, the extraction unit comprises an identification subunit and a parsing subunit;
the identification subunit is used for identifying the framework type corresponding to each model file according to the loaded model parameters;
and the parsing subunit is used for parsing the corresponding network parameters from each model file according to the network structure corresponding to each framework type.
Optionally, the allocation unit includes a sorting subunit, a calculating subunit, and a setting subunit;
the sorting subunit is used for sorting each intermediate parameter according to the context operation serial number corresponding to each intermediate parameter;
the calculation subunit is used for calculating the input/output size and the weight size of each intermediate parameter according to the size information and the weight information corresponding to each intermediate parameter;
and the setting subunit is used for setting corresponding memory addresses for the sorted intermediate parameters according to the input/output sizes and the weight sizes of the intermediate parameters.
Optionally, the storage unit comprises an inquiry subunit and a write subunit;
the query subunit is used for querying a pre-established binary instruction set to acquire the binary instruction streams corresponding to the intermediate parameters and the memory addresses; wherein each intermediate parameter has the hierarchy to which it belongs;
and the writing subunit is used for sequentially writing the binary instruction streams corresponding to each level into the bin file.
Optionally, the system further comprises a calling unit;
and the calling unit is used for calling the binary instruction stream matched with the hierarchy identifier carried by the parameter calling instruction from the bin file when the parameter calling instruction is acquired.
The description of the features in the embodiment corresponding to fig. 2 may refer to the related description of the embodiment corresponding to fig. 1, and is not repeated here.
According to the above technical scheme, the network parameters contained in each model file are extracted, and each network parameter is converted into a corresponding intermediate parameter according to a preset parameter specification, the parameter specification comprising size information, weight information, and a context operation serial number. By converting the network parameters of each model file, model files from different frameworks can be converted into uniform, hardware-friendly intermediate parameters, decoupling the various operations on the network parameters from the hardware; the converted intermediate parameters allow multiple deep learning frameworks to run their various operations on a self-designed FPGA. To facilitate subsequent calling, a corresponding memory address is allocated to each intermediate parameter according to its size information, weight information, and context operation serial number, and each intermediate parameter and its memory address are stored to a preset storage space in a set manner. This scheme resolves problems such as software code redundancy and dependency-library conflicts caused by supporting multiple frameworks. Moreover, in contrast to the pipelined operation of a traditional deep learning compiler, data is written to the hardware during the FPGA preprocessing stage, so no host-FPGA communication is required during execution and no host-FPGA communication pressure exists.
Fig. 3 is a schematic diagram of a hardware structure of a parameter compiling apparatus 30 according to an embodiment of the present invention, including:
a memory 31 for storing a computer program;
a processor 32 for executing a computer program to implement the steps of any of the parameter compilation methods described above.
The embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, and when being executed by a processor, the computer program implements the steps of any one of the parameter compiling methods.
The present invention provides a method, an apparatus and a computer-readable storage medium for parameter compilation. The embodiments are described in a progressive manner in the specification, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. The device disclosed by the embodiment corresponds to the method disclosed by the embodiment, so that the description is simple, and the relevant points can be referred to the method part for description. It should be noted that, for those skilled in the art, it is possible to make various improvements and modifications to the present invention without departing from the principle of the present invention, and those improvements and modifications also fall within the scope of the claims of the present invention.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the technical solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
Claims (8)
1. A method of parameter compilation, comprising:
extracting network parameters contained in each model file;
converting each network parameter into a corresponding intermediate parameter according to a preset parameter specification; the parameter specification comprises size information, weight information and a context operation serial number;
allocating a corresponding memory address to each intermediate parameter according to the size information, the weight information, and the context operation serial number corresponding to that intermediate parameter;
storing each intermediate parameter and the corresponding memory address thereof to a preset storage space according to a set mode;
the allocating a corresponding memory address to each intermediate parameter according to the size information, the weight information, and the context operation serial number corresponding to each intermediate parameter includes:
sorting each intermediate parameter according to the context operation serial number corresponding to each intermediate parameter;
calculating the input/output size and the weight size of each intermediate parameter according to the size information and the weight information corresponding to each intermediate parameter;
and setting a corresponding memory address for each sorted intermediate parameter according to the input/output size and the weight size of each intermediate parameter.
2. The parameter compiling method according to claim 1, wherein the extracting the network parameters included in each model file comprises:
identifying the framework type corresponding to each model file according to the loaded model parameters;
and parsing the corresponding network parameters from each model file according to the network structure corresponding to each framework type.
3. The parameter compiling method according to claim 1, wherein the storing each intermediate parameter and the memory address corresponding thereto in a preset storage space according to a set manner comprises:
querying a pre-established binary instruction set to obtain binary instruction streams corresponding to the intermediate parameters and the memory addresses; wherein each intermediate parameter has a hierarchy to which it belongs;
and sequentially writing the binary instruction streams corresponding to the levels into the bin file.
4. The parameter compiling method according to claim 3, further comprising, after the sequentially writing the binary instruction streams corresponding to the respective levels into the bin file:
and when a parameter calling instruction is acquired, calling a binary instruction stream matched with the hierarchy identifier carried by the parameter calling instruction from the bin file.
5. A parameter compiling device, characterized by comprising an extraction unit, a conversion unit, an allocation unit and a storage unit;
the extraction unit is used for extracting the network parameters contained in each model file;
the conversion unit is used for converting each network parameter into a corresponding intermediate parameter according to a preset parameter specification; the parameter specification comprises size information, weight information and a context operation serial number;
the allocation unit is used for allocating a corresponding memory address to each intermediate parameter according to the size information, the weight information and the context operation serial number corresponding to each intermediate parameter;
the storage unit is used for storing each intermediate parameter and its corresponding memory address to a preset storage space in a set manner;
the allocation unit comprises a sorting subunit, a calculation subunit and a setting subunit;
the sorting subunit is configured to sort the intermediate parameters according to the context operation serial number corresponding to each intermediate parameter;
the calculation subunit is configured to calculate the input/output size and the weight size of each intermediate parameter according to the size information and the weight information corresponding to each intermediate parameter;
and the setting subunit is configured to set a corresponding memory address for each sorted intermediate parameter according to the input/output size and the weight size of each intermediate parameter.
6. The device according to claim 5, wherein the extraction unit comprises an identification subunit and a parsing subunit;
the identification subunit is used for identifying the framework type corresponding to each model file according to the loaded model parameters;
and the parsing subunit is used for parsing the corresponding network parameters from each model file according to the network structure corresponding to each framework type.
7. A parameter compiling apparatus, comprising:
a memory for storing a computer program;
a processor for executing the computer program to implement the steps of the parameter compilation method as claimed in any one of claims 1 to 4.
8. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the parameter compilation method according to any one of claims 1 to 4.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010604992.4A CN111857723B (en) | 2020-06-29 | 2020-06-29 | Parameter compiling method and device and computer readable storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111857723A CN111857723A (en) | 2020-10-30 |
CN111857723B true CN111857723B (en) | 2022-06-17 |
Family
ID=72989194
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010604992.4A Active CN111857723B (en) | 2020-06-29 | 2020-06-29 | Parameter compiling method and device and computer readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111857723B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114116051B (en) * | 2021-11-17 | 2024-03-22 | 招联消费金融股份有限公司 | Processing method, device, equipment and storage medium based on neural network model |
CN115981666B (en) * | 2023-03-21 | 2023-07-21 | 北京探境科技有限公司 | Neural network information integration method, device, system and storage medium |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106650922A (en) * | 2016-09-29 | 2017-05-10 | 清华大学 | Hardware neural network conversion method, computing device, compiling method and neural network software and hardware collaboration system |
CN110795165A (en) * | 2019-10-12 | 2020-02-14 | 苏州浪潮智能科技有限公司 | Neural network model data loading method and related device |
CN111240640A (en) * | 2020-01-21 | 2020-06-05 | 苏州浪潮智能科技有限公司 | Data quantization method and device based on hardware environment and readable storage medium |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10592213B2 (en) * | 2016-10-19 | 2020-03-17 | Intel Corporation | Preprocessing tensor operations for optimal compilation |
KR102564456B1 (en) * | 2017-10-19 | 2023-08-07 | 삼성전자주식회사 | Method and apparatus for quantizing parameter of neural network |
US11468338B2 (en) * | 2018-09-11 | 2022-10-11 | Apple Inc. | Compiling models for dedicated hardware |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111813963B (en) | Knowledge graph construction method and device, electronic equipment and storage medium | |
US9619430B2 (en) | Active non-volatile memory post-processing | |
CN111857723B (en) | Parameter compiling method and device and computer readable storage medium | |
CN105677812A (en) | Method and device for querying data | |
JP2010524060A (en) | Data merging in distributed computing | |
Gómez et al. | Map-based transparent persistence for very large models | |
CN104067282A (en) | Counter operation in a state machine lattice | |
CN106557307B (en) | Service data processing method and system | |
CN110008192A (en) | A kind of data file compression method, apparatus, equipment and readable storage medium storing program for executing | |
CN112667860A (en) | Sub-graph matching method, device, equipment and storage medium | |
CN108153587A (en) | A kind of slow task reason detection method for big data platform | |
Shi et al. | A case study of tuning MapReduce for efficient Bioinformatics in the cloud | |
Risco-Martin et al. | A methodology to automatically optimize dynamic memory managers applying grammatical evolution | |
Bei et al. | MEST: A model-driven efficient searching approach for MapReduce self-tuning | |
Löhnertz et al. | Steinmetz: Toward Automatic Decomposition of Monolithic Software Into Microservices. | |
CN110908870A (en) | Resource monitoring method and device for mainframe, storage medium and equipment | |
CN112069052A (en) | Abnormal object detection method, device, equipment and storage medium | |
CN113344023A (en) | Code recommendation method, device and system | |
CN113778961A (en) | Production management method, device and system for CIM model data | |
CN105573763A (en) | Embedded system modeling method supporting RTOS | |
CN113268485A (en) | Data table association analysis method, device, equipment and storage medium | |
CN112347101A (en) | Tag data storage method, computer device, and storage medium | |
CN110287241B (en) | Method and device for generating alarm data report | |
CN108694041A (en) | Data transfer device, device and service terminal | |
Ediger et al. | Computational graph analytics for massive streaming data |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||