CN110308909B - Executable program generating device and method for neural network processor - Google Patents


Info

Publication number
CN110308909B
CN110308909B (application CN201810257595.7A)
Authority
CN
China
Prior art keywords
data
neural network
module
mapping
program
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810257595.7A
Other languages
Chinese (zh)
Other versions
CN110308909A (en)
Inventor
Name withheld at the inventor's request
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Cambricon Information Technology Co Ltd
Original Assignee
Shanghai Cambricon Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Cambricon Information Technology Co Ltd filed Critical Shanghai Cambricon Information Technology Co Ltd
Priority to CN201810257595.7A priority Critical patent/CN110308909B/en
Publication of CN110308909A publication Critical patent/CN110308909A/en
Application granted granted Critical
Publication of CN110308909B publication Critical patent/CN110308909B/en
Current legal status: Active
Anticipated expiration date: not listed

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00Arrangements for software engineering
    • G06F8/40Transformation of program code
    • G06F8/41Compilation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00Arrangements for software engineering
    • G06F8/40Transformation of program code
    • G06F8/41Compilation
    • G06F8/42Syntactic analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

An executable program generating device and method for a neural network processor. The generating device comprises: a source program segmentation module, which receives a source file as input, identifies and extracts the positions of the code segment and the data segment according to the format of the source file, and generates an intermediate file containing the code segment and an intermediate file containing the data segment; a data processing module, which takes the intermediate file containing the data as input, handles the placement of the data, and outputs memory allocation information and a data layout; and a neural network algorithm mapping module, which takes the intermediate file containing the code segment as input, maps the neural network algorithm represented by blocks in the code into an algorithm flow composed of macro statements, and then maps the algorithm flow into hardware-dependent instructions. The device provides users with a convenient and fast way to use the neural network processor.

Description

Executable program generating device and method for neural network processor
Technical Field
The present disclosure relates to the field of computers, and more particularly to the field of artificial intelligence.
Background
Deep neural network algorithms are among the most popular machine learning algorithms of recent years and are widely used in fields such as image recognition, speech recognition, and natural language processing. Because deep neural networks achieve good results on a wide variety of tasks, new network structures and algorithm layers emerge endlessly, which poses challenges for programming and development. The unique hardware architecture of the neural network processor, together with the compute- and memory-intensive nature of the neural network algorithms running on it, makes programming even more complex and difficult. Programming the neural network processor by hand-writing instructions is time-consuming, labor-intensive, error-prone, and hard to debug.
In carrying out the present disclosure, the applicant found the following problems in the prior art described above: the lack of an effective programming means and of an efficient code generating device for the neural network processor makes programming very difficult, and the generated code is inefficient and cannot fully exploit the advantages of the neural network processor.
Disclosure of Invention
First, the technical problem to be solved
It is therefore an object of the present disclosure to provide an executable program generating apparatus and method for a neural network processor, so as to solve at least some of the above-mentioned technical problems.
(II) technical scheme
According to an aspect of the present disclosure, there is provided an executable program generating apparatus for a neural network processor, including:
the source program segmentation module receives a source file as input, identifies and extracts the positions of code segments and data segments according to the format in the source file, and generates an intermediate file containing the code segments and an intermediate file containing the data segments;
the data processing module is used for taking an intermediate file containing data as input, processing the placement of the data, and outputting memory allocation information and a data layout;
and the neural network algorithm mapping module is used for taking an intermediate file containing the code segment as input, mapping the neural network algorithm represented by blocks in the code into an algorithm flow composed of macro statements, and then mapping the algorithm flow into hardware-dependent instructions.
In a further aspect, the apparatus further comprises a parallel code generation module, which takes the hardware-dependent instructions as input, performs parallelization and optimization on them, and outputs an optimized program.
In a further aspect, the apparatus further comprises a relocation module, which takes the data layout, the memory allocation information, and the optimized program as input, and replaces the relative addresses in the optimized program with absolute addresses.
In a further aspect, the apparatus further comprises a machine code generation module, which translates the relocated program into a character string recognizable by the neural network processor.
In a further aspect, the data processing module is further used for partitioning data: the input and output data of each layer of the neural network are partitioned, and the partitioned data are placed in the on-chip storage units of the neural network processor.
In a further aspect, the neural network algorithm mapping module includes:
the calculation dividing module is used for dividing large-scale calculation into sub-calculation modules with relatively small scales;
and the instruction mapping module is used for mapping the algorithm into instructions in the instruction set of the neural network processor.
In a further aspect, the code segment includes statements, blocks, and corresponding definitions of a neural network algorithm.
According to another aspect of the present disclosure, there is also provided a method for generating an executable program using the above apparatus, comprising: using a source program segmentation module to receive a source file as input, identify and extract the positions of the code segment and the data segment according to the format of the source file, and generate an intermediate file containing the code segment and an intermediate file containing the data segment;
using a data processing module to take the intermediate file containing data as input, process the placement of the data, and output memory allocation information and a data layout;
and using a neural network algorithm mapping module to take the intermediate file containing the code segment as input, map the neural network algorithm represented by blocks in the code into an algorithm flow composed of macro statements, and then map the algorithm flow into hardware-dependent instructions.
In a further aspect, the method further includes: using a parallel code generation module to take the hardware-dependent instructions as input, perform parallelization and optimization on them, and output an optimized program.
In a further aspect, the method further includes: using a relocation module to take the data layout, the memory allocation information, and the optimized program as input, and replace the relative addresses in the optimized program with absolute addresses.
In a further aspect, the method further includes: using a machine code generation module to translate the relocated program into a character string recognizable by the neural network processor.
In a further aspect, the method further includes: using the data processing module to partition data: the input and output data of each layer of the neural network are partitioned, and the partitioned data are placed in the on-chip storage units of the neural network processor.
In a further aspect, the code segment includes statements, blocks, and corresponding definitions of a neural network algorithm.
In a further aspect, the parallelization processing and optimization in the parallel code generation module includes: the parallel effect is improved by adjusting the statement sequence through a simulation and/or reasoning method.
In a further aspect, the data processing module is used to divide a neuron into a plurality of data blocks, which are stored sequentially in the storage unit; during computation, the data blocks are loaded into the on-chip memory for further calculation.
(III) beneficial effects
The executable file generating device provided by the present disclosure is designed specifically for neural network accelerators and provides users with a convenient and fast way to use the neural network processor. Because the design closely follows the neural network algorithm while remaining independent of the hardware, users can generate efficient neural network executable files without considering the characteristics of the hardware.
The device in the present disclosure can generate executable code that runs efficiently, ensuring the efficiency of the neural network accelerator.
Drawings
FIG. 1 is a block diagram of an executable program generating apparatus in an embodiment of the present disclosure.
Fig. 2 is a schematic diagram of an overall structure of an executable program generating apparatus according to an embodiment of the disclosure.
FIG. 3 illustrates data partitioning by a data processing module in an embodiment of the present disclosure.
FIG. 4 illustrates a data placement function performed by a data processing module according to an embodiment of the present disclosure.
FIG. 5 is a flow chart illustrating source file processing using an embodiment of the present disclosure.
Fig. 6 depicts a forward computational schematic of a two-layer fully connected layer network architecture.
Fig. 7 shows a specific expansion of the forward computation block in fig. 6.
Detailed Description
To make the objects, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to specific embodiments and the accompanying drawings.
In the present disclosure, because the neural network processor has a special hardware structure, a special programming method is required to program it: a programming language is used to express the algorithm as a source file that the generating device can understand. FIG. 1 shows the process of going from a neural network algorithm to a program executable on the neural network processor according to an embodiment of the present disclosure. The programming method in the embodiments of the present disclosure includes two steps. First, concepts in the neural network algorithm are mapped to abstract concepts in the programming language; for example, the neurons and synapses in a neural network are mapped to data structures in the programming language. Second, the abstract concepts in the programming language are mapped into code in a concrete source program; for example, a piece of neuron data is declared in the data section of the source file.
A programming language consists of a set of defined rules (syntax and semantics). In the present disclosure, the language is defined in terms of three elements: data types, statements, and blocks.
The data types are used to store, organize, and express the various data structures in a neural network algorithm, and to map these data structures onto the specific hardware of the neural network processor. Three data types are included in the present disclosure: neurons, synapses, and parameters. A neuron is a multidimensional array used to store and express the input and output values of each layer in the neural network algorithm. The synapse data structure, also a multidimensional array, is used to express and store the weights connecting the inputs and outputs of certain layers (such as convolutional layers and fully connected layers). The parameter data structure is a scalar structure used to represent scalar data in the neural network algorithm, in particular training parameters such as the learning rate. Data are further divided into two kinds, dynamic and static. Dynamic data are data whose size becomes known only at runtime, so they must be allocated during execution; static data are data whose size is known during program generation, before execution, so this portion of the data is allocated during program generation.
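As an illustrative sketch (not part of the patent; the class and field names are invented), the three data types and the static/dynamic distinction could be modeled as follows, with a `None` shape marking dynamic data whose size is known only at runtime:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Tensor:
    """Base descriptor for the three data types: a multidimensional array."""
    name: str
    shape: Optional[List[int]]  # None for dynamic data: size known only at runtime

    @property
    def is_static(self) -> bool:
        # Static data has a size known at program-generation time,
        # so it can be allocated before execution.
        return self.shape is not None

    def size_elems(self) -> int:
        if not self.is_static:
            raise ValueError(f"{self.name}: dynamic data is sized at runtime")
        n = 1
        for d in self.shape:
            n *= d
        return n

@dataclass
class Neuron(Tensor):   # input/output values of a layer
    pass

@dataclass
class Synapse(Tensor):  # connection weights (e.g. conv / fully connected layers)
    pass

@dataclass
class Param(Tensor):    # scalar training parameters such as the learning rate
    shape: List[int] = field(default_factory=lambda: [1])
```

A static neuron can be allocated at generation time (`Neuron("fc1_inp", [1024]).size_elems()` gives its element count), while a dynamic one (`shape=None`) must be deferred to runtime allocation.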
Statements are used to express the concrete execution process of an algorithm; the user implements a neural network algorithm by writing specific statements in a specific order. Statements include basic statements and macro statements. Basic statements correspond to the instruction set of the neural network processor and represent the most basic functions it can support. Macro statements provide an abstraction layer in the language and are realized through macro definitions and macro invocations. A macro definition specifies the concrete execution process of a macro statement, which consists of basic statements; a macro invocation is a concrete use of that macro statement.
Blocks are used to express a specific neural network algorithm and consist of a sequence of macro statements and basic statements. The use of blocks is likewise divided into two parts, definition and invocation: the definition of a block describes the computation process of a neural network algorithm through a combined arrangement of macro statements and basic statements, and the invocation of a block is the use of that algorithm.
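A minimal sketch of the macro mechanism (the statement syntax and mnemonics here are invented for illustration; the patent gives no concrete grammar): a macro definition maps a name and formal parameters to a body of basic statements, and a macro invocation expands by substituting the actual arguments.

```python
# Macro table: name -> (formal parameters, body of basic statements).
MACROS = {
    # A hypothetical "FC" macro expanding a fully connected layer into
    # basic statements close to the hardware instruction set.
    "FC": (["out", "inp", "w"],
           ["LOAD {inp}", "LOAD {w}", "MMV {out}, {inp}, {w}", "STORE {out}"]),
}

def expand_macro(call: str) -> list:
    """Expand one macro invocation like 'FC fc1_out, fc1_inp, fc1_weight'."""
    name, _, arglist = call.partition(" ")
    args = [a.strip() for a in arglist.split(",")]
    formals, body = MACROS[name]
    binding = dict(zip(formals, args))
    # Substitute actual arguments for formal parameters in each statement.
    return [stmt.format(**binding) for stmt in body]
```

A block definition would then be a named sequence of such macro and basic statements, and a block invocation would expand every macro it contains in order.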
Having described how the source file is generated and what it consists of, the executable program generating apparatus for a neural network processor according to the embodiments of the present disclosure is now described in detail with reference to the accompanying drawings. The generating apparatus processes the source file into an executable program that can run on the neural network processor, through steps such as parsing and optimization.
The executable program generating apparatus of the present disclosure generates an executable program that can run on a neural network processor. Its input is a source file (a character string) written in a fixed format and held in a storage unit; its output is a continuous instruction sequence that the target processor can recognize and run. The instruction sequence may be stored in the storage unit as a character string file in binary, decimal, octal, hexadecimal, or any similar representation. The generating apparatus may comprise a source program segmentation module, a data processing module, and a neural network algorithm mapping module.
The source program segmentation module analyzes the segments with different meanings in the source file and dispatches them to different processing modules. Specifically, the segmentation module receives a source file as input, identifies the positions of the code segment and the data segment according to the format of the source file, extracts them, and generates two intermediate files, one containing the code segment and one containing the data segment. The code segment contains all statements, blocks, and their corresponding definitions, and is sent to the neural network algorithm mapping module. The data segment contains all declarations and definitions of data (static and dynamic, of all types) and is sent to the data processing module for allocation and placement.
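The segmentation step can be sketched as follows (the section-flag names `.code`, `.static_rw`, `.static_ro`, `.dynamic` come from the embodiment later in the description; the exact line format is an assumption):

```python
def split_source(src: str):
    """Split a source string into named segments by section flags,
    returning (code_segment_lines, {data_flag: lines})."""
    flags = (".code", ".static_rw", ".static_ro", ".dynamic")
    segments, current = {}, None
    for line in src.splitlines():
        stripped = line.strip()
        if stripped in flags:
            current = stripped           # a flag line opens a new segment
            segments[current] = []
        elif current is not None and stripped:
            segments[current].append(stripped)
    code = segments.get(".code", [])
    data = {k: v for k, v in segments.items() if k != ".code"}
    return code, data
```

The code lines would then feed the neural network algorithm mapping module, and the data segments the data processing module.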
The data processing module handles memory allocation and data placement. It takes the declarations and definitions of all data as input, and outputs memory allocation information (for example, the head address and size of each datum) and the data layout. FIG. 3 illustrates the data partitioning function performed by the data processing module in an embodiment of the present disclosure: the input/output of each layer may be large, so the data must be partitioned so that each piece fits in an on-chip storage unit of the neural network processor.
FIG. 4 illustrates the data placement function performed by the data processing module according to an embodiment of the present disclosure. A complete piece of data is divided into several blocks; in the figure, a piece of neuron data is divided into three blocks, which are stored sequentially in the storage unit. Before placement, the data exist as one complete unit; the data placement function divides them into several data blocks and handles each block as an independent neuron data block. During computation, such a data block is loaded into the on-chip memory for further calculation.
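The partitioning described above can be sketched as a simple function (an illustration, not the patent's algorithm) that splits a neuron into sequentially stored blocks, each small enough for the on-chip storage unit:

```python
def partition(total_elems: int, block_elems: int):
    """Split a neuron of `total_elems` elements into blocks of at most
    `block_elems`, returning (offset, length) pairs; the blocks are
    assumed to be stored sequentially in the storage unit."""
    if block_elems <= 0:
        raise ValueError("on-chip block capacity must be positive")
    blocks, off = [], 0
    while off < total_elems:
        n = min(block_elems, total_elems - off)
        blocks.append((off, n))
        off += n
    return blocks
```

During computation, each `(offset, length)` block would be loaded into on-chip memory in turn, as in the three-block example of FIG. 4.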
The neural network algorithm mapping module maps the blocks in the source program (which express different neural network algorithms) into corresponding basic statements (closely tied to the hardware instruction set). It comprises a computation division module and an instruction mapping module. The computation division module divides a large-scale computation into sub-computations of relatively small scale. For example, if the output of a fully connected layer is divided into three segments, the algorithm is divided into three sub-computations, each computing one segment of the output. The instruction mapping module maps the concrete algorithm into instructions from the hardware instruction set (the neural network processor's instruction set); for example, a fully connected computation may be mapped to a basic statement for matrix-vector multiplication. The mapped code is sent to the parallel optimization module.
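The fully-connected example can be made concrete with a sketch (the `MMV` mnemonic and operand syntax are invented): the output dimension is divided into segments, and each sub-computation becomes one matrix-vector multiply over a slice of the weight matrix.

```python
def divide_fc(out_dim: int, in_dim: int, n_parts: int):
    """Divide a fully connected layer's output into n_parts segments;
    emit one matrix-vector-multiply basic statement per sub-computation.
    Instruction mnemonics are illustrative, not the processor's real ISA."""
    instrs = []
    base, rem = divmod(out_dim, n_parts)
    start = 0
    for i in range(n_parts):
        rows = base + (1 if i < rem else 0)  # spread any remainder evenly
        instrs.append(
            f"MMV out[{start}:{start + rows}], "
            f"W[{start}:{start + rows}, 0:{in_dim}], inp"
        )
        start += rows
    return instrs
```

With `n_parts=3`, this reproduces the description's example of three sub-computations, each producing one segment of the output.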
In some embodiments, the generating apparatus may further include a parallel code generation module, whose input is the program mapped into basic statements and macro statements and whose output is the optimized code. This module adjusts the order of statements through methods such as simulation and reasoning to achieve the best parallel effect.
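One simple reordering of this kind can be sketched as follows (a deliberately naive illustration: the mnemonics are invented, and a real module would check true data dependences via simulation or reasoning). It hoists each block's `LOAD` above the previous block's `COMPUTE` so that memory transfer and computation can overlap, in the style of double buffering:

```python
def interleave_loads(instrs):
    """Hoist a LOAD above an immediately preceding COMPUTE, assuming the
    next block's LOAD does not depend on the previous block's COMPUTE."""
    out = list(instrs)
    i = 1
    while i < len(out):
        if out[i].startswith("LOAD") and out[i - 1].startswith("COMPUTE"):
            out[i - 1], out[i] = out[i], out[i - 1]
        i += 1
    return out
```

On a strictly serial schedule such as `LOAD b0, COMPUTE b0, LOAD b1, COMPUTE b1`, the pass issues `LOAD b1` before `COMPUTE b0`, exposing the parallelism between the memory unit and the compute unit.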
In some embodiments, the generating apparatus may further include a relocation module, which replaces the head addresses of the corresponding data in the program according to the address information produced by memory allocation.
In some embodiments, the generating apparatus may further comprise a machine code generation module for translating the relocated code into a binary string recognizable by the machine.
FIG. 2 is a schematic diagram of the overall structure of an executable program generating apparatus according to an embodiment of the present disclosure. First, the source file is sent to the source program segmentation module, where it is split into a data segment and a code segment. The data segment is then sent to the data processing module for memory allocation and data placement; the code segment is sent to the neural network algorithm mapping module, where the neural network algorithm represented by blocks in the code is mapped into an algorithm flow composed of macro statements and then into hardware-dependent instructions. The output of the neural network algorithm mapping module is sent to the parallel code generation module for parallelization and optimization. In this step, the parallel code generation module adjusts the order of instructions in the code according to the specific parallel mechanisms of the hardware. Next, the address information output by the data processing module and the optimized program output by the parallel code generation module are sent to the relocation module, which replaces the relative addresses in the program with absolute addresses. Finally, the program is fed into the machine code generation module, which translates the code represented by mnemonics into a form (e.g., binary) recognizable by the machine.
For further description, the overall executable program generation process is described below using one specific embodiment. It should be understood, however, that the implementation details in this embodiment are presented only to illustrate the present disclosure and should not be construed as limiting it; in addition, the embodiment may simplify or omit components or steps well known in the art so as not to obscure the features of the present disclosure.
FIG. 5 is a flow chart illustrating source file processing using an embodiment of the present disclosure, and FIG. 6 depicts a forward computation of a two-layer fully connected network architecture. The specific executable file generation process is as follows:
(1) The source file is fed into the source program segmentation module, where it is split into four parts according to their corresponding flags: .code (the code segment), .static_rw, .static_ro, and .dynamic (the static read-write, static read-only, and dynamic data segments, respectively).
(2) The code portion is fed into the neural network algorithm mapping module, where @block_fw_fc is mapped to specific computation and memory-access macro statements. The result after mapping is shown in FIG. 7. The macro arguments fc1_out, fc1_inp, and fc1_weight replace the formal parameters in the block definition.
(3) The program output in step (2) is sent to the parallel optimization module for optimization.
(4) The data segments (.static_rw, .static_ro, .dynamic) are fed to the data processing module for processing. First, the data processing module calculates the size of each datum; for example, fc1_inp holds 1024 two-byte values, i.e., 2048 bytes. Accordingly, 2048 bytes are allocated to fc1_inp starting from relative address 0, the next free address advances to 2048, and the second datum is allocated from there, and so on until all allocation is finished. This address information is the output of the module.
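Step (4) amounts to a bump allocator over relative addresses, which can be sketched as follows (an illustration; the allocation table format is an assumption):

```python
def allocate(decls):
    """Assign each (name, size_in_bytes) pair a (head_address, size),
    bumping the next free relative address after each allocation."""
    table, addr = {}, 0
    for name, nbytes in decls:
        table[name] = (addr, nbytes)
        addr += nbytes
    return table
```

With `fc1_inp` at 2048 bytes, it receives relative address 0 and the next datum starts at 2048, matching the worked example above.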
(5) The data in the data segments are placed according to the partitioning information in the data declarations and the like.
(6) The address information output in step (4) and the optimized program output in step (3) are sent to the relocation module, which modifies the relative addresses into absolute addresses.
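The relocation step can be sketched as a textual rewrite (the `@name` operand syntax is invented for illustration): every symbolic reference is replaced by the load base plus the head address from the allocation table.

```python
def relocate(instrs, table, base):
    """Replace relative addresses with absolute ones: each '@name'
    operand becomes base + head_address from the allocation table.
    (This naive string replacement ignores longest-match ordering,
    which a real relocator would have to handle.)"""
    out = []
    for ins in instrs:
        for name, (head, _size) in table.items():
            ins = ins.replace("@" + name, str(base + head))
        out.append(ins)
    return out
```

For example, with fc1_inp at relative address 0 and a load base of 4096, `LOAD @fc1_inp` becomes `LOAD 4096`.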
(7) The program output in step (6) is fed into the machine code generation module, which translates it into a file format the machine can understand, i.e., the final executable program.
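The final translation from mnemonics to machine-recognizable bytes can be sketched with a toy encoding (the opcode table and the 8-bit opcode / 24-bit operand word layout are invented; the patent does not specify an encoding):

```python
OPCODES = {"LOAD": 0x1, "STORE": 0x2, "MMV": 0x3}  # invented encoding

def assemble(instrs):
    """Translate 'MNEMONIC operand' strings into a byte string:
    one big-endian 32-bit word per instruction, opcode in the top byte."""
    out = bytearray()
    for ins in instrs:
        op, _, operand = ins.partition(" ")
        word = (OPCODES[op] << 24) | (int(operand) & 0xFFFFFF)
        out += word.to_bytes(4, "big")
    return bytes(out)
```

The resulting continuous byte sequence corresponds to the "character string recognizable by the neural network processor" that the description calls the final executable program.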
Through the embodiments described above, the present disclosure provides an executable program generating apparatus and method for a neural network processor. Using this method and apparatus, programmers can program the neural network processor more efficiently.
In addition, each functional unit in the embodiments of the present disclosure may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units described above may be implemented either in hardware or in software program modules.
If the integrated units are implemented in the form of software program modules and sold or used as stand-alone products, they may be stored in a computer-readable memory. Based on this understanding, the technical solution of the present disclosure, in essence or in the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a memory, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present disclosure. The aforementioned memory includes various media capable of storing program code, such as a USB flash drive, read-only memory (ROM), random access memory (RAM), removable hard disk, magnetic disk, or optical disk.
Each functional unit/module may be hardware, such as a circuit, including a digital circuit, an analog circuit, etc. Physical implementations of hardware structures include, but are not limited to, physical devices including, but not limited to, transistors, memristors, and the like. The computing modules in the computing device may be any suitable hardware processor, such as CPU, GPU, FPGA, DSP and ASIC, etc. The storage unit may be any suitable magnetic or magneto-optical storage medium, such as RRAM, DRAM, SRAM, EDRAM, HBM, HMC, etc.
It should also be understood that references in the present disclosure to, for example, "some embodiments," "an embodiment," "one or more embodiments," indicate that a particular feature may be included in the practice of the present disclosure. Similarly, it should be appreciated that in the description, various features are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of various aspects of the disclosure. This method of disclosure, however, is not to be interpreted as reflecting an intention that the disclosure requires more features than are expressly recited in each claim.
The foregoing embodiments further describe the objects, technical solutions, and advantages of the present invention in detail. It should be understood that the foregoing is merely a description of specific embodiments and is not intended to limit the present invention; any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (9)

1. An executable program generating apparatus for a neural network processor, comprising:
the source program segmentation module receives a source file as input, identifies and extracts the positions of code segments and data segments according to the format in the source file, and generates an intermediate file containing the code segments and an intermediate file containing the data segments;
the data processing module is used for inputting an intermediate file containing data, processing the arrangement of the data and outputting memory allocation information and a data arrangement mode;
the neural network algorithm mapping module is used for taking an intermediate file containing the code segment as input, mapping the neural network algorithm represented by blocks in the code into an algorithm flow composed of macro statements, and then mapping the algorithm flow into hardware-dependent instructions;
the parallel code generation module is used for inputting a hardware-related instruction, carrying out parallelization processing and optimization on the hardware-related instruction, and outputting an optimized program;
the repositioning module is used for inputting a data placement mode, memory allocation information and an optimized program and replacing a relative address in the optimized program with an absolute address;
and the machine code generation module is used for translating the relocated program into a character string which can be identified by the neural network processor.
2. The apparatus of claim 1, wherein the data processing module is further configured to divide data, divide input/output data of each layer of the neural network, and store the divided data in an on-chip memory unit of the neural network processor.
3. The apparatus of claim 1, wherein the neural network algorithm mapping module comprises:
the calculation dividing module is used for dividing large-scale calculation into sub-calculation modules with relatively small scales;
and the instruction mapping module is used for mapping the algorithm into instructions in the instruction set of the neural network processor.
4. The apparatus of claim 1, wherein the code segment includes statements, blocks, and corresponding definitions of a neural network algorithm.
5. An executable program generating method employing the apparatus of any one of claims 1-4, comprising: a source program segmentation module is adopted, a source file is received as input, the positions of a code segment and a data segment are identified and extracted according to the format in the source file, and an intermediate file containing the code segment and an intermediate file containing the data segment are generated;
a data processing module is adopted, an intermediate file containing data is input, the placement of the data is processed, and memory allocation information and a data placement mode are output;
using a neural network algorithm mapping module to take the intermediate file containing the code segment as input, map the neural network algorithm represented by blocks in the code into an algorithm flow composed of macro statements, and then map the algorithm flow into hardware-dependent instructions;
a parallel code generating module is adopted, a hardware related instruction is input, parallelization processing and optimization are carried out on the hardware related instruction, and an optimized program is output;
adopting a repositioning module, inputting a data placement mode, memory allocation information and an optimized program, and replacing a relative address in the optimized program with an absolute address;
and translating the relocated program into a character string which can be identified by the neural network processor by adopting a machine code generation module.
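The first step of the method, source-program segmentation, can be sketched as follows. The `.code`/`.data` section markers are an assumed source format; the patent does not disclose the concrete syntax of the source file:

```python
def split_source(source):
    """Split a source file into its code segment and data segment,
    using assumed ".code" / ".data" section markers."""
    segments = {"code": [], "data": []}
    current = None
    for line in source.splitlines():
        stripped = line.strip()
        if stripped == ".code":
            current = "code"      # subsequent lines belong to the code segment
        elif stripped == ".data":
            current = "data"      # subsequent lines belong to the data segment
        elif current is not None and stripped:
            segments[current].append(stripped)
    return segments

src = """
.data
weights: 1.0 2.0 3.0
.code
conv layer1
pool layer1
"""
print(split_source(src))
# {'code': ['conv layer1', 'pool layer1'], 'data': ['weights: 1.0 2.0 3.0']}
```

In the method, each of the two resulting segments would be written to its own intermediate file and handed to the data processing module and the neural network algorithm mapping module, respectively.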
6. The method as recited in claim 5, further comprising:
partitioning the data with the data processing module, dividing the input/output data of each layer of the neural network, and placing the partitioned data in an on-chip storage unit of the neural network processor.
7. The method of claim 5, wherein the code segment includes statements, blocks, and corresponding definitions of a neural network algorithm.
8. The method of claim 5, wherein the parallelizing and optimizing in the parallel code generation module comprises: adjusting statement order through simulation and/or inference to improve parallelism.
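One simple form of the statement-order adjustment in claim 8 is hoisting independent load instructions ahead of computations so that memory access and compute can overlap. The toy dependence check below is an illustrative stand-in for the simulation/inference methods the claim refers to, and the `(op, dst, srcs)` instruction encoding is hypothetical:

```python
def hoist_independent_loads(instrs):
    """Move each LOAD earlier in the sequence as long as it does not
    depend on the instruction immediately before it.

    instrs: list of (op, dst_register, src_registers) tuples.
    """
    instrs = list(instrs)
    changed = True
    while changed:
        changed = False
        for i in range(1, len(instrs)):
            op, dst, srcs = instrs[i]
            prev_op, prev_dst, prev_srcs = instrs[i - 1]
            # Independent: no read-after-write, write-after-read,
            # or write-after-write hazard with the previous instruction.
            independent = (prev_dst not in srcs and dst not in prev_srcs
                           and dst != prev_dst)
            if op == "LOAD" and prev_op != "LOAD" and independent:
                instrs[i - 1], instrs[i] = instrs[i], instrs[i - 1]
                changed = True
    return instrs

prog = [("LOAD", "r0", []), ("MUL", "r1", ["r0"]), ("LOAD", "r2", [])]
print(hoist_independent_loads(prog))
# [('LOAD', 'r0', []), ('LOAD', 'r2', []), ('MUL', 'r1', ['r0'])]
```

After the reorder, the second load can issue while the multiply is still executing, which is the parallelism improvement the claim targets.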
9. The method as recited in claim 5, further comprising:
dividing neuron data into a plurality of data blocks with the data processing module, storing the data blocks sequentially in a storage unit, and loading the data blocks into on-chip memory for further computation during the computation process.
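The data blocking of claim 9 can be pictured as streaming a long neuron vector through a small on-chip buffer, one fixed-size block at a time. The block size and the sum reduction are illustrative assumptions:

```python
def blocked_sum(neurons, block_size):
    """Process a neuron vector in blocks that each fit an on-chip
    buffer, accumulating a running result across blocks."""
    total = 0.0
    for start in range(0, len(neurons), block_size):
        on_chip = neurons[start:start + block_size]  # simulated on-chip load
        total += sum(on_chip)                        # compute on this block
    return total

print(blocked_sum([1.0, 2.0, 3.0, 4.0, 5.0], block_size=2))  # 15.0
```

Because only one block resides on-chip at a time, the scheme allows neuron data far larger than the on-chip storage unit to be processed.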
CN201810257595.7A 2018-03-27 2018-03-27 Executable program generating device and method for neural network processor Active CN110308909B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810257595.7A CN110308909B (en) 2018-03-27 2018-03-27 Executable program generating device and method for neural network processor


Publications (2)

Publication Number Publication Date
CN110308909A CN110308909A (en) 2019-10-08
CN110308909B true CN110308909B (en) 2023-08-01

Family

ID=68074163

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810257595.7A Active CN110308909B (en) 2018-03-27 2018-03-27 Executable program generating device and method for neural network processor

Country Status (1)

Country Link
CN (1) CN110308909B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11775317B2 (en) * 2021-04-30 2023-10-03 International Business Machines Corporation Locate neural network performance hot spots

Citations (6)

Publication number Priority date Publication date Assignee Title
US6578020B1 (en) * 1999-12-07 2003-06-10 International Business Machines Corporation Method and system for converting code to executable code using neural networks implemented in a very large scale integration (VLSI) integrated circuit
US6832214B1 (en) * 1999-12-07 2004-12-14 International Business Machines Corporation Method, system, and program for converting code to executable code using neural networks implemented in a software program
CN103282891A (en) * 2010-08-16 2013-09-04 甲骨文国际公司 System and method for effective caching using neural networks
CN105740946A (en) * 2015-07-29 2016-07-06 上海磁宇信息科技有限公司 Method for realizing neural network calculation by using cell array computing system
CN105989408A (en) * 2015-03-18 2016-10-05 国际商业机器公司 A system and a method for mapping a neural network onto a neurosynaptic substrate
CN107239315A (en) * 2017-04-11 2017-10-10 北京深鉴智能科技有限公司 Towards the programming model of neutral net heterogeneous computing platforms

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
US6826550B2 (en) * 2000-12-15 2004-11-30 International Business Machines Corporation Method, system, and program for converting application program code to executable code using neural networks based on characteristics of the inputs
US10817802B2 (en) * 2016-05-07 2020-10-27 Intel Corporation Apparatus for hardware accelerated machine learning
US10101995B2 (en) * 2016-07-15 2018-10-16 Microsoft Technology Licensing, Llc Transforming data manipulation code into data workflow


Non-Patent Citations (1)

Title
A Basic Block Reordering Method Based on Artificial Neural Networks; Zhang Jiyu et al.; Acta Scientiarum Naturalium Universitatis Pekinensis (Journal of Peking University, Natural Science Edition); 2010-12-29 (Issue 01); full text *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant