CN110308909A - Executable program generating device and method for a neural network processor - Google Patents
Executable program generating device and method for a neural network processor
- Publication number
- CN110308909A (application CN201810257595.7A)
- Authority
- CN
- China
- Prior art keywords
- data
- neural network
- module
- program
- block
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS › G06—COMPUTING; CALCULATING OR COUNTING › G06F—ELECTRIC DIGITAL DATA PROCESSING › G06F8/00—Arrangements for software engineering › G06F8/40—Transformation of program code › G06F8/41—Compilation
- G06F8/41—Compilation › G06F8/42—Syntactic analysis
- G—PHYSICS › G06—COMPUTING; CALCULATING OR COUNTING › G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS › G06N3/00—Computing arrangements based on biological models › G06N3/02—Neural networks › G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons › G06N3/063—Physical realisation using electronic means
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS › Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE › Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES › Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
An executable program generating device and method for a neural network processor. The generating device includes: a source program segmentation module, which receives a source file as input, identifies and extracts the positions of the code segment and the data segment according to the format of the source file, and generates an intermediate file containing the code segment and an intermediate file containing the data segment; a data processing module, which takes the intermediate file containing the data segment as input, handles the placement of data, and outputs memory allocation information and the data placement scheme; and a neural network algorithm mapping module, which takes the intermediate file containing the code segment as input, maps the neural network algorithms expressed as blocks in the code into an algorithm flow composed of macro statements, and then maps that flow into hardware-specific instructions. The device offers users a convenient way to use a neural network processor.
Description
Technical field
The present disclosure relates to the field of computing, and more particularly to the field of artificial intelligence.
Background technique
Deep neural network algorithms are a recently popular class of machine learning algorithms, widely used in fields such as image recognition, speech recognition, and natural language processing. Because deep neural networks achieve good results across many tasks, new network structures and algorithms appear one after another, which poses a challenge for programming and development. For a neural network processor in particular, its unique hardware structure, together with the compute-intensive and memory-access-intensive character of the neural network algorithms that run on it, makes programming even more complex and difficult. Neural network processors proposed so far are all programmed by hand-writing instructions, which is time-consuming, labor-intensive, error-prone, and hard to debug.
In the course of implementing the present disclosure, the applicant found the following problems in the prior art: neural network processors lack effective programming means and efficient code generation facilities, which makes them extremely difficult to program; the generated code is inefficient and cannot fully exploit the advantages of a neural network processor.
Summary of the invention
(1) Technical problem to be solved
In view of this, the purpose of the present disclosure is to provide an executable program generating device and method for a neural network processor, to solve at least some of the technical problems described above.
(2) technical solution
According to one aspect of the present disclosure, an executable program generating device for a neural network processor is provided, comprising:
a source program segmentation module, which receives a source file as input, identifies and extracts the positions of the code segment and the data segment according to the format of the source file, and generates an intermediate file containing the code segment and an intermediate file containing the data segment;
a data processing module, which takes the intermediate file containing the data segment as input, handles the placement of data, and outputs memory allocation information and the data placement scheme;
a neural network algorithm mapping module, which takes the intermediate file containing the code segment as input, maps the neural network algorithms expressed as blocks in the code into an algorithm flow composed of macro statements, and then maps that flow into hardware-specific instructions.
In a further embodiment, the device also includes a parallel code generation module, which takes the hardware-specific instructions as input, performs parallelization processing and optimization on them, and outputs the optimized program.
In a further embodiment, the device also includes a relocation module, which takes the data placement scheme, the memory allocation information, and the optimized program as input, and replaces the relative addresses in the optimized program with absolute addresses.
In a further embodiment, the device also includes a machine code generation module, which translates the program relocated by the relocation module into a character string that the neural network processor can recognize.
In a further embodiment, the data processing module is also used to perform data partitioning: the input and output data of each layer of the neural network are partitioned so that, after partitioning, they fit into the on-chip memory cells of the neural network processor.
In a further embodiment, the neural network algorithm mapping module includes:
a computation division module, for dividing a large-scale computation into relatively small sub-computations;
an instruction mapping module, for mapping the algorithm into instructions from the neural network processor's instruction set.
In a further embodiment, the code segment contains the statements and blocks of the neural network algorithm, together with their corresponding definitions.
According to another aspect of the present disclosure, a method for generating an executable program using the above device is also provided, comprising:
using the source program segmentation module, receiving a source file as input, identifying and extracting the positions of the code segment and the data segment according to the format of the source file, and generating an intermediate file containing the code segment and an intermediate file containing the data segment;
using the data processing module, taking the intermediate file containing the data segment as input, handling the placement of data, and outputting memory allocation information and the data placement scheme;
using the neural network algorithm mapping module, taking the intermediate file containing the code segment as input, mapping the neural network algorithms expressed as blocks in the code into an algorithm flow composed of macro statements, and then mapping that flow into hardware-specific instructions.
In a further embodiment, the method further includes: using the parallel code generation module, taking the hardware-specific instructions as input, performing parallelization processing and optimization on them, and outputting the optimized program.
In a further embodiment, the method further includes: using the relocation module, taking the data placement scheme, the memory allocation information, and the optimized program as input, and replacing the relative addresses in the optimized program with absolute addresses.
In a further embodiment, the method further includes: using the machine code generation module to translate the program relocated by the relocation module into a character string that the neural network processor can recognize.
In a further embodiment, the method further includes: using the data processing module to perform data partitioning, whereby the input and output data of each layer of the neural network are partitioned and, after partitioning, placed into the on-chip memory cells of the neural network processor.
In a further embodiment, the code segment contains the statements and blocks of the neural network algorithm, together with their corresponding definitions.
In a further embodiment, the parallelization processing and optimization in the parallel code generation module include: adjusting the statement order through simulation and/or inference to improve parallel efficiency.
In a further embodiment, the data processing module divides a neuron into multiple data blocks and stores the data blocks in order into the storage unit; during computation, a data block can be loaded into on-chip memory for further computation.
(3) Beneficial effects
The executable file generating device proposed in the present disclosure is designed specifically for the neural network accelerator and offers users a convenient way to use a neural network processor. Because the programming stays close to the neural network algorithm's design and is independent of the hardware itself, users can generate efficient neural network executables without having to consider the characteristics of the hardware. The device in the present disclosure can generate executable code that runs efficiently, guaranteeing the efficiency of the neural network accelerator.
Detailed description of the invention
Fig. 1 is a block diagram of the executable program generating device in an embodiment of the present disclosure.
Fig. 2 is a schematic diagram of the overall structure containing the executable program generating device in an embodiment of the present disclosure.
Fig. 3 shows the data partitioning performed by the data processing module in an embodiment of the present disclosure.
Fig. 4 illustrates the data placement function performed by the data processing module in an embodiment of the present disclosure.
Fig. 5 is a flow diagram of source file processing using an embodiment of the present disclosure.
Fig. 6 depicts the forward computation process of a network structure with two fully connected layers.
Fig. 7 shows the concrete expansion of the forward computation block in Fig. 6.
Specific embodiment
To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below in conjunction with specific embodiments and with reference to the accompanying drawings.
In the present disclosure, because the neural network processor has a special hardware structure, a special programming method is needed to program it; that is, a programming language is used, and the language is mapped into a source file that the generating device can understand. Fig. 1 shows, for an embodiment of the present disclosure, the generating process from a neural network algorithm to an executable file for the neural network processor. The programming method in this embodiment consists of two steps. First, the concepts in the neural network algorithm are mapped to abstract concepts in the programming language; for example, the neurons and synapses in the neural network are mapped to data structures in the programming language. Second, the abstract concepts in the programming language are mapped to code in a concrete source program; for example, a block of neuron data can be declared in the data segment of the source file.
A programming language is composed of a set of established rules (syntax and semantics). In the present disclosure, the assembly language is constrained in the following respects: data types, statements, and blocks.
Data types organize and express the various data structures in a neural network algorithm; they are used to store these data structures and map them onto the specific hardware of the neural network processor. The present disclosure includes three data types: neuron, synapse, and parameter. A neuron is a multidimensional array used to store and express the input and output values of each layer of the neural network algorithm. The synapse data structure is a multidimensional array used to express and store the weights connecting the inputs and outputs of certain layers (such as convolutional layers and fully connected layers) of the neural network algorithm. The parameter data structure is a scalar structure used to represent scalar data in the neural network algorithm, especially training parameters such as the learning rate. Data structures are divided into two kinds, dynamic and static. Dynamic data is data whose size becomes known only at run time, so it must be allocated at run time; static data is data whose size is already known during program generation, before run time, so this part of the data is fully allocated during program generation.
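As an illustration of the paragraph above, the following sketch shows how a program generator might model the three data types (neuron, synapse, parameter) and the static/dynamic distinction. The class and field names are assumptions for illustration, not part of the patent's language.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class TensorDecl:
    """Hypothetical declaration record for one data structure."""
    name: str
    kind: str                   # "neuron", "synapse", or "parameter"
    shape: Optional[List[int]]  # None => size unknown until run time (dynamic)
    is_static: bool = True

    @property
    def num_elements(self) -> Optional[int]:
        if self.shape is None:
            return None         # dynamic: cannot size it at generation time
        n = 1
        for d in self.shape:
            n *= d
        return n

# Static data: size known at generation time, so it can be allocated now.
fc1_inp = TensorDecl("fc1_inp", "neuron", [1024])
# Dynamic data: size only known at run time, so allocation is deferred.
batch_out = TensorDecl("batch_out", "neuron", None, is_static=False)

assert fc1_inp.num_elements == 1024
assert batch_out.num_elements is None
```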
Statements express the concrete execution process of the algorithm; the user implements a neural network algorithm by writing particular statements in a particular order. Statements include basic statements and macro statements. Basic statements correspond to the instruction set of the neural network processor and represent the most basic functions it can support. Macro statements provide language abstraction and are realized through macro definitions and macro calls. A macro definition defines the concrete execution process of the macro statement and is composed of basic statements. A macro call is a specific use of the macro statement.
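The macro mechanism just described can be sketched minimally as follows: a macro statement is defined in terms of basic statements, and a macro call expands to its definition. The macro name `fc_forward` and the basic statement names are invented for the example; they do not come from the patent.

```python
# Hypothetical macro table: each macro statement is defined by a sequence
# of basic statements supported by the neural network processor.
MACRO_DEFS = {
    "fc_forward": ["load_matrix", "load_vector", "matmul_vec", "store_vector"],
}

def expand(program):
    """Replace each macro call with the basic statements of its definition;
    basic statements pass through unchanged."""
    out = []
    for stmt in program:
        out.extend(MACRO_DEFS.get(stmt, [stmt]))
    return out

expanded = expand(["fc_forward", "sync"])
assert expanded == ["load_matrix", "load_vector", "matmul_vec",
                    "store_vector", "sync"]
```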
A block expresses a specific neural network algorithm and is composed of a sequence of macro statements and basic statements. The use of a block is likewise divided into definition and call: a block definition describes the computation process of a neural network algorithm through an assembled arrangement of macro statements and basic statements, and a block call is a use of that algorithm.
The above describes how a source file is generated and what it consists of. The executable program generating device for a neural network processor of the disclosed embodiment is illustrated below in conjunction with the drawings. It processes the source file: by parsing, optimizing, and so on, the source file is turned into an executable program that can run on the neural network processor.
The executable program generating device of the present disclosure is used to generate an executable program that can run on a neural network processor. Its input is a source file (a character string) stored in a storage unit and written in a fixed format; its output is an instruction sequence that the target processor can recognize and run continuously. The instruction sequence can be stored in any radix (binary, decimal, octal, hexadecimal, and so on) as a character string file in a storage unit. The generating device may include the following modules: a source program segmentation module, a data processing module, and a neural network algorithm mapping module.
The source program segmentation module parses the sections with different meanings in the source file and hands them to the different processing modules. Specifically, the segmentation module receives the source file as input, identifies the positions of the code segment and the data segment according to the format of the source file, extracts them, and generates two intermediate files, one containing the code segment and one containing the data segment. The code segment contains all statements and blocks together with their corresponding definitions; this part of the code is sent to the neural network algorithm mapping module. The other part is sent to the data processing module, which allocates and places the data. The data segment contains the declarations and definitions of all data (static and dynamic, of all types).
The data processing module handles memory allocation and data placement. Its input is the declarations and definitions of all data; its output is memory allocation information (for example, each datum's start address and size) and the placement scheme. Fig. 3 shows the data partitioning function performed by the data processing module in this disclosed embodiment: the input and output of each layer may be very large, so these data must be partitioned so that they fit into the on-chip memory cells of the neural network processor.
Fig. 4 illustrates the data placement function performed by the data processing module in this embodiment. A complete piece of data is divided into several data blocks; in the figure, one neuron data block has been divided into three data blocks, which are stored in order into the storage unit. Before data placement, data is handled in units of one complete block; the data placement module divides it into multiple data blocks, each of which is treated and placed as an independent neuron data block. During computation, such a data block can be loaded into on-chip memory for further computation.
The neural network algorithm mapping module maps the blocks in the source program (which state the different neural network algorithms) into the corresponding basic statements (which are closely tied to the hardware instruction set). It specifically includes a computation division module and an instruction mapping module. The computation division module divides a large-scale computation into relatively small sub-computations. For example, the output of a fully connected layer algorithm may be divided into three segments, in which case the algorithm is divided into three sub-computations, each computing one segment of the output. The instruction mapping module then maps the concrete algorithm into instructions from the hardware instruction set (the neural network processor instruction set). For example, one fully connected computation can be mapped to a basic statement for a matrix-vector multiplication. The mapped code is then sent to the parallel optimization module.
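The computation division just described can be sketched as splitting a fully connected layer's output into segments, each becoming one sub-computation mapped to a matrix-vector-multiply style instruction. The instruction name `MAT_VEC_MUL` is an assumption standing in for the processor's real instruction set.

```python
def divide_fc(out_size, num_parts):
    """Divide a fully connected layer's output into num_parts segments;
    each segment is one sub-computation mapped to a hypothetical
    matrix-vector instruction covering [start, end) of the output."""
    step = (out_size + num_parts - 1) // num_parts
    subs = []
    for start in range(0, out_size, step):
        end = min(start + step, out_size)
        subs.append(("MAT_VEC_MUL", start, end))
    return subs

# A 512-wide output divided into three sub-computations, each computing
# one segment of the output, as in the example above.
subs = divide_fc(512, 3)
assert len(subs) == 3
assert subs[-1][2] == 512   # the segments together cover the whole output
```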
In some embodiments, the generating device may also include a parallel code generation module, whose input is the program mapped into basic statements and macro statements and whose output is the optimized code. This module adjusts the statement order through methods such as simulation and inference to achieve the best parallel efficiency.
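One simple statement-reordering optimization of the kind such a module might apply is sketched below: a load is hoisted above an unrelated compute so that, on hardware with separate load and compute units, the two can overlap. The three-tuple instruction shape and the minimal dependence check are assumptions for illustration only.

```python
def reads_and_writes(stmt):
    """Assumed statement shape: (opcode, destination, *sources)."""
    op, dst, *srcs = stmt
    return set(srcs), {dst}

def hoist_independent_loads(program):
    """Swap a LOAD upward past its predecessor when neither statement
    touches the other's operands, so load and compute can overlap."""
    prog = list(program)
    for i in range(1, len(prog)):
        if prog[i][0] == "LOAD":
            reads, writes = reads_and_writes(prog[i])
            prev_reads, prev_writes = reads_and_writes(prog[i - 1])
            if not (reads & prev_writes or writes & prev_reads
                    or writes & prev_writes):
                prog[i - 1], prog[i] = prog[i], prog[i - 1]
    return prog

prog = [
    ("MUL", "t0", "a", "b"),   # compute on a, b
    ("LOAD", "t1", "mem1"),    # unrelated load: may move up
    ("ADD", "t2", "t0", "t1"), # depends on both, must stay last
]
opt = hoist_independent_loads(prog)
assert opt[0][0] == "LOAD" and opt[-1][0] == "ADD"
```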
In some embodiments, the generating device may also include a relocation module, which replaces the start addresses of the corresponding data in the program according to the address information from memory allocation.
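Relocation can be sketched as a table lookup: given the allocation table from the data processing module (symbol to relative start address), each symbolic operand in the program is replaced by an absolute address. The instruction shape and the base address are assumptions for the example.

```python
def relocate(program, alloc_table, base_address):
    """Replace each symbolic operand found in alloc_table with
    base_address + its relative start address."""
    relocated = []
    for op, operand in program:
        if operand in alloc_table:
            operand = base_address + alloc_table[operand]
        relocated.append((op, operand))
    return relocated

alloc = {"fc1_inp": 0, "fc1_weight": 2048}   # relative start addresses
prog = [("LOAD", "fc1_inp"), ("LOAD", "fc1_weight"), ("SYNC", None)]
out = relocate(prog, alloc, base_address=0x1000)
assert out[0] == ("LOAD", 0x1000)
assert out[1] == ("LOAD", 0x1000 + 2048)
```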
In some embodiments, the generating device may also include a machine code generation module, which translates the relocated code into a binary character string that the machine can recognize.
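The final translation step can be sketched as encoding each mnemonic statement into a fixed-width binary string. The opcode values and the 4-bit/12-bit field widths are invented for the example; a real neural network processor defines its own encoding.

```python
OPCODES = {"LOAD": 0b0001, "STORE": 0b0010, "MAT_VEC_MUL": 0b0011}

def encode(op, operand):
    """Pack a 4-bit opcode and a 12-bit operand into a 16-character
    binary string."""
    word = (OPCODES[op] << 12) | (operand & 0xFFF)
    return format(word, "016b")

def assemble(program):
    """Translate a mnemonic program into newline-separated bit strings."""
    return "\n".join(encode(op, operand) for op, operand in program)

binary = assemble([("LOAD", 8), ("MAT_VEC_MUL", 0)])
assert binary.split("\n")[0] == "0001000000001000"
```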
Fig. 2 is a schematic diagram of the overall structure containing the specific executable program generating device in a disclosed embodiment. First, the source file is fed into the source program segmentation module and split into a data segment and a code segment. The data segment is then fed into the data processing module, which allocates memory and places the data; the code segment is fed into the neural network algorithm mapping module, which maps the neural network algorithms expressed as blocks in the code into an algorithm flow composed of macro statements and then into hardware-specific instructions. The output of the neural network algorithm mapping module can be fed into the parallel code generation module for parallelization processing and optimization. In this step, the parallel code generation module can adjust the order of instructions in the code to suit the hardware's specific parallel mechanisms so as to achieve the best result. Afterwards, the address information output by the data processing module and the optimized program output by the parallel code module are fed into the relocation module, which replaces the relative addresses in the program with absolute addresses. Finally, the program is fed into the machine code generation module, which translates the code expressed in mnemonics into a form the machine can recognize (for example, a binary file).
For further illustration, a specific embodiment is used below to describe the entire executable program generation process. It should be understood, however, that the specific implementation details in the embodiment serve only to illustrate the present disclosure and are not to be construed as limiting it; moreover, the embodiment may simplify or omit components or steps well known in the art so as not to obscure the characteristics of the present disclosure.
Fig. 5 is a flow diagram of source file processing using the disclosed embodiment, and Fig. 6 depicts the forward computation of a network structure with two fully connected layers. The concrete executable file generation process is as follows:
(1) The source file is fed into the source file segmentation module, which divides the file into four parts according to its section markers: .code (the code segment), .static_rw, .static_ro, and .dynamic (the static read-write, static read-only, and dynamic segments, respectively).
(2) The .code part is fed into the neural network algorithm mapping module, where @block_fw_fc is mapped into concrete computation macros and memory-access macro statements. The result after mapping is shown in Fig. 7. The macro parameters fc1_out, fc1_inp, and fc1_weight replace the parameters in the block definition.
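The parameter replacement in step (2) can be sketched as follows: the @block_fw_fc call is expanded into its block definition, with the actual macro parameters fc1_out, fc1_inp, and fc1_weight substituted for the formal parameters. The body statements below are assumptions standing in for the computation and memory-access macros of Fig. 7.

```python
# Hypothetical definition of the fw_fc block: formal parameters plus a
# body of macro statements that reference them.
BLOCK_FW_FC_PARAMS = ["out", "inp", "weight"]
BLOCK_FW_FC_BODY = [
    "load_neuron {inp}",
    "load_synapse {weight}",
    "mlp_forward",
    "store_neuron {out}",
]

def expand_block(args):
    """Bind actual arguments to the formal parameters and substitute
    them into every body statement."""
    binding = dict(zip(BLOCK_FW_FC_PARAMS, args))
    return [stmt.format(**binding) for stmt in BLOCK_FW_FC_BODY]

expanded = expand_block(["fc1_out", "fc1_inp", "fc1_weight"])
assert expanded[0] == "load_neuron fc1_inp"
assert expanded[3] == "store_neuron fc1_out"
```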
(3) The program output in step (2) is fed into the parallel optimization module and optimized.
(4) The data segment (.static_rw, .static_ro, .dynamic) is fed into the data processing module. First, the data processing module computes the size of each datum; for example, fc1_inp contains 1024 numbers, i.e., 2048 bytes. It therefore allocates 2048 bytes to fc1_inp starting from relative address 0, advances the start address to 2048, and then allocates an address for the second datum, continuing until all data have been allocated. This address information is the output of the module.
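Step (4) can be sketched as a running-offset allocator. The 2 bytes per number matches the text's 1024 numbers = 2048 bytes; the second entry's name and size are invented for the example.

```python
BYTES_PER_NUMBER = 2   # matches 1024 numbers = 2048 bytes in the text

def allocate(declarations):
    """declarations: list of (name, num_elements).
    Returns name -> (relative start address, size in bytes)."""
    table = {}
    offset = 0
    for name, count in declarations:
        size = count * BYTES_PER_NUMBER
        table[name] = (offset, size)
        offset += size   # advance the start address past this datum
    return table

table = allocate([("fc1_inp", 1024), ("fc1_weight", 1024 * 512)])
assert table["fc1_inp"] == (0, 2048)   # starts at relative address 0
assert table["fc1_weight"][0] == 2048  # next datum starts at 2048
```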
(5) The data in the data segment are placed according to the segment information and the like in the data declarations.
(6) The address information output in step (4) and the optimized program output in step (3) are fed into the relocation module, which modifies the relative addresses into absolute addresses.
(7) The output program of step (6) is fed into the machine code generation module, which translates the program into a document form that the machine can understand, i.e., the final executable program.
In summary, the disclosed embodiments propose an executable program generating device and method for a neural network processor. Using this method and device, programmers can program neural network processors more efficiently.
In addition, the functional units in the embodiments of the present disclosure may be integrated into one processing unit, may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be realized in the form of hardware or in the form of a software program module.
If the integrated unit is realized in the form of a software program module and sold or used as an independent product, it can be stored in a computer-readable memory. Based on this understanding, the technical solution of the present disclosure, in essence, or the part of it that contributes to the existing technology, or all or part of the technical solution, can be embodied in the form of a software product. The software product is stored in a memory and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods in the embodiments of the present invention. The aforementioned memory includes media that can store program code, such as a USB flash drive, read-only memory (ROM, Read-Only Memory), random access memory (RAM, Random Access Memory), a removable hard disk, a magnetic disk, or an optical disc.
Each functional unit/module may be hardware; for example, the hardware may be a circuit, including digital circuits, analog circuits, and so on. Physical realizations of the hardware structure include, but are not limited to, physical devices, and the physical devices include, but are not limited to, transistors, memristors, and so on. The computation module in the computing device may be any appropriate hardware processor, such as a CPU, GPU, FPGA, DSP, or ASIC. The storage unit may be any appropriate magnetic or magneto-optical storage medium, such as RRAM, DRAM, SRAM, EDRAM, HBM, or HMC.
It should also be understood that in the present disclosure, references to, for example, "some embodiments", "an embodiment", or "one or more embodiments" indicate that a particular feature may be included in an implementation of the present disclosure. Similarly, it should be appreciated that in the description various features are sometimes grouped together in a single embodiment, figure, or description thereof to streamline the explanation and aid understanding of the various aspects of the present disclosure. This method of disclosure, however, is not to be interpreted as reflecting an intention that the present disclosure requires more features than are expressly recited in each claim.
The specific embodiments described above further explain in detail the purpose, technical solution, and beneficial effects of the present invention. It should be understood that the above are only specific embodiments of the present invention and are not intended to limit it; any modification, equivalent substitution, improvement, and the like made within the spirit and principles of the present invention shall be included within the scope of protection of the present invention.
Claims (15)
1. An executable program generating device for a neural network processor, characterized by comprising:
a source program segmentation module, which receives a source file as input, identifies and extracts the positions of the code segment and the data segment according to the format of the source file, and generates an intermediate file containing the code segment and an intermediate file containing the data segment;
a data processing module, which takes the intermediate file containing the data segment as input, handles the placement of data, and outputs memory allocation information and the data placement scheme;
a neural network algorithm mapping module, which takes the intermediate file containing the code segment as input, maps the neural network algorithms expressed as blocks in the code into an algorithm flow composed of macro statements, and then maps that flow into hardware-specific instructions.
2. The device according to claim 1, characterized in that it further comprises a parallel code generation module, which takes the hardware-specific instructions as input, performs parallelization processing and optimization on them, and outputs the optimized program.
3. The device according to claim 2, characterized in that it further comprises a relocation module, which takes the data placement scheme, the memory allocation information, and the optimized program as input, and replaces the relative addresses in the optimized program with absolute addresses.
4. The device according to claim 3, characterized in that it further comprises a machine code generation module, for translating the program relocated by the relocation module into a character string that the neural network processor can recognize.
5. The device according to claim 1, characterized in that the data processing module is also used to perform data partitioning, whereby the input and output data of each layer of the neural network are partitioned so that, after partitioning, they fit into the on-chip memory cells of the neural network processor.
6. The device according to claim 1, characterized in that the neural network algorithm mapping module comprises:
a computation division module, for dividing a large-scale computation into relatively small sub-computations;
an instruction mapping module, for mapping the algorithm into instructions from the neural network processor's instruction set.
7. The device according to claim 1, characterized in that the code segment contains the statements and blocks of the neural network algorithm, together with their corresponding definitions.
8. A method for generating an executable program using the device of any one of claims 1-7, characterized by comprising:
using the source program segmentation module, receiving a source file as input, identifying and extracting the positions of the code segment and the data segment according to the format of the source file, and generating an intermediate file containing the code segment and an intermediate file containing the data segment;
using the data processing module, taking the intermediate file containing the data segment as input, handling the placement of data, and outputting memory allocation information and the data placement scheme;
using the neural network algorithm mapping module, taking the intermediate file containing the code segment as input, mapping the neural network algorithms expressed as blocks in the code into an algorithm flow composed of macro statements, and then mapping that flow into hardware-specific instructions.
9. The method according to claim 8, characterized by further comprising:
using the parallel code generation module, taking the hardware-specific instructions as input, performing parallelization processing and optimization on them, and outputting the optimized program.
10. The method according to claim 9, further comprising:
using the relocation module, which takes the data placement scheme, the memory allocation information and the optimized program as input, and replaces the relative addresses in the optimized program with absolute addresses.
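A minimal sketch of the relocation step in claim 10: symbolic (relative) references in the optimized program are resolved to absolute addresses using the memory allocation information. The `@symbol+offset` operand syntax and the allocation table are illustrative assumptions.

```python
import re

def relocate(program, alloc):
    """Replace each @symbol+offset reference with its absolute address."""
    def resolve(match):
        symbol, offset = match.group(1), int(match.group(2) or 0)
        return str(alloc[symbol] + offset)   # absolute = base + relative offset
    return [re.sub(r"@(\w+)(?:\+(\d+))?", resolve, line) for line in program]

alloc = {"weights": 0x1000, "bias": 0x2000}    # memory allocation information
program = ["LOAD @weights+16", "LOAD @bias"]   # optimized program, relative form
absolute = relocate(program, alloc)
# absolute == ["LOAD 4112", "LOAD 8192"]
```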
11. The method according to claim 10, further comprising:
using the machine code generation module to translate the program relocated by the relocation module into character strings that the neural network processor can recognize.
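The machine-code generation step of claim 11 can be sketched as encoding each relocated instruction into a fixed-width bit string that a processor front end could recognize. The opcode table and the 4-bit-opcode / 12-bit-address format are illustrative assumptions.

```python
OPCODES = {"LOAD": 0b0001, "MACC": 0b0010, "STORE": 0b0011}

def encode(instr):
    """Encode 'OP address' into a 4-bit opcode + 12-bit address bit string."""
    op, addr = instr.split()
    return f"{OPCODES[op]:04b}{int(addr):012b}"   # 16-character '0'/'1' string

machine_code = [encode(i) for i in ["LOAD 272", "STORE 8"]]
# each entry is a 16-character binary string
```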
12. The method according to claim 10, further comprising:
performing data division using the data processing module, whereby each layer's input/output data of the neural network is partitioned and, after partitioning, placed into the on-chip storage unit of the neural network processor.
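The data-division step of claim 12 amounts to splitting each layer's data into pieces no larger than the on-chip storage unit. The 4 KB capacity, the float32 element size, and the layer size below are illustrative assumptions.

```python
ON_CHIP_BYTES = 4096  # assumed on-chip storage unit capacity

def divide_layer(num_values, bytes_per_value=4, capacity=ON_CHIP_BYTES):
    """Return (values_per_block, number_of_blocks) so each block fits on chip."""
    per_block = capacity // bytes_per_value   # values per on-chip block
    blocks = -(-num_values // per_block)      # ceiling division
    return per_block, blocks

# a layer with 10,000 float32 activations -> 1024-value blocks, 10 blocks
per_block, blocks = divide_layer(10_000)
```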
13. The method according to claim 8, wherein the code segment contains statements, blocks and corresponding definitions of the neural network algorithm.
14. The method according to claim 9, wherein the parallelization processing and optimization in the parallel code generation module comprise: adjusting the statement order by simulation and/or inference methods to improve the degree of parallelism.
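The reordering idea of claim 14 can be sketched with a simple dependency analysis (standing in for the patent's simulation and/or inference): statements are regrouped into waves of mutually independent statements that could issue in parallel. Statement and register names are illustrative.

```python
def schedule(statements):
    """Greedy list scheduling: emit waves of mutually independent statements."""
    remaining = list(statements)
    order = []
    done = set()                       # registers already produced
    while remaining:
        # a statement is ready once everything it reads has been written
        wave = [s for s in remaining if set(s["reads"]) <= done]
        if not wave:
            raise ValueError("cyclic dependency")
        for s in wave:
            remaining.remove(s)
            done.add(s["writes"])
        order.append([s["name"] for s in wave])
    return order

stmts = [
    {"name": "s1", "writes": "a", "reads": []},
    {"name": "s2", "writes": "b", "reads": ["a"]},
    {"name": "s3", "writes": "c", "reads": []},        # independent of s1/s2
    {"name": "s4", "writes": "d", "reads": ["b", "c"]},
]
waves = schedule(stmts)
# waves == [["s1", "s3"], ["s2"], ["s4"]]
```

Moving s3 next to s1 is exactly the kind of statement-order adjustment the claim describes: the program's result is unchanged, but independent work is grouped for parallel issue.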
15. The method according to claim 9, further comprising:
using the data processing module to divide neurons into multiple data blocks, storing the multiple data blocks sequentially in the storage unit, and, during computation, loading the data blocks into on-chip memory for further calculation.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810257595.7A CN110308909B (en) | 2018-03-27 | 2018-03-27 | Executable program generating device and method for neural network processor |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110308909A true CN110308909A (en) | 2019-10-08 |
CN110308909B CN110308909B (en) | 2023-08-01 |
Family
ID=68074163
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810257595.7A Active CN110308909B (en) | 2018-03-27 | 2018-03-27 | Executable program generating device and method for neural network processor |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110308909B (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113031952A (en) * | 2019-12-25 | 2021-06-25 | 上海高德威智能交通系统有限公司 | Method and device for determining execution code of deep learning model and storage medium |
CN115098107A (en) * | 2022-06-21 | 2022-09-23 | 清华大学 | Code generation method and device of neural network task |
WO2022227869A1 (en) * | 2021-04-30 | 2022-11-03 | International Business Machines Corporation | Locate neural network performance hot spots |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6578020B1 (en) * | 1999-12-07 | 2003-06-10 | International Business Machines Corporation | Method and system for converting code to executable code using neural networks implemented in a very large scale integration (VLSI) integrated circuit |
US20040122785A1 (en) * | 2000-12-15 | 2004-06-24 | International Business Machines Corporation | Method, system, and program for converting application program code to executable code using neural networks based on characteristics of the inputs |
US6832214B1 (en) * | 1999-12-07 | 2004-12-14 | International Business Machines Corporation | Method, system, and program for converting code to executable code using neural networks implemented in a software program |
CN103282891A (en) * | 2010-08-16 | 2013-09-04 | 甲骨文国际公司 | System and method for effective caching using neural networks |
CN105740946A (en) * | 2015-07-29 | 2016-07-06 | 上海磁宇信息科技有限公司 | Method for realizing neural network calculation by using cell array computing system |
CN105989408A (en) * | 2015-03-18 | 2016-10-05 | 国际商业机器公司 | A system and a method for mapping a neural network onto a neurosynaptic substrate |
CN107239315A (en) * | 2017-04-11 | 2017-10-10 | 北京深鉴智能科技有限公司 | Towards the programming model of neutral net heterogeneous computing platforms |
US20170323224A1 (en) * | 2016-05-07 | 2017-11-09 | 1026 Labs, Inc. | Apparatus for hardware accelerated machine learning |
US20180018167A1 (en) * | 2016-07-15 | 2018-01-18 | Microsoft Technology Licensing, Llc | Transforming data manipulation code into data workflow |
Non-Patent Citations (0)
Non-Patent Citations (1)
Title |
---|
ZHANG JIYU ET AL.: "A basic block reordering method based on artificial neural networks", Acta Scientiarum Naturalium Universitatis Pekinensis (Journal of Peking University, Natural Science Edition) * |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113031952A (en) * | 2019-12-25 | 2021-06-25 | 上海高德威智能交通系统有限公司 | Method and device for determining execution code of deep learning model and storage medium |
WO2022227869A1 (en) * | 2021-04-30 | 2022-11-03 | International Business Machines Corporation | Locate neural network performance hot spots |
US11775317B2 (en) | 2021-04-30 | 2023-10-03 | International Business Machines Corporation | Locate neural network performance hot spots |
GB2622153A (en) * | 2021-04-30 | 2024-03-06 | Ibm | Locate neural network performance hot spots |
GB2622153B (en) * | 2021-04-30 | 2024-07-17 | Ibm | Locate neural network performance hot spots |
CN115098107A (en) * | 2022-06-21 | 2022-09-23 | 清华大学 | Code generation method and device of neural network task |
CN115098107B (en) * | 2022-06-21 | 2024-04-19 | 清华大学 | Code generation method and device for neural network task |
Also Published As
Publication number | Publication date |
---|---|
CN110308909B (en) | 2023-08-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111368996B (en) | Retraining projection network capable of transmitting natural language representation | |
Cuomo et al. | A GPU-accelerated parallel K-means algorithm | |
Misra et al. | Neural shift-reduce CCG semantic parsing | |
KR20180062321A (en) | Method for drawing word related keyword based on deep learning and computerprogram | |
CN110308909A (en) | Executable program generating device and method for neural network processor | |
CN105706092B (en) | The method and system of four values simulation | |
Sánchez-Karhunen et al. | Modelling complex market interactions using PDP systems | |
US11720788B2 (en) | Calculation scheme decision system, calculation scheme decision device, calculation scheme decision method, and storage medium | |
Antonelli et al. | Learning concurrently data and rule bases of Mamdani fuzzy rule-based systems by exploiting a novel interpretability index | |
CN114168154B (en) | Model data processing method and device, electronic equipment and storage medium | |
Glauner | Comparison of training methods for deep neural networks | |
CN114398899A (en) | Training method and device for pre-training language model, computer equipment and medium | |
CN116401502A (en) | Method and device for optimizing Winograd convolution based on NUMA system characteristics | |
Kokhazadeh et al. | A Design space exploration methodology for enabling tensor train decomposition in edge devices | |
Gibaja et al. | An ensemble-based approach for multi-view multi-label classification | |
Lima et al. | A grammar-based GP approach applied to the design of deep neural networks | |
CN114611714B (en) | Model processing method, device, system, electronic equipment and storage medium | |
CN109858027A (en) | A method for identifying and classifying four categories of internet e-commerce merchandise information | |
CN109241322A (en) | Code generating method, code generating unit and electronic equipment | |
Mezher | GFLibPy: an open-source python toolbox for genetic folding algorithm | |
CN115422357A (en) | Text classification method and device, computer equipment and storage medium | |
CN110308899B (en) | Language source program generation method and device for neural network processor | |
Rudi et al. | CodeFlow: A code generation system for Flash-X orchestration runtime | |
Chichin et al. | Capability to embed deep neural networks: Study on cpu processor in avionics context | |
Gabryel et al. | The bag-of-words method with dictionary analysis by evolutionary algorithm |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||