CN110308909B - Executable program generating device and method for neural network processor - Google Patents


Info

Publication number
CN110308909B
CN110308909B (application CN201810257595.7A)
Authority
CN
China
Prior art keywords
data
neural network
module
mapping
program
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810257595.7A
Other languages
Chinese (zh)
Other versions
CN110308909A (en)
Inventor
Name withheld at the inventor's request
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Cambricon Information Technology Co Ltd
Original Assignee
Shanghai Cambricon Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Cambricon Information Technology Co Ltd filed Critical Shanghai Cambricon Information Technology Co Ltd
Priority to CN201810257595.7A priority Critical patent/CN110308909B/en
Publication of CN110308909A publication Critical patent/CN110308909A/en
Application granted granted Critical
Publication of CN110308909B publication Critical patent/CN110308909B/en
Current legal status: Active
Anticipated expiration date: not listed

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00Arrangements for software engineering
    • G06F8/40Transformation of program code
    • G06F8/41Compilation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00Arrangements for software engineering
    • G06F8/40Transformation of program code
    • G06F8/41Compilation
    • G06F8/42Syntactic analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

An executable program generating device and method for a neural network processor. The generating device comprises: a source program segmentation module, which receives a source file as input, identifies and extracts the positions of the code segment and the data segment according to the format of the source file, and generates an intermediate file containing the code segment and an intermediate file containing the data segment; a data processing module, which takes the intermediate file containing the data as input, handles the placement of the data, and outputs memory allocation information and a data layout; and a neural network algorithm mapping module, which takes the intermediate file containing the code segment as input, maps the neural network algorithm represented by blocks in the code into an algorithm flow composed of macro statements, and then maps the algorithm flow into hardware-dependent instructions. The device provides users with a convenient and fast way to use the neural network processor.

Description

Executable program generating device and method for neural network processor
Technical Field
The present disclosure relates to the field of computers, and more particularly to the field of artificial intelligence.
Background
Deep neural network algorithms are among the most popular machine learning algorithms of recent years and are widely used in fields such as image recognition, speech recognition, and natural language processing. Because deep neural networks achieve good results on a wide variety of tasks, new network structures and algorithm layers emerge endlessly, which poses challenges for programming and development. The unique hardware architecture of the neural network processor, together with the compute- and memory-intensive nature of the neural network algorithms running on it, makes programming even more complex and difficult. Programming the neural network processor by hand-writing instructions is time-consuming, labor-intensive, error-prone, and hard to debug.
In carrying out the present disclosure, the applicant found the following problems in the prior art described above: the lack of an effective programming means and of an efficient code generating device for the neural network processor makes programming very difficult, and the generated code is inefficient and cannot fully exploit the advantages of the neural network processor.
Disclosure of Invention
First, the technical problem to be solved
It is therefore an object of the present disclosure to provide an executable program generating apparatus and method for a neural network processor, so as to solve at least some of the above-mentioned technical problems.
(II) technical scheme
According to an aspect of the present disclosure, there is provided an executable program generating apparatus for a neural network processor, including:
the source program segmentation module receives a source file as input, identifies and extracts the positions of code segments and data segments according to the format in the source file, and generates an intermediate file containing the code segments and an intermediate file containing the data segments;
the data processing module is used for taking an intermediate file containing data as input, processing the placement of the data, and outputting memory allocation information and a data layout;
and the neural network algorithm mapping module is used for taking an intermediate file containing the code segment as input, mapping the neural network algorithm represented by blocks in the code into an algorithm flow composed of macro statements, and then mapping the algorithm flow into hardware-dependent instructions.
In a further aspect, the apparatus further comprises a parallel code generation module, which takes the hardware-dependent instructions as input, performs parallelization and optimization on them, and outputs an optimized program.
In a further aspect, the apparatus further comprises a relocation module, which takes the data layout, the memory allocation information, and the optimized program as input, and replaces the relative addresses in the optimized program with absolute addresses.
In a further aspect, the apparatus further comprises a machine code generation module, which translates the relocated program into a character string recognizable by the neural network processor.
In a further aspect, the data processing module is further used for partitioning data: the input and output data of each layer of the neural network are partitioned, and the partitioned data are placed in the on-chip storage units of the neural network processor.
In a further aspect, the neural network algorithm mapping module includes:
the calculation dividing module is used for dividing large-scale calculation into sub-calculation modules with relatively small scales;
and the instruction mapping module is used for mapping the algorithm into instructions in the instruction set of the neural network processor.
In a further aspect, the code segment includes statements, blocks, and corresponding definitions of a neural network algorithm.
According to another aspect of the present disclosure, there is also provided a method for generating an executable program using the above apparatus, comprising: using a source program segmentation module to receive a source file as input, identify and extract the positions of the code segment and the data segment according to the format of the source file, and generate an intermediate file containing the code segment and an intermediate file containing the data segment;
using a data processing module to take the intermediate file containing data as input, process the placement of the data, and output memory allocation information and a data layout;
and using a neural network algorithm mapping module to take the intermediate file containing the code segment as input, map the neural network algorithm represented by blocks in the code into an algorithm flow composed of macro statements, and then map the algorithm flow into hardware-dependent instructions.
In a further aspect, the method further includes: using a parallel code generation module to take the hardware-dependent instructions as input, perform parallelization and optimization on them, and output an optimized program.
In a further aspect, the method further includes: using a relocation module to take the data layout, the memory allocation information, and the optimized program as input, and replace the relative addresses in the optimized program with absolute addresses.
In a further aspect, the method further includes: using a machine code generation module to translate the relocated program into a character string recognizable by the neural network processor.
In a further aspect, the method further includes: using the data processing module to partition data: the input and output data of each layer of the neural network are partitioned, and the partitioned data are placed in the on-chip storage units of the neural network processor.
In a further aspect, the code segment includes statements, blocks, and corresponding definitions of a neural network algorithm.
In a further aspect, the parallelization processing and optimization in the parallel code generation module includes: the parallel effect is improved by adjusting the statement sequence through a simulation and/or reasoning method.
In a further aspect, the data processing module is used to divide a neuron into a plurality of data blocks, which are stored sequentially in the storage unit; during computation, the data blocks are loaded into the on-chip memory for further calculation.
(III) beneficial effects
The executable file generating device provided by the present disclosure is designed specifically for neural network accelerators and provides users with a convenient and fast way to use the neural network processor. Because the design closely follows the neural network algorithm while remaining independent of the hardware, users can generate efficient neural network executable files without considering the characteristics of the hardware.
The device in the present disclosure can generate executable code that runs efficiently, ensuring the efficiency of the neural network accelerator.
Drawings
FIG. 1 is a block diagram of an executable program generating apparatus in an embodiment of the present disclosure.
Fig. 2 is a schematic diagram of an overall structure of an executable program generating apparatus according to an embodiment of the disclosure.
FIG. 3 illustrates data partitioning by a data processing module in an embodiment of the present disclosure.
FIG. 4 illustrates a data placement function performed by a data processing module according to an embodiment of the present disclosure.
FIG. 5 is a flow chart illustrating source file processing using an embodiment of the present disclosure.
Fig. 6 depicts a forward computational schematic of a two-layer fully connected layer network architecture.
Fig. 7 shows a specific expansion of the forward computation block in fig. 6.
Detailed Description
To make the objects, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to specific embodiments and the accompanying drawings.
In the present disclosure, because the neural network processor has a special hardware structure, a special programming method is required to program it: a programming language is used to express the algorithm as a source file that the generating device can understand. FIG. 1 shows the process of going from a neural network algorithm to a program executable on the neural network processor according to an embodiment of the present disclosure. The programming method in the embodiments of the present disclosure includes two steps. First, concepts in the neural network algorithm are mapped to abstract concepts in the programming language; for example, the neurons and synapses in a neural network are mapped to data structures in the programming language. Second, the abstract concepts in the programming language are mapped into code in a concrete source program; for example, a piece of neuron data is declared in the data section of the source file.
A programming language consists of a set of defined rules (syntax and semantics). In the present disclosure, the language is defined in terms of three elements: data types, statements, and blocks.
The data types are used to store, organize, and express the various data structures in a neural network algorithm, and to map these data structures onto the specific hardware of the neural network processor. Three data types are included in the present disclosure: neurons, synapses, and parameters. A neuron is a multidimensional array used to store and express the input and output values of each layer in the neural network algorithm. The synapse data structure, also a multidimensional array, is used to express and store the weights connecting the inputs and outputs of certain layers (such as convolutional layers and fully connected layers). The parameter data structure is a scalar structure used to represent scalar data in the neural network algorithm, in particular training parameters such as the learning rate. Data are further divided into two kinds, dynamic and static. Dynamic data are data whose size becomes known only at runtime, so they must be allocated during execution; static data are data whose size is known during program generation, before execution, so this portion of the data is allocated during program generation.
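As an illustrative sketch (not part of the patent; the class and field names are invented), the three data types and the static/dynamic distinction could be modeled as follows, with a `None` shape marking dynamic data whose size is known only at runtime:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Tensor:
    """Base descriptor for the three data types: a multidimensional array."""
    name: str
    shape: Optional[List[int]]  # None for dynamic data: size known only at runtime

    @property
    def is_static(self) -> bool:
        # Static data has a size known at program-generation time,
        # so it can be allocated before execution.
        return self.shape is not None

    def size_elems(self) -> int:
        if not self.is_static:
            raise ValueError(f"{self.name}: dynamic data is sized at runtime")
        n = 1
        for d in self.shape:
            n *= d
        return n

@dataclass
class Neuron(Tensor):   # input/output values of a layer
    pass

@dataclass
class Synapse(Tensor):  # connection weights (e.g. conv / fully connected layers)
    pass

@dataclass
class Param(Tensor):    # scalar training parameters such as the learning rate
    shape: List[int] = field(default_factory=lambda: [1])
```

A static neuron can be allocated at generation time (`Neuron("fc1_inp", [1024]).size_elems()` gives its element count), while a dynamic one (`shape=None`) must be deferred to runtime allocation.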
Statements are used to express the concrete execution process of an algorithm; the user implements a neural network algorithm by writing specific statements in a specific order. Statements include basic statements and macro statements. Basic statements correspond to the instruction set of the neural network processor and represent the most basic functions it can support. Macro statements provide an abstraction layer in the language and are realized through macro definitions and macro invocations. A macro definition specifies the concrete execution process of a macro statement, which consists of basic statements; a macro invocation is a concrete use of that macro statement.
Blocks are used to express a specific neural network algorithm and consist of a sequence of macro statements and basic statements. The use of blocks is likewise divided into two parts, definition and invocation: the definition of a block describes the computation process of a neural network algorithm through a combined arrangement of macro statements and basic statements, and the invocation of a block is the use of that algorithm.
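A minimal sketch of the macro mechanism (the statement syntax and mnemonics here are invented for illustration; the patent gives no concrete grammar): a macro definition maps a name and formal parameters to a body of basic statements, and a macro invocation expands by substituting the actual arguments.

```python
# Macro table: name -> (formal parameters, body of basic statements).
MACROS = {
    # A hypothetical "FC" macro expanding a fully connected layer into
    # basic statements close to the hardware instruction set.
    "FC": (["out", "inp", "w"],
           ["LOAD {inp}", "LOAD {w}", "MMV {out}, {inp}, {w}", "STORE {out}"]),
}

def expand_macro(call: str) -> list:
    """Expand one macro invocation like 'FC fc1_out, fc1_inp, fc1_weight'."""
    name, _, arglist = call.partition(" ")
    args = [a.strip() for a in arglist.split(",")]
    formals, body = MACROS[name]
    binding = dict(zip(formals, args))
    # Substitute actual arguments for formal parameters in each statement.
    return [stmt.format(**binding) for stmt in body]
```

A block definition would then be a named sequence of such macro and basic statements, and a block invocation would expand every macro it contains in order.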
Having described how the source file is generated and what it consists of, the executable program generating apparatus for a neural network processor according to the embodiments of the present disclosure is now described in detail with reference to the accompanying drawings. The generating apparatus processes the source file into an executable program that can run on the neural network processor, through steps such as parsing and optimization.
The executable program generating apparatus of the present disclosure generates an executable program that can run on a neural network processor. Its input is a source file (a character string) written in a fixed format and held in a storage unit; its output is a continuous instruction sequence that the target processor can recognize and run. The instruction sequence may be stored in the storage unit as a character string file in binary, decimal, octal, hexadecimal, or any similar representation. The generating apparatus may comprise a source program segmentation module, a data processing module, and a neural network algorithm mapping module.
The source program segmentation module analyzes the segments with different meanings in the source file and dispatches them to different processing modules. Specifically, the segmentation module receives a source file as input, identifies the positions of the code segment and the data segment according to the format of the source file, extracts them, and generates two intermediate files, one containing the code segment and one containing the data segment. The code segment contains all statements, blocks, and their corresponding definitions, and is sent to the neural network algorithm mapping module. The data segment contains all declarations and definitions of data (static and dynamic, of all types) and is sent to the data processing module for allocation and placement.
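The segmentation step can be sketched as follows (the section-flag names `.code`, `.static_rw`, `.static_ro`, `.dynamic` come from the embodiment later in the description; the exact line format is an assumption):

```python
def split_source(src: str):
    """Split a source string into named segments by section flags,
    returning (code_segment_lines, {data_flag: lines})."""
    flags = (".code", ".static_rw", ".static_ro", ".dynamic")
    segments, current = {}, None
    for line in src.splitlines():
        stripped = line.strip()
        if stripped in flags:
            current = stripped           # a flag line opens a new segment
            segments[current] = []
        elif current is not None and stripped:
            segments[current].append(stripped)
    code = segments.get(".code", [])
    data = {k: v for k, v in segments.items() if k != ".code"}
    return code, data
```

The code lines would then feed the neural network algorithm mapping module, and the data segments the data processing module.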
The data processing module handles memory allocation and data placement. It takes the declarations and definitions of all data as input, and outputs memory allocation information (for example, the head address and size of each datum) and the data layout. FIG. 3 illustrates the data partitioning function performed by the data processing module in an embodiment of the present disclosure: the input/output of each layer may be large, so the data must be partitioned so that each piece fits in an on-chip storage unit of the neural network processor.
FIG. 4 illustrates the data placement function performed by the data processing module according to an embodiment of the present disclosure. A complete piece of data is divided into several blocks; in the figure, a piece of neuron data is divided into three blocks, which are stored sequentially in the storage unit. Before placement, the data exist as one complete unit; the data placement function divides them into several data blocks and handles each block as an independent neuron data block. During computation, such a data block is loaded into the on-chip memory for further calculation.
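The partitioning described above can be sketched as a simple function (an illustration, not the patent's algorithm) that splits a neuron into sequentially stored blocks, each small enough for the on-chip storage unit:

```python
def partition(total_elems: int, block_elems: int):
    """Split a neuron of `total_elems` elements into blocks of at most
    `block_elems`, returning (offset, length) pairs; the blocks are
    assumed to be stored sequentially in the storage unit."""
    if block_elems <= 0:
        raise ValueError("on-chip block capacity must be positive")
    blocks, off = [], 0
    while off < total_elems:
        n = min(block_elems, total_elems - off)
        blocks.append((off, n))
        off += n
    return blocks
```

During computation, each `(offset, length)` block would be loaded into on-chip memory in turn, as in the three-block example of FIG. 4.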
The neural network algorithm mapping module maps the blocks in the source program (which express different neural network algorithms) into corresponding basic statements (closely tied to the hardware instruction set). It comprises a computation division module and an instruction mapping module. The computation division module divides a large-scale computation into sub-computations of relatively small scale. For example, if the output of a fully connected layer is divided into three segments, the algorithm is divided into three sub-computations, each computing one segment of the output. The instruction mapping module maps the concrete algorithm into instructions from the hardware instruction set (the neural network processor's instruction set); for example, a fully connected computation may be mapped to a basic statement for matrix-vector multiplication. The mapped code is sent to the parallel optimization module.
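The fully-connected example can be made concrete with a sketch (the `MMV` mnemonic and operand syntax are invented): the output dimension is divided into segments, and each sub-computation becomes one matrix-vector multiply over a slice of the weight matrix.

```python
def divide_fc(out_dim: int, in_dim: int, n_parts: int):
    """Divide a fully connected layer's output into n_parts segments;
    emit one matrix-vector-multiply basic statement per sub-computation.
    Instruction mnemonics are illustrative, not the processor's real ISA."""
    instrs = []
    base, rem = divmod(out_dim, n_parts)
    start = 0
    for i in range(n_parts):
        rows = base + (1 if i < rem else 0)  # spread any remainder evenly
        instrs.append(
            f"MMV out[{start}:{start + rows}], "
            f"W[{start}:{start + rows}, 0:{in_dim}], inp"
        )
        start += rows
    return instrs
```

With `n_parts=3`, this reproduces the description's example of three sub-computations, each producing one segment of the output.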
In some embodiments, the generating apparatus may further include a parallel code generation module, whose input is the program mapped into basic statements and macro statements and whose output is the optimized code. This module adjusts the order of statements through methods such as simulation and reasoning to achieve the best parallel effect.
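One simple reordering of this kind can be sketched as follows (a deliberately naive illustration: the mnemonics are invented, and a real module would check true data dependences via simulation or reasoning). It hoists each block's `LOAD` above the previous block's `COMPUTE` so that memory transfer and computation can overlap, in the style of double buffering:

```python
def interleave_loads(instrs):
    """Hoist a LOAD above an immediately preceding COMPUTE, assuming the
    next block's LOAD does not depend on the previous block's COMPUTE."""
    out = list(instrs)
    i = 1
    while i < len(out):
        if out[i].startswith("LOAD") and out[i - 1].startswith("COMPUTE"):
            out[i - 1], out[i] = out[i], out[i - 1]
        i += 1
    return out
```

On a strictly serial schedule such as `LOAD b0, COMPUTE b0, LOAD b1, COMPUTE b1`, the pass issues `LOAD b1` before `COMPUTE b0`, exposing the parallelism between the memory unit and the compute unit.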
In some embodiments, the generating apparatus may further include a relocation module, which replaces the head addresses of the corresponding data in the program according to the address information produced by memory allocation.
In some embodiments, the generating apparatus may further comprise a machine code generation module for translating the relocated code into a binary string recognizable by the machine.
FIG. 2 is a schematic diagram of the overall structure of an executable program generating apparatus according to an embodiment of the present disclosure. First, the source file is sent to the source program segmentation module, where it is split into a data segment and a code segment. The data segment is then sent to the data processing module for memory allocation and data placement; the code segment is sent to the neural network algorithm mapping module, where the neural network algorithm represented by blocks in the code is mapped into an algorithm flow composed of macro statements and then into hardware-dependent instructions. The output of the neural network algorithm mapping module is sent to the parallel code generation module for parallelization and optimization. In this step, the parallel code generation module adjusts the order of instructions in the code according to the specific parallel mechanisms of the hardware. Next, the address information output by the data processing module and the optimized program output by the parallel code generation module are sent to the relocation module, which replaces the relative addresses in the program with absolute addresses. Finally, the program is fed into the machine code generation module, which translates the code represented by mnemonics into a form (e.g., binary) recognizable by the machine.
For further description, the overall executable program generation process is described below using one specific embodiment. It should be understood, however, that the implementation details in this embodiment are presented only to illustrate the present disclosure and should not be construed as limiting it; in addition, the embodiment may simplify or omit components or steps well known in the art so as not to obscure the features of the present disclosure.
FIG. 5 is a flow chart illustrating source file processing using an embodiment of the present disclosure, and FIG. 6 depicts a forward computation of a two-layer fully connected network architecture. The specific executable file generation process is as follows:
(1) The source file is fed into the source program segmentation module, where it is split into four parts according to their corresponding flags: .code (the code segment), .static_rw, .static_ro, and .dynamic (the static read-write, static read-only, and dynamic data segments, respectively).
(2) The code portion is fed into the neural network algorithm mapping module, where @block_fw_fc is mapped to specific computation and memory-access macro statements. The result after mapping is shown in FIG. 7. The macro arguments fc1_out, fc1_inp, and fc1_weight replace the formal parameters in the block definition.
(3) The program output in step (2) is sent to the parallel optimization module for optimization.
(4) The data segments (.static_rw, .static_ro, .dynamic) are fed to the data processing module for processing. First, the data processing module calculates the size of each datum; for example, fc1_inp holds 1024 two-byte values, i.e., 2048 bytes. Accordingly, 2048 bytes are allocated to fc1_inp starting from relative address 0, the next free address advances to 2048, and the second datum is allocated from there, and so on until all allocation is finished. This address information is the output of the module.
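Step (4) amounts to a bump allocator over relative addresses, which can be sketched as follows (an illustration; the allocation table format is an assumption):

```python
def allocate(decls):
    """Assign each (name, size_in_bytes) pair a (head_address, size),
    bumping the next free relative address after each allocation."""
    table, addr = {}, 0
    for name, nbytes in decls:
        table[name] = (addr, nbytes)
        addr += nbytes
    return table
```

With `fc1_inp` at 2048 bytes, it receives relative address 0 and the next datum starts at 2048, matching the worked example above.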
(5) The data in the data segments are placed according to the partitioning information in the data declarations and the like.
(6) The address information output in step (4) and the optimized program output in step (3) are sent to the relocation module, which modifies the relative addresses into absolute addresses.
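The relocation step can be sketched as a textual rewrite (the `@name` operand syntax is invented for illustration): every symbolic reference is replaced by the load base plus the head address from the allocation table.

```python
def relocate(instrs, table, base):
    """Replace relative addresses with absolute ones: each '@name'
    operand becomes base + head_address from the allocation table.
    (This naive string replacement ignores longest-match ordering,
    which a real relocator would have to handle.)"""
    out = []
    for ins in instrs:
        for name, (head, _size) in table.items():
            ins = ins.replace("@" + name, str(base + head))
        out.append(ins)
    return out
```

For example, with fc1_inp at relative address 0 and a load base of 4096, `LOAD @fc1_inp` becomes `LOAD 4096`.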
(7) The program output in step (6) is fed into the machine code generation module, which translates it into a file format the machine can understand, i.e., the final executable program.
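The final translation from mnemonics to machine-recognizable bytes can be sketched with a toy encoding (the opcode table and the 8-bit opcode / 24-bit operand word layout are invented; the patent does not specify an encoding):

```python
OPCODES = {"LOAD": 0x1, "STORE": 0x2, "MMV": 0x3}  # invented encoding

def assemble(instrs):
    """Translate 'MNEMONIC operand' strings into a byte string:
    one big-endian 32-bit word per instruction, opcode in the top byte."""
    out = bytearray()
    for ins in instrs:
        op, _, operand = ins.partition(" ")
        word = (OPCODES[op] << 24) | (int(operand) & 0xFFFFFF)
        out += word.to_bytes(4, "big")
    return bytes(out)
```

The resulting continuous byte sequence corresponds to the "character string recognizable by the neural network processor" that the description calls the final executable program.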
Through the embodiments described above, the present disclosure provides an executable program generating apparatus and method for a neural network processor. Using this method and apparatus, programmers can program the neural network processor more efficiently.
In addition, each functional unit in the embodiments of the present disclosure may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units described above may be implemented either in hardware or in software program modules.
If the integrated units are implemented in the form of software program modules and sold or used as stand-alone products, they may be stored in a computer-readable memory. Based on this understanding, the technical solution of the present disclosure, in essence or in the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a memory, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present disclosure. The aforementioned memory includes various media capable of storing program code, such as a USB flash drive, read-only memory (ROM), random access memory (RAM), removable hard disk, magnetic disk, or optical disk.
Each functional unit/module may be hardware, such as a circuit, including a digital circuit, an analog circuit, etc. Physical implementations of hardware structures include, but are not limited to, physical devices including, but not limited to, transistors, memristors, and the like. The computing modules in the computing device may be any suitable hardware processor, such as CPU, GPU, FPGA, DSP and ASIC, etc. The storage unit may be any suitable magnetic or magneto-optical storage medium, such as RRAM, DRAM, SRAM, EDRAM, HBM, HMC, etc.
It should also be understood that references in the present disclosure to, for example, "some embodiments," "an embodiment," "one or more embodiments," indicate that a particular feature may be included in the practice of the present disclosure. Similarly, it should be appreciated that in the description, various features are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of various aspects of the disclosure. This method of disclosure, however, is not to be interpreted as reflecting an intention that the disclosure requires more features than are expressly recited in each claim.
The foregoing embodiments further describe the objects, technical solutions, and advantages of the present invention in detail. It should be understood that the foregoing is merely a description of specific embodiments and is not intended to limit the present invention; any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (9)

1. An executable program generating apparatus for a neural network processor, comprising:
the source program segmentation module receives a source file as input, identifies and extracts the positions of code segments and data segments according to the format in the source file, and generates an intermediate file containing the code segments and an intermediate file containing the data segments;
the data processing module is used for inputting an intermediate file containing data, processing the arrangement of the data and outputting memory allocation information and a data arrangement mode;
the neural network algorithm mapping module is used for taking an intermediate file containing the code segment as input, mapping the neural network algorithm represented by blocks in the code into an algorithm flow composed of macro statements, and then mapping the algorithm flow into hardware-dependent instructions;
the parallel code generation module is used for inputting a hardware-related instruction, carrying out parallelization processing and optimization on the hardware-related instruction, and outputting an optimized program;
the repositioning module is used for inputting a data placement mode, memory allocation information and an optimized program and replacing a relative address in the optimized program with an absolute address;
and the machine code generation module is used for translating the relocated program into a character string which can be identified by the neural network processor.
2. The apparatus of claim 1, wherein the data processing module is further configured to divide data, divide input/output data of each layer of the neural network, and store the divided data in an on-chip memory unit of the neural network processor.
3. The apparatus of claim 1, wherein the neural network algorithm mapping module comprises:
the calculation dividing module is used for dividing large-scale calculation into sub-calculation modules with relatively small scales;
and the instruction mapping module is used for mapping the algorithm into instructions in the instruction set of the neural network processor.
4. The apparatus of claim 1, wherein the code segment includes statements, blocks, and corresponding definitions of a neural network algorithm.
5. An executable program generating method employing the apparatus of any one of claims 1-4, comprising: a source program segmentation module is adopted, a source file is received as input, the positions of a code segment and a data segment are identified and extracted according to the format in the source file, and an intermediate file containing the code segment and an intermediate file containing the data segment are generated;
a data processing module is adopted, an intermediate file containing data is input, the placement of the data is processed, and memory allocation information and a data placement mode are output;
using a neural network algorithm mapping module to take the intermediate file containing the code segment as input, map the neural network algorithm represented by blocks in the code into an algorithm flow composed of macro statements, and then map the algorithm flow into hardware-dependent instructions;
a parallel code generating module is adopted, a hardware related instruction is input, parallelization processing and optimization are carried out on the hardware related instruction, and an optimized program is output;
adopting a repositioning module, inputting a data placement mode, memory allocation information and an optimized program, and replacing a relative address in the optimized program with an absolute address;
and translating the relocated program into a character string which can be identified by the neural network processor by adopting a machine code generation module.
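The first step of the method, source-program segmentation, can be sketched as follows. The `.code`/`.data` section markers are an assumed source format; the patent does not disclose the concrete syntax of the source file:

```python
def split_source(source):
    """Split a source file into its code segment and data segment,
    using assumed ".code" / ".data" section markers."""
    segments = {"code": [], "data": []}
    current = None
    for line in source.splitlines():
        stripped = line.strip()
        if stripped == ".code":
            current = "code"      # subsequent lines belong to the code segment
        elif stripped == ".data":
            current = "data"      # subsequent lines belong to the data segment
        elif current is not None and stripped:
            segments[current].append(stripped)
    return segments

src = """
.data
weights: 1.0 2.0 3.0
.code
conv layer1
pool layer1
"""
print(split_source(src))
# {'code': ['conv layer1', 'pool layer1'], 'data': ['weights: 1.0 2.0 3.0']}
```

In the method, each of the two resulting segments would be written to its own intermediate file and handed to the data processing module and the neural network algorithm mapping module, respectively.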
6. The method as recited in claim 5, further comprising:
partitioning the data with the data processing module, dividing the input/output data of each layer of the neural network, and placing the partitioned data in an on-chip storage unit of the neural network processor.
7. The method of claim 5, wherein the code segment includes statements, blocks, and corresponding definitions of a neural network algorithm.
8. The method of claim 5, wherein the parallelizing and optimizing in the parallel code generation module comprises: adjusting statement order through simulation and/or inference to improve parallelism.
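One simple form of the statement-order adjustment in claim 8 is hoisting independent load instructions ahead of computations so that memory access and compute can overlap. The toy dependence check below is an illustrative stand-in for the simulation/inference methods the claim refers to, and the `(op, dst, srcs)` instruction encoding is hypothetical:

```python
def hoist_independent_loads(instrs):
    """Move each LOAD earlier in the sequence as long as it does not
    depend on the instruction immediately before it.

    instrs: list of (op, dst_register, src_registers) tuples.
    """
    instrs = list(instrs)
    changed = True
    while changed:
        changed = False
        for i in range(1, len(instrs)):
            op, dst, srcs = instrs[i]
            prev_op, prev_dst, prev_srcs = instrs[i - 1]
            # Independent: no read-after-write, write-after-read,
            # or write-after-write hazard with the previous instruction.
            independent = (prev_dst not in srcs and dst not in prev_srcs
                           and dst != prev_dst)
            if op == "LOAD" and prev_op != "LOAD" and independent:
                instrs[i - 1], instrs[i] = instrs[i], instrs[i - 1]
                changed = True
    return instrs

prog = [("LOAD", "r0", []), ("MUL", "r1", ["r0"]), ("LOAD", "r2", [])]
print(hoist_independent_loads(prog))
# [('LOAD', 'r0', []), ('LOAD', 'r2', []), ('MUL', 'r1', ['r0'])]
```

After the reorder, the second load can issue while the multiply is still executing, which is the parallelism improvement the claim targets.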
9. The method as recited in claim 5, further comprising:
dividing neuron data into a plurality of data blocks with the data processing module, storing the data blocks sequentially in a storage unit, and loading the data blocks into on-chip memory for further computation during the computation process.
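The data blocking of claim 9 can be pictured as streaming a long neuron vector through a small on-chip buffer, one fixed-size block at a time. The block size and the sum reduction are illustrative assumptions:

```python
def blocked_sum(neurons, block_size):
    """Process a neuron vector in blocks that each fit an on-chip
    buffer, accumulating a running result across blocks."""
    total = 0.0
    for start in range(0, len(neurons), block_size):
        on_chip = neurons[start:start + block_size]  # simulated on-chip load
        total += sum(on_chip)                        # compute on this block
    return total

print(blocked_sum([1.0, 2.0, 3.0, 4.0, 5.0], block_size=2))  # 15.0
```

Because only one block resides on-chip at a time, the scheme allows neuron data far larger than the on-chip storage unit to be processed.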
CN201810257595.7A 2018-03-27 2018-03-27 Executable program generating device and method for neural network processor Active CN110308909B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810257595.7A CN110308909B (en) 2018-03-27 2018-03-27 Executable program generating device and method for neural network processor


Publications (2)

Publication Number Publication Date
CN110308909A CN110308909A (en) 2019-10-08
CN110308909B true CN110308909B (en) 2023-08-01

Family

ID=68074163

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810257595.7A Active CN110308909B (en) 2018-03-27 2018-03-27 Executable program generating device and method for neural network processor

Country Status (1)

Country Link
CN (1) CN110308909B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11775317B2 (en) * 2021-04-30 2023-10-03 International Business Machines Corporation Locate neural network performance hot spots

Citations (6)

Publication number Priority date Publication date Assignee Title
US6578020B1 (en) * 1999-12-07 2003-06-10 International Business Machines Corporation Method and system for converting code to executable code using neural networks implemented in a very large scale integration (VLSI) integrated circuit
US6832214B1 (en) * 1999-12-07 2004-12-14 International Business Machines Corporation Method, system, and program for converting code to executable code using neural networks implemented in a software program
CN103282891A (en) * 2010-08-16 2013-09-04 甲骨文国际公司 System and method for effective caching using neural networks
CN105740946A (en) * 2015-07-29 2016-07-06 上海磁宇信息科技有限公司 Method for realizing neural network calculation by using cell array computing system
CN105989408A (en) * 2015-03-18 2016-10-05 国际商业机器公司 A system and a method for mapping a neural network onto a neurosynaptic substrate
CN107239315A (en) * 2017-04-11 2017-10-10 北京深鉴智能科技有限公司 Towards the programming model of neutral net heterogeneous computing platforms

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
US6826550B2 (en) * 2000-12-15 2004-11-30 International Business Machines Corporation Method, system, and program for converting application program code to executable code using neural networks based on characteristics of the inputs
US10817802B2 (en) * 2016-05-07 2020-10-27 Intel Corporation Apparatus for hardware accelerated machine learning
US10101995B2 (en) * 2016-07-15 2018-10-16 Microsoft Technology Licensing, Llc Transforming data manipulation code into data workflow


Non-Patent Citations (1)

Title
A Basic Block Reordering Method Based on Artificial Neural Networks; Zhang Jiyu et al.; Acta Scientiarum Naturalium Universitatis Pekinensis (Journal of Peking University, Natural Science Edition); 2010-12-29 (Issue 01); full text *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant