WO2017107154A1 - Source-to-source conversion method for matrix operations and source-to-source compiler - Google Patents
Source-to-source conversion method for matrix operations and source-to-source compiler
- Publication number
- WO2017107154A1 (PCT/CN2015/098781)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- source
- level
- intermediate code
- matrix
- code
- The present invention relates to the field of communications, and in particular to a source-to-source conversion method for matrix operations and a source-to-source compiler.
- The conventional source-to-source conversion principle is shown in FIG. 2: lexical analysis, parsing, and semantic analysis are performed on the DSL code to obtain the corresponding abstract syntax tree (AST); the AST is then converted into intermediate code containing only atomic operations (IR 0); downgrade library inlining is performed on IR 0 to obtain level 1 intermediate code (IR 1); compiler optimizations are applied to IR 1; and finally, depending on the target device, IR 1 is replaced with C code (including device vector instructions) supported by that device.
- When downgrade library inlining is performed on IR 0 to obtain the level 1 intermediate code (IR 1), each atomic operation statement to be converted is read, the corresponding level 1 intermediate code block is looked up in the downgrade library, and that code block replaces the atomic operation statement.
- The corresponding level 1 intermediate code is a code block with a loop structure. An example of the source-to-source conversion process corresponding to the above process is shown in FIG.
- However, in the intermediate code produced by downgrade library inlining, each atomic operation forms a separate loop structure, and each loop structure is an independent "memory read, computation, memory write" process. The code finally generated for a compound matrix operation therefore contains a large number of loop structures and redundant memory operations, which increases bus traffic and system latency and leaves system performance unable to meet requirements.
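- The cost described above can be made concrete with a minimal sketch. It assumes a compound operation "conjugate of the transpose of A" and lowers each atomic operation to its own loop, as the conventional flow does. The function names and the read/write counter are hypothetical and serve only to illustrate the redundancy; they do not come from the patent.

```python
# Hypothetical illustration of the conventional lowering: one loop per atomic operation.

def transpose_loop(a, stats):
    """Atomic operation 1: its own loop nest and its own temporary result."""
    rows, cols = len(a), len(a[0])
    out = [[0j] * rows for _ in range(cols)]
    stats["loops"] += 1
    for i in range(rows):
        for j in range(cols):
            out[j][i] = a[i][j]      # one memory read, one memory write
            stats["reads"] += 1
            stats["writes"] += 1
    return out


def conjugate_loop(a, stats):
    """Atomic operation 2: a second loop nest that re-reads the temporary."""
    rows, cols = len(a), len(a[0])
    out = [[0j] * cols for _ in range(rows)]
    stats["loops"] += 1
    for i in range(rows):
        for j in range(cols):
            out[i][j] = a[i][j].conjugate()
            stats["reads"] += 1
            stats["writes"] += 1
    return out


def conj_transpose_unfused(a):
    """Compound operation built by chaining the two atomic loops."""
    stats = {"loops": 0, "reads": 0, "writes": 0}
    result = conjugate_loop(transpose_loop(a, stats), stats)
    return result, stats
```

For a matrix with 6 elements this sketch performs two loop nests with 12 reads and 12 writes, although only 6 of each are strictly needed to produce the result.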
- Embodiments of the present invention provide a source-to-source conversion method and a source-to-source compiler for matrix operations, so as to at least eliminate redundant loop structures and memory read and write operations in an existing matrix operation process.
- the embodiment of the present invention adopts the following technical solutions:
- a source-to-source conversion method for matrix operations comprising:
- the first source code being a source code of a matrix operation input by a user
- the level 1 intermediate code is converted into a second source code supported by the target device according to the type of the target device.
- Based on the source-to-source conversion method for matrix operations provided by the embodiment of the present invention, after the corresponding AST is obtained from the first source code of the matrix operation input by the user, the operation type of the matrix operation is obtained from the AST, the corresponding operation attribute is added to the matrix operation according to the operation type to obtain level 0 intermediate code, and downgrade library inlining is then performed on the level 0 intermediate code to obtain level 1 intermediate code. Because the level 0 intermediate code is the intermediate code obtained after the operation attributes are added to the matrix operation, and multiple operation attributes of a matrix can interact to form a merged attribute, an operation that the matrix expresses in multiple steps can be resolved into a single-step operation.
- Therefore, when downgrade library inlining is performed on the level 0 intermediate code, the inlining process can be optimized using the operation attribute information so that associated calculations are placed in the same loop structure. This avoids the prior-art problem in which the AST is converted into intermediate code containing only atomic operations (IR 0) and, when IR 0 is inlined from the downgrade library to obtain IR 1, each atomic operation forms a separate loop structure. Each such loop structure is an independent "memory read, computation, memory write" process, so the code finally generated for a compound matrix operation contains a large number of loop structures and redundant memory operations, increasing bus traffic and system latency until system performance cannot meet requirements. The embodiment eliminates the redundant loop structures and memory read and write operations of the existing matrix operation process and improves the performance of the generated code.
- a source to source compiler comprising:
- a receiving unit configured to receive a first source code, where the first source code is a source code of a matrix operation input by a user;
- a parsing unit configured to parse the first source code to obtain a corresponding abstract syntax tree (AST);
- an obtaining unit configured to obtain the operation type of the matrix operation according to the AST;
- a downgrading unit configured to perform downgrade library inlining on the level 0 intermediate code to obtain the level 1 intermediate code;
- a conversion unit configured to convert the level 1 intermediate code into a second source code supported by the target device according to the type of the target device.
- For the technical effects that can be obtained, refer to the technical effects of the source-to-source conversion method for matrix operations performed by the source-to-source compiler in the first aspect above; they are not described again here.
- the method may further include:
- In this way, after a corresponding operation attribute is later added to a matrix operation according to its operation type and the level 0 intermediate code is obtained, the level 0 intermediate code can be matched directly against the downgrade expressions in the downgrade library. If the match succeeds, downgrade library inlining is performed directly according to the matching downgrade expression, which further simplifies the source-to-source conversion process for matrix operations and improves its efficiency.
- Performing downgrade library inlining on the level 0 intermediate code to obtain the level 1 intermediate code may specifically include:
- performing downgrade library inlining on the level 0 intermediate code according to the downgrade expression in the downgrade library that matches the level 0 intermediate code, to obtain the level 1 intermediate code.
- That is, the level 0 intermediate code can be matched directly against the downgrade expressions in the downgrade library. If the match succeeds, downgrade library inlining is performed directly according to the matching downgrade expression, which further simplifies the source-to-source conversion process for matrix operations and improves its efficiency.
- Adding the corresponding operation attribute to the matrix operation according to the operation type of the matrix operation to obtain the level 0 intermediate code may specifically include:
- simplifying the matrix operation according to the operation attribute to obtain a simplified matrix operation;
- That is, the matrix operation can be simplified according to the operation attribute, and the simplified matrix operation can be converted into level 0 intermediate code. This reduces redundant memory read and write operations in the generated code and yields more compact level 0 intermediate code, so that when downgrade library inlining is performed on the level 0 intermediate code, the operation process is further simplified and the efficiency of source-to-source conversion for matrix operations improves.
- The adding unit is further configured to, after the downgrading unit performs downgrade library inlining on the level 0 intermediate code to obtain the level 1 intermediate code, add the operation attribute to the downgrade library to obtain the corresponding downgrade expression.
- the degradation unit may be specifically configured to:
- perform downgrade library inlining on the level 0 intermediate code according to the downgrade expression in the downgrade library that matches the level 0 intermediate code, to obtain the level 1 intermediate code.
- the adding unit may be specifically configured to:
- the matrix operation is simplified according to the operation attribute, and the simplified matrix operation is obtained;
- a source-to-source compiler comprising: a processor, a memory, a bus, and a communication interface;
- the memory is used to store computer execution instructions
- the processor is coupled to the memory via the bus, and when the source-to-source compiler is running, the processor executes the computer-executable instructions stored in the memory, so that the source-to-source compiler performs the source-to-source conversion method for matrix operations described in the first aspect above or in any optional implementation of the first aspect.
- For the technical effects that can be obtained, refer to the technical effects of the source-to-source conversion method for matrix operations performed by the source-to-source compiler in the first aspect above; they are not described again here.
- a readable medium comprising computer-executable instructions;
- when the processor of the source-to-source compiler executes the computer-executable instructions, the source-to-source compiler performs the source-to-source conversion method for matrix operations described in the first aspect above or in any optional implementation of the first aspect.
- FIG. 1 is a diagram of an existing source-to-source compilation framework;
- FIG. 2 is a schematic diagram of the existing source-to-source conversion principle;
- FIG. 3 is an example diagram of an existing source-to-source conversion;
- FIG. 4 is a schematic diagram of a source-to-source conversion principle for matrix operations according to an embodiment of the present invention;
- FIG. 5 is a first schematic flowchart of a source-to-source conversion method for matrix operations according to an embodiment of the present invention;
- FIG. 6 is a second schematic flowchart of a source-to-source conversion method for matrix operations according to an embodiment of the present invention;
- FIG. 7 is a third schematic flowchart of a source-to-source conversion method for matrix operations according to an embodiment of the present invention;
- FIG. 8 is a fourth schematic flowchart of a source-to-source conversion method for matrix operations according to an embodiment of the present invention;
- FIG. 9 is an example diagram of source-to-source conversion of matrix operations according to an embodiment of the present invention;
- FIG. 10 is a first schematic structural diagram of a source-to-source compiler according to an embodiment of the present invention;
- FIG. 11 is a second schematic structural diagram of a source-to-source compiler according to an embodiment of the present invention.
- the words “exemplary” or “such as” are used to mean an example, an illustration, or a description. Any embodiment or design described as “example” or “such as” in this application should not be construed as preferred or advantageous over other embodiments or designs. Rather, the use of the words “exemplary” or “such as” is intended to present a concept in a specific manner.
- FIG. 4 is a schematic diagram of a source-to-source conversion principle for matrix operations according to an embodiment of the present invention.
- As shown in FIG. 4, the parser performs lexical analysis, syntax analysis, and semantic analysis on the DSL code to obtain the corresponding AST; the operation type of the matrix operation is then obtained from the AST; and, according to the operation type of the matrix operation, the corresponding operation attribute is added to the matrix operation to obtain the level 0 intermediate code.
- Based on the source-to-source conversion principle for matrix operations shown in FIG. 4, an embodiment of the present invention provides a source-to-source conversion method for matrix operations which, as shown in FIG. 5, includes steps S501-S506:
- the source-to-source compiler receives the first source code, where the first source code is a source code of a matrix operation input by the user.
- The source code of the matrix operation may be written in a language that includes matrix operations, such as MATLAB or an extended language based on C/Java, which is not specifically limited in the embodiment of the present invention.
- the source-to-source compiler parses the first source code to obtain a corresponding AST.
- The parsing process may include extended support for the first source code, such as matrix types, generalized data types, new operations, subtrees, or templates, which is not specifically limited in this embodiment of the present invention.
- the source-to-source compiler obtains the operation type of the matrix operation according to the AST.
- The operation type of the matrix operation may specifically include transposition, conjugation, taking the i-th row of matrix A in hash mode, and the like; the embodiments of the present invention do not enumerate them all here.
- the source-to-source compiler adds a corresponding operation attribute to the matrix operation according to the operation type of the matrix operation, and obtains the 0th-level intermediate code.
- Specifically, the corresponding operation attribute is added to the matrix operation according to the operation type of the matrix operation. For example, if the operation types include taking the i-th row of matrix A in hash mode and transposition, the operation attributes added to the matrix operation include: taking the i-th row of matrix A in hash mode, and transposition.
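- The attribute-adding step can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's data structure: the `Level0Node` class, the `add_attributes` function, and the attribute names `HASH` and `CONJ` are hypothetical, while `TRAN` follows the notation the text uses for transposition.

```python
from dataclasses import dataclass, field

# Hypothetical mapping from operation types found in the AST to operation attributes.
ATTRIBUTE_FOR_TYPE = {
    "hash_row": "HASH",    # take row(s) of the matrix in hash mode (assumed name)
    "transpose": "TRAN",   # transposition
    "conjugate": "CONJ",   # conjugation (assumed name)
}


@dataclass
class Level0Node:
    """A matrix operand annotated with operation attributes (level 0 intermediate code)."""
    operand: str
    attributes: list = field(default_factory=list)


def add_attributes(operand, operations):
    """Attach one attribute per (operation type, argument) pair, in source order."""
    node = Level0Node(operand)
    for op_type, argument in operations:
        node.attributes.append((ATTRIBUTE_FOR_TYPE[op_type], argument))
    return node
```

Calling `add_attributes("A", [("hash_row", (3,)), ("transpose", None)])` yields a node for matrix A that carries a hash attribute for row 3 followed by a transposition attribute, mirroring the example in the text.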
- The source-to-source compiler performs downgrade library inlining on the level 0 intermediate code to obtain the level 1 intermediate code.
- Downgrade library inlining is used to convert a high-level representation into a lower-level representation.
- The high-level representation is abstract and closer to human understanding; the low-level representation is concrete and closer to machine execution.
- The source-to-source compiler converts the level 1 intermediate code into a second source code supported by the target device according to the type of the target device.
- The second source code is source code that can be compiled by the compiler of the target device to generate executable code for the target device. This is not elaborated in the embodiment of the present invention; for details, refer to existing implementations.
- Based on the source-to-source conversion method for matrix operations provided by the embodiment of the present invention, after the corresponding AST is obtained from the first source code of the matrix operation input by the user, the operation type of the matrix operation is obtained from the AST, the corresponding operation attribute is added to the matrix operation according to the operation type to obtain level 0 intermediate code, and downgrade library inlining is then performed on the level 0 intermediate code to obtain level 1 intermediate code. Because the level 0 intermediate code is the intermediate code obtained after the operation attributes are added to the matrix operation, and multiple operation attributes of a matrix can interact to form a merged attribute, an operation that the matrix expresses in multiple steps can be resolved into a single-step operation.
- Therefore, when downgrade library inlining is performed on the level 0 intermediate code, the inlining process can be optimized using the operation attribute information so that associated calculations are placed in the same loop structure. This avoids the prior-art problem in which the AST is converted into intermediate code containing only atomic operations (IR 0) and, when IR 0 is inlined from the downgrade library to obtain IR 1, each atomic operation forms a separate loop structure. Each such loop structure is an independent "memory read, computation, memory write" process, so the code finally generated for a compound matrix operation contains a large number of loop structures and redundant memory operations, increasing bus traffic and system latency until system performance cannot meet requirements. The embodiment eliminates the redundant loop structures and memory read and write operations of the existing matrix operation process and improves the performance of the generated code.
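- As a minimal sketch of placing associated calculations in the same loop structure, the function below applies transposition and conjugation attributes inside one loop nest. The function name, the `CONJ` attribute name, and the statistics dictionary are assumptions made for illustration; the sketch relies only on the fact that transposition and conjugation each cancel when applied twice and can be applied in either order.

```python
def fused_lowering(a, attributes):
    """Apply TRAN / CONJ attributes to matrix `a` in a single loop nest.

    Transposition and conjugation are involutions that commute, so only the
    parity of each attribute matters (an assumption of this sketch).
    """
    transpose = attributes.count("TRAN") % 2 == 1
    conjugate = attributes.count("CONJ") % 2 == 1
    rows, cols = len(a), len(a[0])
    out_rows, out_cols = (cols, rows) if transpose else (rows, cols)
    out = [[0j] * out_cols for _ in range(out_rows)]
    stats = {"loops": 1, "reads": 0, "writes": 0}
    for i in range(out_rows):
        for j in range(out_cols):
            # Index remapping replaces a separate transposition pass.
            value = a[j][i] if transpose else a[i][j]
            stats["reads"] += 1
            # Conjugation is applied to the value already held, with no temporary matrix.
            out[i][j] = value.conjugate() if conjugate else value
            stats["writes"] += 1
    return out, stats
```

For a matrix with 6 elements and the attributes `["TRAN", "CONJ"]`, this sketch uses one loop nest with 6 reads and 6 writes and allocates no intermediate matrix.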
- After the source-to-source compiler performs downgrade library inlining on the level 0 intermediate code to obtain the level 1 intermediate code (step S505), the method may further include:
- The source-to-source compiler adds the operation attribute to the downgrade library to obtain a corresponding downgrade expression.
- That is, when the source-to-source compiler receives source code containing the matrix operation for the first time, after it performs downgrade library inlining on the level 0 intermediate code and obtains the level 1 intermediate code, it adds the operation attribute to the downgrade library to obtain the corresponding downgrade expression.
- In this way, after a corresponding operation attribute is later added to a matrix operation according to its operation type and the level 0 intermediate code is obtained, the level 0 intermediate code can be matched directly against the downgrade expressions in the downgrade library. If the match succeeds, downgrade library inlining is performed directly according to the matching downgrade expression, which further simplifies the source-to-source conversion process for matrix operations and improves its efficiency.
- The source-to-source compiler performing downgrade library inlining on the level 0 intermediate code to obtain the level 1 intermediate code (step S505) may specifically include:
- S505b1. The source-to-source compiler matches the level 0 intermediate code against the downgrade expressions in the downgrade library.
- The source-to-source compiler performs downgrade library inlining on the level 0 intermediate code according to the downgrade expression in the downgrade library that matches the level 0 intermediate code, to obtain the level 1 intermediate code.
- That is, when the source-to-source compiler receives source code containing the matrix operation other than for the first time, after obtaining the level 0 intermediate code it can match that code directly against the downgrade expressions in the downgrade library. If the match succeeds, downgrade library inlining is performed directly according to the matching downgrade expression, which further simplifies the source-to-source conversion process for matrix operations and improves its efficiency.
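- The first-time and later-time behaviour described above resembles a lookup table keyed by operation attributes. The sketch below assumes exactly that: the `DowngradeLibrary` class, its `inline` method, and the string stand-in for a level 1 code block are all hypothetical, and the patent does not specify how downgrade expressions are stored or matched.

```python
class DowngradeLibrary:
    """Hypothetical store of downgrade expressions keyed by operation attributes."""

    def __init__(self):
        self._expressions = {}
        self.hits = 0
        self.misses = 0

    def inline(self, attributes, build_expression):
        """Return the downgrade expression for `attributes`.

        On a miss (first occurrence) the expression is built and added to the
        library; on a hit it is reused directly without being rebuilt.
        """
        key = tuple(attributes)
        if key in self._expressions:
            self.hits += 1
        else:
            self.misses += 1
            self._expressions[key] = build_expression(key)
        return self._expressions[key]


def build_loop_text(key):
    """Stand-in for generating a level 1 loop block from merged attributes."""
    return "loop(" + "+".join(key) + ")"
```

With this sketch, the first call to `inline(["TRAN", "CONJ"], build_loop_text)` records a miss and stores the expression, and a second call with the same attributes records a hit and returns the stored expression.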
- The source-to-source compiler adding a corresponding operation attribute to the matrix operation according to the operation type of the matrix operation to obtain the level 0 intermediate code (step S504) may specifically include:
- the source-to-source compiler adds a corresponding operation attribute to the matrix operation according to the operation type of the matrix operation.
- the source-to-source compiler simplifies the matrix operation according to the operation attribute to obtain a simplified matrix operation.
- the source-to-source compiler converts the simplified matrix operation into a level 0 intermediate code.
- In the embodiment of the present invention, the matrix operation can be simplified according to its operation attributes. For example:
- A[1,2,5,7] means that rows 1, 2, 5, and 7 of matrix A are taken in hash mode.
- TRAN indicates transposition.
- The operation types of the matrix operation then include a hash and two transpositions. After the matrix operation is simplified according to its operation attributes, the two TRAN attributes cancel each other, leaving only the hash attribute.
- That is, the matrix operation can be simplified according to the operation attribute, and the simplified matrix operation is converted into level 0 intermediate code. This reduces redundant memory read and write operations in the generated code and yields more compact level 0 intermediate code, so that when downgrade library inlining is performed on the level 0 intermediate code, the operation process is further simplified and the efficiency of source-to-source conversion for matrix operations improves.
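- The cancellation of paired attributes can be sketched with a small stack-based pass. This is an illustrative assumption about how such a simplification might be written, not the patent's algorithm: the `simplify` function, the `INVOLUTIONS` set, and the `CONJ` and `HASH` names are hypothetical, and the pass conservatively cancels only adjacent identical attributes.

```python
# Attributes that undo themselves when applied twice (names are assumptions,
# except TRAN, which the text uses for transposition).
INVOLUTIONS = {"TRAN", "CONJ"}


def simplify(attributes):
    """Cancel adjacent pairs of identical self-inverse attributes."""
    stack = []
    for attr in attributes:
        if stack and attr in INVOLUTIONS and stack[-1] == attr:
            stack.pop()          # TRAN followed by TRAN is the identity
        else:
            stack.append(attr)
    return stack
```

Applied to the example in the text, `simplify([("HASH", (1, 2, 5, 7)), "TRAN", "TRAN"])` leaves only the hash attribute for rows 1, 2, 5, and 7.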
- the source-to-source conversion method of the matrix operation may include the following steps S1-S5:
- the source-to-source compiler receives and parses the first source code to obtain a corresponding AST
- The operation type of the matrix operation is obtained according to the AST; the operation types include hash, transposition, conjugation, and quotient (division).
- Va1, Vb1, Vc1, Vd1, and Ve are specifically:
- Va1 (2+2i, 2+2i, 2+2i, 2+2i);
- Vb1 (1-1i, 2-2i, 3-3i, 4-4i);
- Vc1 (1+1i, 2+2i, 4+4i, 8+8i);
- Vd1 (1+1i, 3+3i, 5+5i, 7+7i);
- Va1, Vb1, Vc1, Vd1, and Ve are specifically:
- Va1 (3+3i, 3+3i, 3+3i, 3+3i);
- Vb1 (1-1i, 2-2i, 3-3i, 4-4i);
- Vc1 (1+1i, 2+2i, 4+4i, 8+8i);
- Vd1 (2+2i, 4+4i, 6+6i, 8+8i);
- Va1, Vb1, Vc1, Vd1, and Ve are specifically:
- Va1 (6+6i, 6+6i, 6+6i, 6+6i);
- Vb1 (1-1i, 2-2i, 3-3i, 4-4i);
- Vc1 (1+1i, 2+2i, 4+4i, 8+8i);
- Vd1 (1+1i, 3+3i, 5+5i, 7+7i);
- Va1, Vb1, Vc1, Vd1, and Ve are specifically:
- Va1 (8+8i, 8+8i, 8+8i, 8+8i);
- Vb1 (1-1i, 2-2i, 3-3i, 4-4i);
- Vc1 (1+1i, 2+2i, 4+4i, 8+8i);
- Vd1 (2+2i, 4+4i, 6+6i, 8+8i);
- Based on the source-to-source conversion method for matrix operations provided by the embodiment of the present invention, after the corresponding AST is obtained from the first source code of the matrix operation input by the user, the operation type of the matrix operation is obtained from the AST, the corresponding operation attribute is added to the matrix operation according to the operation type to obtain level 0 intermediate code, and downgrade library inlining is then performed on the level 0 intermediate code to obtain level 1 intermediate code.
- Because the level 0 intermediate code is the intermediate code obtained after the operation attributes are added to the matrix operation, and multiple operation attributes of a matrix can interact to form a merged attribute, an operation that the matrix expresses in multiple steps can be resolved into a single-step operation. Therefore, when downgrade library inlining is performed on the level 0 intermediate code, the inlining process can be optimized using the operation attribute information so that associated calculations are placed in the same loop structure.
- This avoids the prior-art problem in which the AST is converted into intermediate code containing only atomic operations (IR 0) and, when IR 0 is inlined from the downgrade library to obtain IR 1, each atomic operation forms a separate loop structure. Each such loop structure is an independent "memory read, computation, memory write" process, so the code finally generated for a compound matrix operation contains a large number of loop structures and redundant memory operations, increasing bus traffic and system latency until system performance cannot meet requirements. The embodiment eliminates the redundant loop structures and memory read and write operations of the existing matrix operation process and improves the performance of the generated code.
- An embodiment of the present invention provides a source-to-source compiler 100 for performing the source-to-source conversion method for matrix operations shown in FIG. 5 to FIG. 8 above.
- The source-to-source compiler 100 may include units corresponding to the respective steps, for example, a receiving unit 1001, a parsing unit 1002, an obtaining unit 1003, an adding unit 1004, a downgrading unit 1005, and a converting unit 1006. Specifically:
- the receiving unit 1001 is configured to receive the first source code, where the first source code is a source code of a matrix operation input by the user.
- the parsing unit 1002 is configured to parse the first source code to obtain a corresponding AST.
- the obtaining unit 1003 is configured to acquire an operation type of the matrix operation according to the AST.
- The adding unit 1004 is configured to add a corresponding operation attribute to the matrix operation according to the operation type of the matrix operation, to obtain the level 0 intermediate code.
- The downgrading unit 1005 is configured to perform downgrade library inlining on the level 0 intermediate code to obtain the level 1 intermediate code.
- The converting unit 1006 is configured to convert the level 1 intermediate code into the second source code supported by the target device according to the type of the target device.
- The adding unit 1004 is further configured to, after the downgrading unit 1005 performs downgrade library inlining on the level 0 intermediate code to obtain the level 1 intermediate code, add the operation attribute to the downgrade library to obtain the corresponding downgrade expression.
- The downgrading unit 1005 is specifically configured to:
- perform downgrade library inlining on the level 0 intermediate code according to the downgrade expression in the downgrade library that matches the level 0 intermediate code, to obtain the level 1 intermediate code.
- the adding unit 1004 is specifically configured to:
- the matrix operation is simplified according to the operation attribute, and the simplified matrix operation is obtained;
- The source-to-source compiler 100 of the embodiment of the present invention may correspond to the source-to-source compiler in the source-to-source conversion method for matrix operations shown in FIG. 5 to FIG. 8 above, and the division and/or functions of the units in the source-to-source compiler 100 serve to implement the source-to-source conversion method for matrix operations shown in FIG. 5 to FIG. 8 above.
- With the source-to-source compiler provided by the embodiment of the present invention, after the corresponding AST is obtained from the first source code of the matrix operation input by the user, the operation type of the matrix operation is obtained from the AST, the corresponding operation attribute is added to the matrix operation according to the operation type to obtain level 0 intermediate code, and downgrade library inlining is then performed on the level 0 intermediate code to obtain level 1 intermediate code. Because the level 0 intermediate code is the intermediate code obtained after the operation attributes are added to the matrix operation, and multiple operation attributes of a matrix can interact to form a merged attribute, an operation that the matrix expresses in multiple steps can be resolved into a single-step operation.
- Therefore, when downgrade library inlining is performed on the level 0 intermediate code, the inlining process can be optimized using the operation attribute information so that associated calculations are placed in the same loop structure. This avoids the prior-art problem in which the AST is converted into intermediate code containing only atomic operations (IR 0) and each atomic operation then forms a separate loop structure with its own memory reads and writes.
- an embodiment of the present invention provides a source-to-source compiler 110, including: a processor 1101, a memory 1102, a bus 1103, and a communication interface 1104.
- the memory 1102 is used to store computer execution instructions
- The processor 1101 is connected to the memory 1102 via the bus 1103. When the source-to-source compiler 110 is running, the processor 1101 executes the computer-executable instructions stored in the memory 1102, so that the source-to-source compiler 110 performs the source-to-source conversion method for matrix operations shown in FIG. 5 to FIG. 8.
- For the specific source-to-source conversion method for matrix operations, refer to the related description in the foregoing embodiments shown in FIG. 5 to FIG. 8; details are not described herein again.
- The processor 1101 in the embodiment of the present invention may be a central processing unit (CPU), or may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like.
- The general-purpose processor may be a microprocessor or any conventional processor or the like.
- the processor 1101 may also be a dedicated processor, which may include at least one of a baseband processing chip, a radio frequency processing chip, and the like. Further, the dedicated processor may also include a chip with other dedicated processing functions of the source to source compiler 110.
- The memory 1102 can include volatile memory, such as random-access memory (RAM); the memory 1102 can also include non-volatile memory, such as read-only memory (ROM), flash memory, a hard disk drive (HDD), or a solid-state drive (SSD); in addition, the memory 1102 can include a combination of the above types of memory.
- The bus 1103 may include a data bus, a power bus, a control bus, and a signal status bus. For clarity, the various buses are all shown as the bus 1103 in FIG. 11.
- The communication interface 1104 may specifically be a transceiver on the source-to-source compiler 110.
- The transceiver may be a wireless transceiver or a wired transceiver, which is not specifically limited in this embodiment of the present invention.
- In a specific implementation, each step of the method flows shown in FIG. 5 to FIG. 8 may be implemented by the processor 1101, in hardware form, executing the computer-executable instructions, in software form, stored in the memory 1102. To avoid repetition, details are not described here again.
- With the source-to-source compiler provided by this embodiment of the present invention, after the AST is obtained from the first source code (the source code of a matrix operation entered by the user), the operation type of the matrix operation is obtained from the AST, the corresponding operation attribute is added to the matrix operation according to that type to obtain the level-0 intermediate code, and a downgrade-library inline operation is then performed on the level-0 intermediate code to obtain the level-1 intermediate code.
- Because the level-0 intermediate code is the intermediate code obtained after the operation attributes are added to the matrix operation, and several operation attributes of a matrix can interact to form a merged attribute, a multi-step operation on the matrix can be reduced to a single-step operation.
- The inline downgrade process can therefore be optimized using the operation attribute information, and related computations can be placed in the same loop structure.
- This avoids the prior-art problem in which the AST is converted into intermediate code containing only atomic operations (IR 0) and, when the downgrade library is inlined into IR 0 to obtain the level-1 intermediate code (IR 1), each atomic operation forms a separate loop structure.
- Each such loop structure is an independent "memory read, compute, memory write" process, so the code finally generated for a compound matrix operation contains a large number of loop structures and redundant memory operations.
- Bus traffic and system latency then increase until system performance cannot meet requirements. The embodiment eliminates the redundant loop structures and memory read/write operations of existing matrix operation processing and improves the performance of the generated code.
- Optionally, an embodiment of the present invention provides a readable medium that includes computer-executable instructions.
- When the processor of the source-to-source compiler executes the computer-executable instructions, the source-to-source compiler performs the source-to-source conversion method for matrix operations shown in FIG. 5 to FIG. 8 above.
- For the specific source-to-source conversion method for matrix operations, refer to the related description in the foregoing embodiments shown in FIG. 5 to FIG. 8; details are not described here again.
- A person skilled in the art can clearly understand that, for convenience and brevity of description, the apparatus described above is illustrated only by the division into the functional modules above. In practical applications, the functions may be assigned to different functional modules as needed; that is, the internal structure of the apparatus may be divided into different functional modules to complete all or part of the functions described above.
- For the specific working processes of the system, apparatus, and units described above, refer to the corresponding processes in the foregoing method embodiments; details are not described here again.
- the disclosed system, apparatus, and method may be implemented in other manners.
- the device embodiments described above are merely illustrative.
- For example, the division into modules or units is only a division by logical function.
- In an actual implementation there may be other divisions; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
- In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, apparatus, or unit, and may be electrical, mechanical, or in another form.
- the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
- each functional unit in each embodiment of the present invention may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
- The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
- The integrated unit, if implemented in the form of a software functional unit and sold or used as a standalone product, may be stored in a computer-readable storage medium.
- Based on this understanding, the technical solution of the present invention, or the part of it that contributes to the prior art, may be embodied as a software product. The software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to perform all or part of the steps of the methods described in the embodiments of the present invention.
- the foregoing storage medium includes various media that can store program codes, such as a USB flash drive, a mobile hard disk, a ROM, a RAM, a magnetic disk, or an optical disk.
Landscapes
- Devices For Executing Special Programs (AREA)
Abstract
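A source-to-source conversion method for matrix operations and a source-to-source compiler, to at least eliminate the redundant loop structures and memory read/write operations in existing matrix operation processing. The method includes: receiving first source code, the first source code being the source code of a matrix operation entered by a user (S501); parsing the first source code to obtain a corresponding abstract syntax tree (AST) (S502); obtaining the operation type of the matrix operation from the AST (S503); adding a corresponding operation attribute to the matrix operation according to its operation type, to obtain level-0 intermediate code (S504); performing a downgrade-library inline operation on the level-0 intermediate code to obtain level-1 intermediate code (S505); and converting the level-1 intermediate code, according to the type of a target device, into second source code supported by the target device (S506). The method and the compiler are applicable to communications-related fields.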
Description
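The present invention relates to communications-related fields, and in particular to a source-to-source conversion method for matrix operations and a source-to-source compiler.
Communications-related fields currently involve many matrix operations, which are usually described in a platform-independent domain-specific language (DSL). However, because a DSL cannot directly produce object files that run on a device (such as a digital signal processing (DSP) chip), a source-to-source compiler must first convert the DSL into source code for each target device before the object files can be compiled, as shown in FIG. 1.
How to convert a DSL description of matrices, source to source, into efficient target-device code has always been the difficult part of source-to-source conversion. In the prior art, the principle of source-to-source conversion is shown in FIG. 2: the DSL code passes through the lexical, syntactic, and semantic analysis of a parser to obtain the corresponding abstract syntax tree (AST); the AST is converted into intermediate code containing only atomic operations (IR 0); a downgrade-library inline operation is performed on IR 0 to obtain level-1 intermediate code (IR 1); compiler optimizations are then applied to the downgraded level-1 intermediate code; and finally, depending on the target device, the level-1 intermediate code (IR 1) is replaced with C code (including device vector instructions) supported by that device. In the step that inlines the downgrade library into IR 0 to obtain IR 1, each atomic operation statement to be converted is read, the corresponding level-1 intermediate code block is looked up in the downgrade library, and that code block replaces the atomic operation statement. For a matrix operation, the corresponding level-1 intermediate code is a code block with a loop structure. An example of this source-to-source conversion process is shown in FIG. 3.
As the process above shows, inlining the downgrade library into the intermediate code makes every atomic operation form its own loop structure, and each loop structure is an independent "memory read, compute, memory write" process. The code finally generated for a compound matrix operation therefore contains a large number of loop structures and redundant memory operations, which increases bus traffic and system latency until system performance cannot meet requirements.
How to eliminate the redundant loop structures and memory read/write operations in existing matrix operation processing has therefore become an urgent problem.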
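The cost described above can be made concrete with a small sketch. The following Python is illustrative only and is not taken from the patent: the function names `lower_per_operation` and `lower_fused` and the read/write counters are assumptions, and plain lists stand in for matrices. It evaluates E = A .* B + C once with one loop per atomic operation and once as a single loop, counting element-level memory accesses in each case.

```python
# Illustrative sketch (hypothetical names): memory traffic for E = A .* B + C
# when each atomic operation gets its own loop versus one fused loop.

def lower_per_operation(a, b, c):
    """One loop per atomic operation; every loop reads and writes memory."""
    reads = writes = 0
    tmp = [0] * len(a)
    for k in range(len(a)):          # loop 1: tmp = a .* b
        tmp[k] = a[k] * b[k]
        reads += 2
        writes += 1
    out = [0] * len(a)
    for k in range(len(a)):          # loop 2: out = tmp + c
        out[k] = tmp[k] + c[k]
        reads += 2
        writes += 1
    return out, reads, writes


def lower_fused(a, b, c):
    """A single loop; the intermediate product is never written to memory."""
    reads = writes = 0
    out = [0] * len(a)
    for k in range(len(a)):
        out[k] = a[k] * b[k] + c[k]
        reads += 3
        writes += 1
    return out, reads, writes
```

For four-element inputs, the per-operation version performs 16 reads and 8 writes, while the fused version performs 12 reads and 4 writes and produces the same result.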
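Summary of the Invention
Embodiments of the present invention provide a source-to-source conversion method for matrix operations and a source-to-source compiler, to at least eliminate the redundant loop structures and memory read/write operations in existing matrix operation processing.
To achieve the above objective, the embodiments of the present invention adopt the following technical solutions:
According to a first aspect, a source-to-source conversion method for matrix operations is provided, the method including:
receiving first source code, the first source code being the source code of a matrix operation entered by a user;
parsing the first source code to obtain a corresponding abstract syntax tree (AST);
obtaining the operation type of the matrix operation from the AST;
adding a corresponding operation attribute to the matrix operation according to its operation type, to obtain level-0 intermediate code;
performing a downgrade-library inline operation on the level-0 intermediate code to obtain level-1 intermediate code; and
converting the level-1 intermediate code, according to the type of a target device, into second source code supported by the target device.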
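With the above method, after the AST is obtained from the first source code (the source code of the matrix operation entered by the user), the operation type of the matrix operation is obtained from the AST, the corresponding operation attribute is added to the matrix operation according to that type to obtain the level-0 intermediate code, and the downgrade-library inline operation is performed on the level-0 intermediate code to obtain the level-1 intermediate code. The level-0 intermediate code is the intermediate code obtained after the operation attributes are added to the matrix operation, and several operation attributes of a matrix can interact to form a merged attribute, so that a multi-step operation on the matrix reduces to a single-step operation. The inline downgrade process can therefore be optimized using the operation attribute information, placing related computations in the same loop structure. This avoids the prior-art problem: there, the AST is converted into intermediate code containing atomic operations (IR 0), and when the downgrade library is inlined into IR 0 to obtain the level-1 intermediate code (IR 1), every atomic operation forms its own loop structure, each an independent "memory read, compute, memory write" process, so the code finally generated for a compound matrix operation contains many loop structures and redundant memory operations, increasing bus traffic and system latency until system performance cannot meet requirements. The redundant loop structures and memory read/write operations of existing matrix operation processing are eliminated, and the performance of the generated code improves.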
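According to a second aspect, a source-to-source compiler is provided, the source-to-source compiler including:
a receiving unit, configured to receive first source code, the first source code being the source code of a matrix operation entered by a user;
a parsing unit, configured to parse the first source code to obtain a corresponding abstract syntax tree (AST);
an obtaining unit, configured to obtain the operation type of the matrix operation from the AST;
an adding unit, configured to add a corresponding operation attribute to the matrix operation according to its operation type, to obtain level-0 intermediate code;
a downgrade unit, configured to perform a downgrade-library inline operation on the level-0 intermediate code to obtain level-1 intermediate code; and
a conversion unit, configured to convert the level-1 intermediate code, according to the type of a target device, into second source code supported by the target device.
Because the source-to-source compiler provided by the embodiments of the present invention can be used to perform the method of the first aspect, for the technical effects it can achieve, refer to the technical effects of the source-to-source conversion method for matrix operations performed by the source-to-source compiler in the first aspect; details are not described here again.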
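Optionally, in the first aspect, after the downgrade-library inline operation is performed on the level-0 intermediate code to obtain the level-1 intermediate code, the method may further include:
adding the operation attribute to the downgrade library to obtain a corresponding downgrade expression.
In this way, in subsequent source-to-source conversions of matrix operations, once the corresponding operation attribute has been added to the matrix operation according to its operation type and the level-0 intermediate code has been obtained, the downgrade expression in the downgrade library can be matched directly to perform the downgrade-library inline operation. This further simplifies the source-to-source conversion of matrix operations and improves its efficiency.
Optionally, in the first aspect, performing the downgrade-library inline operation on the level-0 intermediate code to obtain the level-1 intermediate code may specifically include:
matching the level-0 intermediate code against the downgrade expressions in the downgrade library; and
if the match succeeds, performing the downgrade-library inline operation on the level-0 intermediate code according to the downgrade expression in the downgrade library that matches the level-0 intermediate code, to obtain the level-1 intermediate code.
With the above method, when source code containing the matrix operation is not being received for the first time, the level-0 intermediate code, once obtained, can be matched directly against the downgrade expressions in the downgrade library during the downgrade-library inline operation. If the match succeeds, the inline operation is performed directly according to that downgrade expression, which further simplifies the source-to-source conversion of matrix operations and improves its efficiency.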
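Optionally, in the first aspect, adding the corresponding operation attribute to the matrix operation according to its operation type to obtain the level-0 intermediate code may specifically include:
adding the corresponding operation attribute to the matrix operation according to its operation type;
simplifying the matrix operation according to the operation attribute, to obtain a simplified matrix operation; and
converting the simplified matrix operation into the level-0 intermediate code.
With the above method, the matrix operation can be simplified according to the operation attribute, and the simplified matrix operation is then converted into the level-0 intermediate code. This reduces redundant memory read/write operations in the generated code and yields more concise level-0 intermediate code, so that the subsequent downgrade-library inline operation on the level-0 intermediate code is further simplified and the efficiency of the source-to-source conversion of matrix operations improves.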
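Optionally, in the second aspect, the adding unit is further configured to add the operation attribute to the downgrade library, to obtain a corresponding downgrade expression, after the downgrade unit performs the downgrade-library inline operation on the level-0 intermediate code to obtain the level-1 intermediate code.
Optionally, in the second aspect, the downgrade unit may be specifically configured to:
match the level-0 intermediate code against the downgrade expressions in the downgrade library; and
if the match succeeds, perform the downgrade-library inline operation on the level-0 intermediate code according to the downgrade expression in the downgrade library that matches the level-0 intermediate code, to obtain the level-1 intermediate code.
Optionally, in the second aspect, the adding unit may be specifically configured to:
add the corresponding operation attribute to the matrix operation according to its operation type;
simplify the matrix operation according to the operation attribute, to obtain a simplified matrix operation; and
convert the simplified matrix operation into the level-0 intermediate code.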
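According to a third aspect, a source-to-source compiler is provided, the source-to-source compiler including a processor, a memory, a bus, and a communication interface.
The memory is configured to store computer-executable instructions, and the processor is connected to the memory through the bus. When the source-to-source compiler runs, the processor executes the computer-executable instructions stored in the memory, so that the source-to-source compiler performs the source-to-source conversion method for matrix operations according to the first aspect or any optional manner of the first aspect.
Because the source-to-source compiler provided by the embodiments of the present invention is used to perform the method of the first aspect, for the technical effects it can achieve, refer to the technical effects of the source-to-source conversion method for matrix operations performed by the source-to-source compiler in the first aspect; details are not described here again.
According to a fourth aspect, a readable medium is provided, including computer-executable instructions, where
when a processor of a source-to-source compiler executes the computer-executable instructions, the source-to-source compiler performs the source-to-source conversion method for matrix operations according to the first aspect or any optional manner of the first aspect.
These and other aspects of the present invention will be clearer and easier to understand from the description of the embodiments below.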
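FIG. 1 is a diagram of an existing source-to-source compilation framework;
FIG. 2 is a schematic diagram of the existing source-to-source conversion principle;
FIG. 3 is an example of existing source-to-source conversion;
FIG. 4 is a schematic diagram of the source-to-source conversion principle for matrix operations according to an embodiment of the present invention;
FIG. 5 is schematic flowchart 1 of the source-to-source conversion method for matrix operations according to an embodiment of the present invention;
FIG. 6 is schematic flowchart 2 of the source-to-source conversion method for matrix operations according to an embodiment of the present invention;
FIG. 7 is schematic flowchart 3 of the source-to-source conversion method for matrix operations according to an embodiment of the present invention;
FIG. 8 is schematic flowchart 4 of the source-to-source conversion method for matrix operations according to an embodiment of the present invention;
FIG. 9 is an example of source-to-source conversion of a matrix operation according to an embodiment of the present invention;
FIG. 10 is schematic structural diagram 1 of a source-to-source compiler according to an embodiment of the present invention;
FIG. 11 is schematic structural diagram 2 of a source-to-source compiler according to an embodiment of the present invention.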
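The technical solutions in the embodiments of the present invention are described below with reference to the accompanying drawings.
It should be noted that, to describe the technical solutions clearly, the embodiments of the present invention use words such as "first" and "second" to distinguish identical or similar items whose functions and effects are basically the same. A person skilled in the art will understand that words such as "first" and "second" do not limit quantity or order of execution.
It should be noted that "/" herein means "or"; for example, A/B may mean A or B. "And/or" herein merely describes an association between objects and indicates that three relationships are possible; for example, "A and/or B" may mean A alone, both A and B, or B alone. "A plurality of" means two or more.
In addition, in the embodiments of the present invention, words such as "exemplary" or "for example" are used to present an example, illustration, or explanation. Any embodiment or design described as "an example" or "for example" in this application should not be construed as preferred over, or more advantageous than, other embodiments or designs. Rather, such words are intended to present a concept in a concrete manner.
In the embodiments of the present invention, "of" (的), "corresponding, relevant" (相应的), and "corresponding" (对应的) may sometimes be used interchangeably. It should be noted that, when the distinction is not emphasized, the intended meaning is the same.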
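FIG. 4 is a schematic diagram of the source-to-source conversion principle for matrix operations according to an embodiment of the present invention. As shown in FIG. 4, in this embodiment the DSL code passes through the lexical, syntactic, and semantic analysis of a parser to obtain the corresponding AST; the operation type of the matrix operation is then obtained from the AST; and the corresponding operation attribute is added to the matrix operation according to its operation type, to obtain the level-0 intermediate code (IR 0). A downgrade-library inline operation is then performed on IR 0 to obtain the level-1 intermediate code (IR 1); compiler optimizations are applied to the downgraded level-1 intermediate code; and finally, depending on the target device, the level-1 intermediate code (IR 1) is replaced with C code (including device vector instructions) supported by that device.
Based on the principle shown in FIG. 4, the source-to-source conversion method for matrix operations provided by an embodiment of the present invention is given below. As shown in FIG. 5, it includes steps S501 to S506: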
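S501. The source-to-source compiler receives first source code, the first source code being the source code of a matrix operation entered by a user.
Specifically, in this embodiment the source code of the matrix operation may be written in a language that includes matrix operations, such as a MATLAB-like language or an extension language based on C/Java, which is not specifically limited in this embodiment of the present invention.
S502. The source-to-source compiler parses the first source code to obtain the corresponding AST.
Specifically, the parsing may include extended support for the first source code, such as matrix types, generalized data types, new operations, subtrees, or templates, which is not specifically limited in this embodiment of the present invention.
S503. The source-to-source compiler obtains the operation type of the matrix operation from the AST.
Specifically, the operation types of a matrix operation may include transposing, conjugating, fetching row i of matrix A by scattered (non-contiguous) selection, and so on, which are not listed one by one here.
S504. The source-to-source compiler adds a corresponding operation attribute to the matrix operation according to its operation type, to obtain level-0 intermediate code.
Specifically, after obtaining the operation type of the matrix operation, the source-to-source compiler adds the corresponding operation attribute to the matrix operation according to that type. For example, if the operation types are scattered selection of row i of matrix A and transpose, the operation attributes added to the matrix operation are scattered selection of row i of matrix A and transpose.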
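One way to picture steps S503 and S504 is a walk over the AST that records, for each matrix operand, the operations applied to it. The sketch below is an assumption made for illustration, not the patent's data structure: the nested-tuple AST encoding, the `GATHER` label for scattered row selection, and the function name `collect_attributes` are all hypothetical, and only a chain of unary operations on one operand is handled.

```python
# Hypothetical AST encoding (not from the patent):
#   ("TRAN", child), ("CONJ", child), ("GATHER", child, rows),
#   or a plain string naming a matrix operand.

def collect_attributes(node, pending=()):
    """Return {matrix_name: [attributes, innermost first]} for a unary chain."""
    if isinstance(node, str):                  # leaf: a matrix operand
        return {node: list(reversed(pending))}
    op, child, *args = node
    # Operations with an argument (the row list) keep it in the attribute.
    label = op if not args else (op, tuple(args[0]))
    return collect_attributes(child, pending + (label,))
```

Applied to `("TRAN", ("TRAN", ("GATHER", "A", [1, 2, 5, 7])))`, the walk yields the scattered-selection attribute followed by two `TRAN` attributes for operand `A`, in the order in which the operations are applied.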
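S505. The source-to-source compiler performs a downgrade-library inline operation on the level-0 intermediate code to obtain level-1 intermediate code.
Specifically, in this embodiment the downgrade-library inline operation converts a high-level representation into a lower-level one. The high-level representation is abstract and closer to human understanding; the low-level representation is concrete and closer to machine execution.
S506. The source-to-source compiler converts the level-1 intermediate code, according to the type of the target device, into second source code supported by that target device.
Specifically, the second source code is source code that the target device's compiler can compile; it can be compiled by the corresponding target device's compiler to generate executable code for the target device. This is not elaborated in this embodiment of the present invention; refer to existing implementations.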
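With the source-to-source conversion method for matrix operations provided by this embodiment of the present invention, after the AST is obtained from the first source code (the source code of the matrix operation entered by the user), the operation type of the matrix operation is obtained from the AST, the corresponding operation attribute is added to the matrix operation according to that type to obtain the level-0 intermediate code, and the downgrade-library inline operation is performed on the level-0 intermediate code to obtain the level-1 intermediate code. The level-0 intermediate code is the intermediate code obtained after the operation attributes are added to the matrix operation, and several operation attributes of a matrix can interact to form a merged attribute, so that a multi-step operation on the matrix reduces to a single-step operation. The inline downgrade process can therefore be optimized using the operation attribute information, placing related computations in the same loop structure. This avoids the prior-art problem: there, the AST is converted into intermediate code containing atomic operations (IR 0), and when the downgrade library is inlined into IR 0 to obtain the level-1 intermediate code (IR 1), every atomic operation forms its own loop structure, each an independent "memory read, compute, memory write" process, so the code finally generated for a compound matrix operation contains many loop structures and redundant memory operations, increasing bus traffic and system latency until system performance cannot meet requirements. The redundant loop structures and memory read/write operations of existing matrix operation processing are eliminated, and the performance of the generated code improves.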
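Optionally, as shown in FIG. 6, in this embodiment of the present invention, after the source-to-source compiler performs the downgrade-library inline operation on the level-0 intermediate code to obtain the level-1 intermediate code (step S505), the method may further include:
S507. The source-to-source compiler adds the operation attribute to the downgrade library to obtain a corresponding downgrade expression.
That is, when the source-to-source compiler receives source code containing the matrix operation for the first time, it may, after performing the downgrade-library inline operation on the level-0 intermediate code to obtain the level-1 intermediate code, add the operation attribute to the downgrade library to obtain the corresponding downgrade expression. In this way, in subsequent source-to-source conversions of matrix operations, once the corresponding operation attribute has been added to the matrix operation according to its operation type and the level-0 intermediate code has been obtained, the downgrade expression in the downgrade library can be matched directly to perform the downgrade-library inline operation. This further simplifies the source-to-source conversion of matrix operations and improves its efficiency.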
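Optionally, as shown in FIG. 7, in one possible implementation of this embodiment, the step in which the source-to-source compiler performs the downgrade-library inline operation on the level-0 intermediate code to obtain the level-1 intermediate code (step S505) may specifically include:
S505b1. The source-to-source compiler matches the level-0 intermediate code against the downgrade expressions in the downgrade library.
S505b2. If the match succeeds, the source-to-source compiler performs the downgrade-library inline operation on the level-0 intermediate code according to the downgrade expression in the downgrade library that matches the level-0 intermediate code, to obtain the level-1 intermediate code.
That is, in this embodiment, when the source-to-source compiler is not receiving source code containing the matrix operation for the first time, the level-0 intermediate code, once obtained, can be matched directly against the downgrade expressions in the downgrade library during the downgrade-library inline operation. If the match succeeds, the inline operation is performed directly according to that downgrade expression, which further simplifies the source-to-source conversion of matrix operations and improves its efficiency.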
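The interplay between S507 (storing a downgrade expression on first encounter) and S505b1/S505b2 (matching it later) resembles a lookup table keyed by operation attributes. The following is a loose sketch under that assumption; the dictionary-based library, the string templates, and the names `build_template` and `lower_operand` are hypothetical and do not come from the patent.

```python
# Hypothetical downgrade (lowering) library: maps a tuple of operation
# attributes to a text template for the level-1 load of one operand.

def build_template(attributes):
    """Wrap a row load in each attribute, innermost first (illustrative only)."""
    expr = "load({name}[i])"
    for attr in attributes:
        expr = f"{attr}({expr})"
    return expr


def lower_operand(library, name, attributes):
    """Reuse a stored downgrade expression if one matches; otherwise build
    one and store it for later conversions. Returns (code, matched)."""
    key = tuple(attributes)
    matched = key in library
    if not matched:
        library[key] = build_template(attributes)   # first encounter, cf. S507
    return library[key].format(name=name), matched  # matched path, cf. S505b1/S505b2
```

On the first call with attribute `CONJ` the library is empty, so the template is built and stored; a later call with the same attribute on a different operand reuses it without rebuilding.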
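Optionally, as shown in FIG. 8, in this embodiment of the present invention, the step in which the source-to-source compiler adds the corresponding operation attribute to the matrix operation according to its operation type to obtain the level-0 intermediate code (step S504) may specifically include:
S504a. The source-to-source compiler adds the corresponding operation attribute to the matrix operation according to its operation type.
S504b. The source-to-source compiler simplifies the matrix operation according to the operation attribute, to obtain a simplified matrix operation.
S504c. The source-to-source compiler converts the simplified matrix operation into the level-0 intermediate code.
That is, this embodiment of the present invention can simplify a matrix operation according to its operation attributes.
For example, for the matrix operation E=TRAN(TRAN(A[1,2,5,7])), A[1,2,5,7] denotes fetching rows 1, 2, 5, and 7 of matrix A by scattered selection and TRAN denotes transpose, so the operation types of this matrix operation include scattered selection and two transposes. After the matrix operation is simplified according to its operation attributes, the two TRAN attributes cancel each other and only the scattered-selection attribute remains.
Because this embodiment can simplify the matrix operation according to the operation attribute and then convert the simplified matrix operation into the level-0 intermediate code, redundant memory read/write operations in the generated code are reduced and more concise level-0 intermediate code is obtained. The subsequent downgrade-library inline operation on the level-0 intermediate code is thereby further simplified, and the efficiency of the source-to-source conversion of matrix operations improves.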
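The cancellation of the two TRAN attributes can be sketched as a single pass over an operand's attribute list. This is a minimal illustration rather than the patent's algorithm: the function name `simplify` and the `INVOLUTIONS` set are assumptions, and the only rule encoded is that applying transpose or conjugate twice in succession is the identity.

```python
# Assumed rule: TRAN and CONJ are involutions, so an adjacent repeated
# pair cancels. Other attributes (e.g. a row-selection list) are kept.
INVOLUTIONS = {"TRAN", "CONJ"}

def simplify(attributes):
    """Cancel adjacent repeated involutions, e.g. [..., TRAN, TRAN] -> [...]."""
    result = []
    for attr in attributes:
        if result and attr in INVOLUTIONS and result[-1] == attr:
            result.pop()          # the pair cancels
        else:
            result.append(attr)
    return result
```

For the attribute list `["[1,2,5,7]", "TRAN", "TRAN"]`, only the scattered-selection attribute `"[1,2,5,7]"` remains. Using a stack also cancels nested pairs, so `["TRAN", "CONJ", "CONJ", "TRAN"]` reduces to an empty list.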
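The source-to-source conversion method for matrix operations is described in detail below with a concrete example.
By way of example, assume the matrix operation
E=TRAN(TRAN(A[1,2,5,7])).*CONJ(B)+C/2-D, where TRAN denotes transpose, CONJ denotes conjugate, and the "." after TRAN(TRAN(A[1,2,5,7])) indicates an element-wise (dot) multiplication, that is, each element of one matrix is multiplied by the corresponding element of the other, as distinct from ordinary matrix multiplication. Matrices A, B, C, and D are as follows:
The source-to-source conversion method for the matrix operation may then include the following steps S1 to S5:
S1. After the source-to-source compiler receives and parses the first source code and obtains the corresponding AST, the operation types of the matrix operation are obtained from the AST: scattered selection, transpose, conjugate, and quotient.
S2. According to the operation types of the matrix operation, the corresponding operation attributes are added to the matrix operation, giving:
A: [1,2,5,7], TRAN, TRAN;
B: CONJ;
C: 1/2;
S3. The matrix operation is simplified according to the operation attributes, to obtain the simplified matrix operation, as follows:
E=A1.*B1+C1-D1, where matrices A1, B1, C1, and D1 are as follows:
S4. The simplified matrix operation is converted into the level-0 intermediate code, and the downgrade-library inline operation is performed on the level-0 intermediate code to obtain the level-1 intermediate code, which may be as follows: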
for (i = 0; i < 4; i++)
{
    Vector Va1, Vb1, Vc1, Vd1, Ve;
    Va1 = load(A[1,2,5,7][i]);
    Vb1 = CONJ(load(B[i]));
    Vc1 = load(C[i]) / 2;
    Vd1 = load(D[i]);
    Ve = Va1 .* Vb1 + Vc1 - Vd1;
    Store(Ve, E[i]);
}
Here, the i inside values such as 2+2i is the imaginary unit, not the loop index. When i=0, Va1, Vb1, Vc1, Vd1, and Ve are:
Va1=(2+2i,2+2i,2+2i,2+2i);
Vb1=(1-1i,2-2i,3-3i,4-4i);
Vc1=(1+1i,2+2i,4+4i,8+8i);
Vd1=(1+1i,3+3i,5+5i,7+7i);
Ve=Va1.*Vb1+Vc1-Vd1
=(2+2i,2+2i,2+2i,2+2i).*(1-1i,2-2i,3-3i,4-4i)+(1+1i,2+2i,4+4i,8+8i)-(1+1i,3+3i,5+5i,7+7i)
=(4+0i,8+0i,12+0i,16+0i)+(1+1i,2+2i,4+4i,8+8i)-(1+1i,3+3i,5+5i,7+7i)
=(4+0i,7-1i,11-1i,17+1i).
When i=1, Va1, Vb1, Vc1, Vd1, and Ve are:
Va1=(3+3i,3+3i,3+3i,3+3i);
Vb1=(1-1i,2-2i,3-3i,4-4i);
Vc1=(1+1i,2+2i,4+4i,8+8i);
Vd1=(2+2i,4+4i,6+6i,8+8i);
Ve=Va1.*Vb1+Vc1-Vd1
=(3+3i,3+3i,3+3i,3+3i).*(1-1i,2-2i,3-3i,4-4i)+(1+1i,2+2i,4+4i,8+8i)-(2+2i,4+4i,6+6i,8+8i)
=(6+0i,12+0i,18+0i,24+0i)+(1+1i,2+2i,4+4i,8+8i)-(2+2i,4+4i,6+6i,8+8i)
=(5-1i,10-2i,16-2i,24+0i).
When i=2, Va1, Vb1, Vc1, Vd1, and Ve are:
Va1=(6+6i,6+6i,6+6i,6+6i);
Vb1=(1-1i,2-2i,3-3i,4-4i);
Vc1=(1+1i,2+2i,4+4i,8+8i);
Vd1=(1+1i,3+3i,5+5i,7+7i);
Ve=Va1.*Vb1+Vc1-Vd1
=(6+6i,6+6i,6+6i,6+6i).*(1-1i,2-2i,3-3i,4-4i)+(1+1i,2+2i,4+4i,8+8i)-(1+1i,3+3i,5+5i,7+7i)
=(12+0i,24+0i,36+0i,48+0i)+(1+1i,2+2i,4+4i,8+8i)-(1+1i,3+3i,5+5i,7+7i)
=(12+0i,23-1i,35-1i,49+1i).
When i=3, Va1, Vb1, Vc1, Vd1, and Ve are:
Va1=(8+8i,8+8i,8+8i,8+8i);
Vb1=(1-1i,2-2i,3-3i,4-4i);
Vc1=(1+1i,2+2i,4+4i,8+8i);
Vd1=(2+2i,4+4i,6+6i,8+8i);
Ve=Va1.*Vb1+Vc1-Vd1
=(8+8i,8+8i,8+8i,8+8i).*(1-1i,2-2i,3-3i,4-4i)+(1+1i,2+2i,4+4i,8+8i)-(2+2i,4+4i,6+6i,8+8i)
=(16+0i,32+0i,48+0i,64+0i)+(1+1i,2+2i,4+4i,8+8i)-(2+2i,4+4i,6+6i,8+8i)
=(15-1i,30-2i,46-2i,64+0i).
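The row results above can be checked with ordinary complex arithmetic. The snippet below is a verification aid only, not generated code from the patent: `fused_row` is a hypothetical helper, Python's `j` plays the role of the imaginary unit written `i` in the text, and the vectors are copied from the i=0 and i=3 cases as already-prepared values (Vb1 after conjugation, Vc1 after halving).

```python
def fused_row(va, vb, vc, vd):
    """Element-wise Ve = Va1 .* Vb1 + Vc1 - Vd1 for one row, in one pass."""
    return [a * b + c - d for a, b, c, d in zip(va, vb, vc, vd)]

vb1 = [1 - 1j, 2 - 2j, 3 - 3j, 4 - 4j]
vc1 = [1 + 1j, 2 + 2j, 4 + 4j, 8 + 8j]

# Row i=0 from the text
ve_row0 = fused_row([2 + 2j] * 4, vb1, vc1, [1 + 1j, 3 + 3j, 5 + 5j, 7 + 7j])

# Row i=3 from the text
ve_row3 = fused_row([8 + 8j] * 4, vb1, vc1, [2 + 2j, 4 + 4j, 6 + 6j, 8 + 8j])
```

`ve_row0` evaluates to (4+0i, 7-1i, 11-1i, 17+1i) and `ve_row3` to (15-1i, 30-2i, 46-2i, 64+0i), matching the values derived above.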
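S5. The level-1 intermediate code is converted, according to the type of the target device, into second source code supported by the target device.
The source-to-source conversion of the matrix operation is now complete.
The above example can also be represented by the flowchart shown in FIG. 9, which is not described again here.
As the example shows, with the source-to-source conversion method for matrix operations provided by this embodiment of the present invention, after the AST is obtained from the first source code (the source code of the matrix operation entered by the user), the operation type of the matrix operation is obtained from the AST, the corresponding operation attribute is added to the matrix operation according to that type to obtain the level-0 intermediate code, and the downgrade-library inline operation is performed on the level-0 intermediate code to obtain the level-1 intermediate code. The level-0 intermediate code is the intermediate code obtained after the operation attributes are added to the matrix operation, and several operation attributes of a matrix can interact to form a merged attribute, so that a multi-step operation on the matrix reduces to a single-step operation. The inline downgrade process can therefore be optimized using the operation attribute information, placing related computations in the same loop structure. This avoids the prior-art problem: there, the AST is converted into intermediate code containing atomic operations (IR 0), and when the downgrade library is inlined into IR 0 to obtain the level-1 intermediate code (IR 1), every atomic operation forms its own loop structure, each an independent "memory read, compute, memory write" process, so the code finally generated for a compound matrix operation contains many loop structures and redundant memory operations, increasing bus traffic and system latency until system performance cannot meet requirements. The redundant loop structures and memory read/write operations of existing matrix operation processing are eliminated, and the performance of the generated code improves.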
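As shown in FIG. 10, an embodiment of the present invention provides a source-to-source compiler 100, which is configured to perform the source-to-source conversion method for matrix operations shown in FIG. 5 to FIG. 8 above. The source-to-source compiler 100 may include units corresponding to the respective steps; for example, it may include a receiving unit 1001, a parsing unit 1002, an obtaining unit 1003, an adding unit 1004, a downgrade unit 1005, and a conversion unit 1006, where:
the receiving unit 1001 is configured to receive first source code, the first source code being the source code of a matrix operation entered by a user;
the parsing unit 1002 is configured to parse the first source code to obtain the corresponding AST;
the obtaining unit 1003 is configured to obtain the operation type of the matrix operation from the AST;
the adding unit 1004 is configured to add a corresponding operation attribute to the matrix operation according to its operation type, to obtain level-0 intermediate code;
the downgrade unit 1005 is configured to perform a downgrade-library inline operation on the level-0 intermediate code to obtain level-1 intermediate code; and
the conversion unit 1006 is configured to convert the level-1 intermediate code, according to the type of a target device, into second source code supported by the target device.
Optionally, the adding unit 1004 is further configured to add the operation attribute to the downgrade library, to obtain a corresponding downgrade expression, after the downgrade unit 1005 performs the downgrade-library inline operation on the level-0 intermediate code to obtain the level-1 intermediate code.
Optionally, the downgrade unit 1005 is specifically configured to:
match the level-0 intermediate code against the downgrade expressions in the downgrade library; and
if the match succeeds, perform the downgrade-library inline operation on the level-0 intermediate code according to the downgrade expression in the downgrade library that matches the level-0 intermediate code, to obtain the level-1 intermediate code.
Optionally, the adding unit 1004 is specifically configured to:
add the corresponding operation attribute to the matrix operation according to its operation type;
simplify the matrix operation according to the operation attribute, to obtain a simplified matrix operation; and
convert the simplified matrix operation into the level-0 intermediate code.
It can be understood that the source-to-source compiler 100 of this embodiment may correspond to the source-to-source compiler in the source-to-source conversion method for matrix operations shown in FIG. 5 to FIG. 8, and that the division and/or functions of the units in the source-to-source compiler 100 serve to implement the method flows shown in FIG. 5 to FIG. 8. For brevity, details are not described here again.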
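With the source-to-source compiler provided by this embodiment of the present invention, after the AST is obtained from the first source code (the source code of the matrix operation entered by the user), the operation type of the matrix operation is obtained from the AST, the corresponding operation attribute is added to the matrix operation according to that type to obtain the level-0 intermediate code, and the downgrade-library inline operation is performed on the level-0 intermediate code to obtain the level-1 intermediate code. The level-0 intermediate code is the intermediate code obtained after the operation attributes are added to the matrix operation, and several operation attributes of a matrix can interact to form a merged attribute, so that a multi-step operation on the matrix reduces to a single-step operation. The inline downgrade process can therefore be optimized using the operation attribute information, placing related computations in the same loop structure. This avoids the prior-art problem: there, the AST is converted into intermediate code containing atomic operations (IR 0), and when the downgrade library is inlined into IR 0 to obtain the level-1 intermediate code (IR 1), every atomic operation forms its own loop structure, each an independent "memory read, compute, memory write" process, so the code finally generated for a compound matrix operation contains many loop structures and redundant memory operations, increasing bus traffic and system latency until system performance cannot meet requirements. The redundant loop structures and memory read/write operations of existing matrix operation processing are eliminated, and the performance of the generated code improves.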
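As shown in FIG. 11, an embodiment of the present invention provides a source-to-source compiler 110, including a processor 1101, a memory 1102, a bus 1103, and a communication interface 1104.
The memory 1102 is configured to store computer-executable instructions, and the processor 1101 is connected to the memory 1102 through the bus 1103. When the source-to-source compiler 110 runs, the processor 1101 executes the computer-executable instructions stored in the memory 1102, so that the source-to-source compiler 110 performs the source-to-source conversion method for matrix operations shown in FIG. 5 to FIG. 8. For the specific method, refer to the related description in the embodiments shown in FIG. 5 to FIG. 8; details are not described here again.
The processor 1101 in this embodiment of the present invention may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor, or it may be any conventional processor or the like. In addition, the processor 1101 may be a dedicated processor, which may include at least one of a baseband processing chip, a radio frequency processing chip, and the like. Further, the dedicated processor may include a chip that provides other dedicated processing functions of the source-to-source compiler 110.
The memory 1102 may include volatile memory, such as random-access memory (RAM). The memory 1102 may also include non-volatile memory, such as read-only memory (ROM), flash memory, a hard disk drive (HDD), or a solid-state drive (SSD). In addition, the memory 1102 may include a combination of the above types of memory.
The bus 1103 may include a data bus, a power bus, a control bus, a signal status bus, and the like. For clarity, the various buses are all shown as the bus 1103 in FIG. 11.
The communication interface 1104 may specifically be a transceiver on the source-to-source compiler 110. The transceiver may be a wireless transceiver or a wired transceiver, which is not specifically limited in this embodiment of the present invention.
In a specific implementation, each step of the method flows shown in FIG. 5 to FIG. 8 may be implemented by the processor 1101, in hardware form, executing the computer-executable instructions, in software form, stored in the memory 1102. To avoid repetition, details are not described here again.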
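With the source-to-source compiler provided by this embodiment of the present invention, after the AST is obtained from the first source code (the source code of the matrix operation entered by the user), the operation type of the matrix operation is obtained from the AST, the corresponding operation attribute is added to the matrix operation according to that type to obtain the level-0 intermediate code, and the downgrade-library inline operation is performed on the level-0 intermediate code to obtain the level-1 intermediate code. The level-0 intermediate code is the intermediate code obtained after the operation attributes are added to the matrix operation, and several operation attributes of a matrix can interact to form a merged attribute, so that a multi-step operation on the matrix reduces to a single-step operation. The inline downgrade process can therefore be optimized using the operation attribute information, placing related computations in the same loop structure. This avoids the prior-art problem: there, the AST is converted into intermediate code containing atomic operations (IR 0), and when the downgrade library is inlined into IR 0 to obtain the level-1 intermediate code (IR 1), every atomic operation forms its own loop structure, each an independent "memory read, compute, memory write" process, so the code finally generated for a compound matrix operation contains many loop structures and redundant memory operations, increasing bus traffic and system latency until system performance cannot meet requirements. The redundant loop structures and memory read/write operations of existing matrix operation processing are eliminated, and the performance of the generated code improves.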
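Optionally, an embodiment of the present invention provides a readable medium that includes computer-executable instructions. When the processor of the source-to-source compiler executes the computer-executable instructions, the source-to-source compiler performs the source-to-source conversion method for matrix operations shown in FIG. 5 to FIG. 8 above. For the specific method, refer to the related description in the embodiments shown in FIG. 5 to FIG. 8; details are not described here again.
A person skilled in the art can clearly understand that, for convenience and brevity of description, the apparatus described above is illustrated only by the division into the functional modules above. In practical applications, the functions may be assigned to different functional modules as needed; that is, the internal structure of the apparatus may be divided into different functional modules to complete all or part of the functions described above. For the specific working processes of the system, apparatus, and units described above, refer to the corresponding processes in the foregoing method embodiments; details are not described here again.
In the several embodiments provided in this application, it should be understood that the disclosed system, apparatus, and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative: the division into modules or units is only a division by logical function, and in an actual implementation there may be other divisions; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, apparatus, or unit, and may be electrical, mechanical, or in another form.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as a standalone product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied as a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to perform all or part of the steps of the methods described in the embodiments of the present invention. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, ROM, RAM, a magnetic disk, or an optical disc.
The foregoing is only a specific implementation of the present invention, but the protection scope of the present invention is not limited thereto. Any variation or replacement readily conceivable by a person skilled in the art within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.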
Claims (10)
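- A source-to-source conversion method for matrix operations, wherein the method comprises: receiving first source code, the first source code being source code of a matrix operation entered by a user; parsing the first source code to obtain a corresponding abstract syntax tree (AST); obtaining, according to the AST, an operation type of the matrix operation; adding, according to the operation type of the matrix operation, a corresponding operation attribute to the matrix operation, to obtain level-0 intermediate code; performing a downgrade-library inline operation on the level-0 intermediate code to obtain level-1 intermediate code; and converting, according to a type of a target device, the level-1 intermediate code into second source code supported by the target device.
- The method according to claim 1, wherein after the performing a downgrade-library inline operation on the level-0 intermediate code to obtain level-1 intermediate code, the method further comprises: adding the operation attribute to the downgrade library to obtain a corresponding downgrade expression.
- The method according to claim 1, wherein the performing a downgrade-library inline operation on the level-0 intermediate code to obtain level-1 intermediate code comprises: matching the level-0 intermediate code against downgrade expressions in the downgrade library; and if the match succeeds, performing the downgrade-library inline operation on the level-0 intermediate code according to the downgrade expression in the downgrade library that matches the level-0 intermediate code, to obtain the level-1 intermediate code.
- The method according to any one of claims 1 to 3, wherein the adding, according to the operation type of the matrix operation, a corresponding operation attribute to the matrix operation, to obtain level-0 intermediate code comprises: adding the corresponding operation attribute to the matrix operation according to the operation type of the matrix operation; simplifying the matrix operation according to the operation attribute, to obtain a simplified matrix operation; and converting the simplified matrix operation into the level-0 intermediate code.
- A source-to-source compiler, wherein the source-to-source compiler comprises: a receiving unit, configured to receive first source code, the first source code being source code of a matrix operation entered by a user; a parsing unit, configured to parse the first source code to obtain a corresponding abstract syntax tree (AST); an obtaining unit, configured to obtain, according to the AST, an operation type of the matrix operation; an adding unit, configured to add, according to the operation type of the matrix operation, a corresponding operation attribute to the matrix operation, to obtain level-0 intermediate code; a downgrade unit, configured to perform a downgrade-library inline operation on the level-0 intermediate code to obtain level-1 intermediate code; and a conversion unit, configured to convert, according to a type of a target device, the level-1 intermediate code into second source code supported by the target device.
- The source-to-source compiler according to claim 5, wherein the adding unit is further configured to add the operation attribute to the downgrade library, to obtain a corresponding downgrade expression, after the downgrade unit performs the downgrade-library inline operation on the level-0 intermediate code to obtain the level-1 intermediate code.
- The source-to-source compiler according to claim 5, wherein the downgrade unit is specifically configured to: match the level-0 intermediate code against downgrade expressions in the downgrade library; and if the match succeeds, perform the downgrade-library inline operation on the level-0 intermediate code according to the downgrade expression in the downgrade library that matches the level-0 intermediate code, to obtain the level-1 intermediate code.
- The source-to-source compiler according to any one of claims 5 to 7, wherein the adding unit is specifically configured to: add the corresponding operation attribute to the matrix operation according to the operation type of the matrix operation; simplify the matrix operation according to the operation attribute, to obtain a simplified matrix operation; and convert the simplified matrix operation into the level-0 intermediate code.
- A source-to-source compiler, wherein the source-to-source compiler comprises a processor, a memory, a bus, and a communication interface; the memory is configured to store computer-executable instructions, and the processor is connected to the memory through the bus; and when the source-to-source compiler runs, the processor executes the computer-executable instructions stored in the memory, so that the source-to-source compiler performs the source-to-source conversion method for matrix operations according to any one of claims 1 to 4.
- A readable medium, comprising computer-executable instructions, wherein when a processor of a source-to-source compiler executes the computer-executable instructions, the source-to-source compiler performs the source-to-source conversion method for matrix operations according to any one of claims 1 to 4.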
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2015/098781 WO2017107154A1 (zh) | 2015-12-24 | 2015-12-24 | 一种矩阵运算的源到源转换方法及源到源编译器 |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2015/098781 WO2017107154A1 (zh) | 2015-12-24 | 2015-12-24 | 一种矩阵运算的源到源转换方法及源到源编译器 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2017107154A1 true WO2017107154A1 (zh) | 2017-06-29 |
Family
ID=59088685
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2015/098781 WO2017107154A1 (zh) | 2015-12-24 | 2015-12-24 | 一种矩阵运算的源到源转换方法及源到源编译器 |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2017107154A1 (zh) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111158691A (zh) * | 2019-12-05 | 2020-05-15 | 杭州安恒信息技术股份有限公司 | 实现规则引擎动态化的方法 |
US11347501B1 (en) | 2020-12-15 | 2022-05-31 | International Business Machines Corporation | Shape-based code comparisons |
CN117421392A (zh) * | 2023-12-15 | 2024-01-19 | 武汉大学 | 一种基于词级对齐的代码搜索方法及装置 |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101369235A (zh) * | 2007-08-14 | 2009-02-18 | 冲电气工业株式会社 | 程序转换装置以及编译程序 |
CN101957772A (zh) * | 2009-07-16 | 2011-01-26 | 北京中电华大电子设计有限责任公司 | 一种具有轻量级比较指令生成的编译器 |
CN103631632A (zh) * | 2013-11-29 | 2014-03-12 | 华为技术有限公司 | 移植方法及源到源编译器 |
CN104572234A (zh) * | 2014-12-29 | 2015-04-29 | 杭州华为数字技术有限公司 | 生成用于并行计算架构的源代码的方法及源到源编译器 |
-
2015
- 2015-12-24 WO PCT/CN2015/098781 patent/WO2017107154A1/zh active Application Filing
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111158691A (zh) * | 2019-12-05 | 2020-05-15 | 杭州安恒信息技术股份有限公司 | 实现规则引擎动态化的方法 |
CN111158691B (zh) * | 2019-12-05 | 2023-10-13 | 杭州安恒信息技术股份有限公司 | 实现规则引擎动态化的方法 |
US11347501B1 (en) | 2020-12-15 | 2022-05-31 | International Business Machines Corporation | Shape-based code comparisons |
CN117421392A (zh) * | 2023-12-15 | 2024-01-19 | 武汉大学 | 一种基于词级对齐的代码搜索方法及装置 |
CN117421392B (zh) * | 2023-12-15 | 2024-03-08 | 武汉大学 | 一种基于词级对齐的代码搜索方法及装置 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 15911153 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 15911153 Country of ref document: EP Kind code of ref document: A1 |