WO2021254123A1 - Address deduction method employing control flow graph, device, and readable storage medium - Google Patents

Address deduction method employing control flow graph, device, and readable storage medium

Info

Publication number
WO2021254123A1
Authority
WO
WIPO (PCT)
Prior art keywords
address
pointer
variable
memory
access
Application number
PCT/CN2021/096379
Other languages
French (fr)
Chinese (zh)
Inventor
石雯
Original Assignee
中科寒武纪科技股份有限公司
Application filed by 中科寒武纪科技股份有限公司
Publication of WO2021254123A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F 9/30003 Arrangements for executing specific machine instructions
    • G06F 9/3005 Arrangements for executing specific machine instructions to perform operations for flow control
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 8/00 Arrangements for software engineering
    • G06F 8/40 Transformation of program code
    • G06F 8/41 Compilation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/30 Arrangements for executing machine instructions, e.g. instruction decode

Definitions

  • the present disclosure generally relates to the field of computers. More specifically, the present disclosure relates to a method, an apparatus, and a readable storage medium for deriving an address based on a control flow graph.
  • the solutions of the present disclosure provide a method, a device and a readable storage medium for deriving a pointer address based on a control flow graph.
  • the present disclosure discloses a method for deriving an address based on a control flow graph.
  • the control flow graph includes a plurality of basic blocks, the plurality of basic blocks include at least one instruction, and a pointer contained in the instruction carries an address.
  • The method includes: traversing the multiple basic blocks to obtain all possible address spaces of the pointer; determining whether all the possible address spaces can be deduced to a single address space; and if so, setting the pointer to access that address space.
  • The present disclosure also discloses a computer-readable storage medium on which is stored computer program code that performs accesses in a system using a universal address; when the computer program code is run by a processor, the foregoing method is executed.
  • the present disclosure discloses a computing device including a processor core, and the processor core executes the aforementioned method.
  • The technical solution of the present disclosure targets special-purpose processors such as artificial intelligence chips: it simplifies their hardware complexity, facilitates programming, and further guarantees program performance by means of pointer derivation.
  • FIG. 1 is a flowchart showing an embodiment of the present disclosure.
  • FIG. 2 is a control flow diagram showing an example of an embodiment of the present disclosure.
  • FIG. 3 is a control flow diagram showing another example of an embodiment of the present disclosure.
  • FIG. 4 is a flowchart showing another embodiment of the present disclosure.
  • FIG. 5 is a control flow diagram showing another example of an embodiment of the present disclosure.
  • FIG. 6 is a flowchart showing another embodiment of the present disclosure.
  • FIG. 7 is a flowchart showing another embodiment of the present disclosure.
  • FIG. 8 is a flowchart showing another embodiment of the present disclosure.
  • FIG. 9 is a schematic diagram showing a computing device according to another embodiment of the present disclosure.
  • FIG. 10 is a structural diagram showing an integrated circuit device according to another embodiment of the present disclosure.
  • FIG. 11 is a structural diagram showing a board card according to another embodiment of the present disclosure.
  • Machine language consists of operation codes that the machine can recognize directly. Each operation code has a corresponding circuit inside the computer to carry it out; in general, a series of 0 and 1 instructions directly controls the potentials of the computer's components to complete the intended task.
  • Because each instruction of a machine-language program corresponds to a specific basic action of the computer, such programs occupy little memory and execute efficiently. The disadvantages are a large programming workload, proneness to error, difficulty of interpretation, and dependence on a specific computer structure, so the versatility and portability of such programs are poor.
  • A high-level language is independent of the computer's hardware structure and instruction system. It is based on human logic and grammar, so it has stronger expressive power, conveniently expresses data operations and program control structures, can describe various algorithms intuitively, and is easy to learn and master.
  • Currently popular programming languages such as Java, C++ and Python are all high-level languages. Because high-level languages are written from a human perspective and are relatively indirect for the computer, the opcodes generated after compilation are often longer than machine-language program code and execute more slowly. Moreover, a high-level language "cannot see" the computer's hardware structure and therefore cannot directly control the system software that accesses hardware resources; for this reason, some high-level languages use assembly language as an external procedure or function.
  • Assembly language sits between machine language and high-level languages. Compared with machine language it is easier for programmers to understand and to program in; compared with high-level languages it has a more direct relationship to the machine and achieves high speed and efficiency. Even with today's highly developed high-level languages, assembly language is still commonly used at the lowest level for program optimization or hardware manipulation.
  • the code written in high-level language and assembly language needs to be converted into machine code by a compiler to drive the computer.
  • When an artificial intelligence chip is running, it needs to access memory extensively and move data between memories to perform computation tasks. For example, after image or voice information is converted into a matrix, the matrix data is copied from off-chip memory to on-chip memory for computation.
  • The present disclosure is aimed at application-specific integrated circuits (ASICs), especially artificial intelligence chips: on the premise that no additional field or parameter is needed at coding time to define the accessed address space, the memory address is derived at compile time and general (universal) address access is implemented, so as to streamline computing resources and shorten computation time.
  • An embodiment of the present disclosure is a method for deriving an address based on a control flow graph; more specifically, the control-flow-graph-based pointer derivation is implemented with a fixed-point algorithm.
  • the Control Flow Graph (CFG) is an abstract data structure used in the compiler, which represents all the paths that a program may execute, and reflects the possible flow of all basic blocks in the process in the form of a flowchart.
  • The control flow graph is composed of nodes and the relationships between nodes. A node is also called a basic block (BB): a maximal sequence of statements in the program that executes strictly in order. Each basic block has exactly one entry and one exit; execution enters through its entry and leaves through its exit. The defining property of a basic block is that once its first instruction executes, all instructions in the block execute in order.
  • Each basic block contains at least one instruction, and an instruction in a basic block may use a pointer to refer to a specific scratchpad or memory. A pointer is a variable that stores an address within a particular address space. Through the pointer, the programmer can load data into the location at the specific address the pointer refers to, or fetch the data stored at that address.
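  • As an informal aid to the description above (not part of the claimed method), a control flow graph and its basic blocks can be modelled with a minimal data structure such as the following Python sketch; the class and field names are illustrative assumptions only.

        class BasicBlock:
            """A basic block: a maximal straight-line run of instructions with one entry and one exit."""
            def __init__(self, name, instructions=None):
                self.name = name
                self.instructions = instructions or []   # e.g. ("assign", "p", "onchip")
                self.successors = []                      # edges to the blocks that may execute next
                self.predecessors = []                    # edges from the blocks that may execute before

        class ControlFlowGraph:
            """A control flow graph: basic blocks plus the directed edges between them."""
            def __init__(self, entry):
                self.entry = entry
                self.blocks = [entry]

            def add_edge(self, src, dst):
                for block in (src, dst):
                    if block not in self.blocks:
                        self.blocks.append(block)
                src.successors.append(dst)
                dst.predecessors.append(src)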
  • Control flow graphs often have conditions such as predicates, jumps, and loops.
  • A predicate refers to a delegate that wraps a method used to decide whether a condition is satisfied.
  • A jump is a judgment instruction that makes the flow branch: one instruction is executed when the condition holds and another when it does not. A loop repeatedly executes the same instructions under a given condition and stops only once the condition is satisfied.
  • A control flow graph may contain the aforementioned conditional constructs, among others. Until the concrete situation and conditions are compared, the memory being accessed is uncertain; when it cannot be determined which memory a pointer corresponds to, the pointer is usually set to be accessed through a general (universal) address. By deriving the address space of the pointer, the compiler of this embodiment makes explicit the address space accessed by a pointer originally set to general address access. Once this is determined at compile time, unnecessary execution steps can be eliminated, optimizing program performance.
  • Fig. 1 is a flowchart showing this embodiment.
  • In step 101, the compiler traverses the multiple basic blocks to obtain all possible address spaces of the pointer.
  • This embodiment traverses all the basic blocks in the control flow graph and, in each basic block, obtains the address space pointed to by each pointer used by its instructions; for the same pointer, all of its possible address spaces are collected.
  • The traversal may follow the data flow order or its reverse. Taking the reverse order as an example, it is obtained by traversing in post-order and then inverting the result, and it can converge earlier. This embodiment does not limit the traversal order, but reverse post-order traversal is preferable; a sketch is given below.
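  • A minimal sketch of post-order and reverse post-order traversal over such a graph (an illustrative rendering under the data structure assumed earlier, not the patent's literal algorithm):

        def post_order(cfg):
            """Visit each basic block only after all of its reachable successors have been visited."""
            visited, order = set(), []

            def dfs(block):
                visited.add(block)
                for succ in block.successors:
                    if succ not in visited:
                        dfs(succ)
                order.append(block)

            dfs(cfg.entry)
            return order

        def reverse_post_order(cfg):
            """Post-order with the result inverted; tends to converge earlier for forward analyses."""
            return list(reversed(post_order(cfg)))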
  • In step 102, the compiler determines whether all these possible address spaces can be deduced to a single address space.
  • To that end, the compiler takes the pointer variables of the pointers in the basic blocks; this step determines whether, after flowing through the control flow graph, these pointers can be deduced to a single address space.
  • Specifically, following the control flow graph, the compiler can successively obtain all possible address spaces of the pointers used by the instructions in all predecessor basic blocks of each basic block, and determine whether those possible address spaces can be deduced to a single address space.
  • When one basic block is executed before another basic block, the former can be regarded as a predecessor basic block of the latter.
  • the third basic block and the fourth basic block in FIG. 2 are both predecessor basic blocks of the fifth basic block.
  • In practice, the compiler can perform alias analysis to find address variables that are aliases, and thereby determine whether the address of the variable pointed to by the pointer is unique. If the compiler does not provide alias analysis, this embodiment can emulate the alias-analysis process and examine the information about the variable pointed to by the pointer that is recorded in the intermediate representation (IR) of each instruction.
  • If all the possible address spaces can be deduced to a single address space, step 103 is executed and the compiler sets the pointer to access that address space.
  • If they cannot, step 104 is executed and the compiler sets the pointer to general address access. Not being deducible to a single address space means that the pointer may take several values in the control flow graph and will access different address spaces under different conditions, so it keeps general address access.
  • Finally, the compiler can generate memory access instructions according to the address space the pointer is ultimately set to access, so that access operations on data such as images, voice or text are carried out according to those memory access instructions.
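  • The flow of FIG. 1 can be read as a forward dataflow analysis iterated to a fixed point. The following Python sketch is a simplified, hypothetical rendering under the structures assumed above: GENERIC stands for general (universal) address access, and the merge keeps a single address space only when every incoming value agrees.

        GENERIC = "generic"          # the pointer must be accessed through a general address

        def merge(spaces):
            """Keep a single address space only if all known inputs agree; otherwise fall back to GENERIC."""
            known = {s for s in spaces if s is not None}
            return known.pop() if len(known) == 1 else GENERIC

        def transfer(block, pointer, incoming):
            """Apply the block's instructions: an assignment to the pointer overrides the incoming space."""
            space = incoming
            for op, dst, operand in block.instructions:
                if op == "assign" and dst == pointer:
                    space = operand              # operand names the address space of the assigned source
            return space

        def derive_pointer_space(cfg, pointer):
            """Steps 101-104 iterated until the address space recorded at every block stops changing."""
            space_out = {block: None for block in cfg.blocks}
            changed = True
            while changed:                       # fixed-point iteration over the control flow graph
                changed = False
                for block in reverse_post_order(cfg):
                    incoming = merge(space_out[pred] for pred in block.predecessors)
                    new_space = transfer(block, pointer, incoming)
                    if new_space != space_out[block]:
                        space_out[block] = new_space
                        changed = True
            return space_out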
  • FIG. 2 is an example of a control flow diagram to illustrate the flow chart of this embodiment.
  • This control flow diagram 200 includes five basic blocks, namely a first basic block 201, a second basic block 202, a third basic block 203, a fourth basic block 204, and a fifth basic block 205.
  • The first basic block 201 is the entry of the control flow graph 200.
  • The exit of the first basic block 201 is connected to the entry of the second basic block 202.
  • The exit of the second basic block 202 is connected to the entries of both the third basic block 203 and the fourth basic block 204, which means there is a judgment at the exit of the second basic block 202, for example a jump: if a specific condition holds, control jumps to the third basic block 203, and if it does not, control jumps to the fourth basic block 204.
  • The exits of the third basic block 203 and the fourth basic block 204 are both connected to the entry of the fifth basic block 205.
  • The exit of the fifth basic block 205 is the exit of the whole control flow graph 200 and also loops back to the entry of the second basic block 202.
  • In step 101, the compiler traverses all the basic blocks of the control flow graph 200 to obtain all possible address spaces of the pointer in each basic block.
  • After traversing all the basic blocks in the control flow graph 200, all possible address spaces of the pointer p are obtained as follows: in the first basic block 201, nothing is explicitly assigned to p, so it is a general address access; in the second basic block 202 and the third basic block 203, the variable a is assigned to p; in the fourth basic block 204, the variable b is assigned to p; and in the fifth basic block 205, the variable c receives the value loaded from the address pointed to by p.
  • In other words, at the fifth basic block the pointer p may refer to variable a or variable b, which directly affects the value of variable c.
  • In step 102, the compiler can, following the control flow graph, successively determine whether all possible address spaces of the pointers in all predecessor basic blocks of each basic block can be deduced to a single address space. In one case, if the address spaces of variable a and variable b are the same, for example both are in the on-chip memory, step 103 is executed and the compiler sets the pointer p to access that specific address space, namely the on-chip memory. At the exit of the fifth basic block 205, the pointer p then points to the on-chip memory. Following the loop of the control flow graph 200, this setting propagates back to the entry of the second basic block 202, where p is now recorded as accessing the on-chip memory; the process of FIG. 1 then continues to iterate over the control flow graph 200 until the address space pointed to by p no longer changes.
  • Once the iteration has converged, the compiler can generate memory access instructions according to the address space pointed to by the pointer, so that access operations on data such as images, voice or text are performed according to those instructions.
  • In the other case, if the address spaces of variable a and variable b differ, all possible address spaces of p cannot be deduced to a single address space, so step 104 is executed and the compiler sets the pointer p to general address access.
  • In other words, at the exit of the fifth basic block 205 the pointer p is updated to general address access. Following the loop of the control flow graph 200 back to the entry of the second basic block 202, the iteration continues until the address space accessed by p no longer changes.
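  • As a hypothetical usage of the sketch above, the situation of control flow graph 200 can be mimicked as follows (block names and the "onchip" space are illustrative): when both branches leave p pointing into the same space, the merge at the fifth basic block yields that single space; if the two branches used different spaces, the merge would instead yield GENERIC.

        bb1 = BasicBlock("BB1")                                    # p not assigned: general address access
        bb2 = BasicBlock("BB2", [("assign", "p", "onchip")])       # p = a, with a in on-chip memory
        bb3 = BasicBlock("BB3", [("assign", "p", "onchip")])       # p = a again
        bb4 = BasicBlock("BB4", [("assign", "p", "onchip")])       # p = b, with b also in on-chip memory
        bb5 = BasicBlock("BB5")                                    # c = *p
        cfg = ControlFlowGraph(bb1)
        for src, dst in [(bb1, bb2), (bb2, bb3), (bb2, bb4), (bb3, bb5), (bb4, bb5), (bb5, bb2)]:
            cfg.add_edge(src, dst)
        print(derive_pointer_space(cfg, "p")[bb5])                 # prints: onchip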
  • FIG. 3 is another example of a control flow diagram, and is also used to illustrate a flowchart of this embodiment.
  • the control flow diagram 300 also includes five basic blocks, and the connection relationship of these basic blocks is the same as that of the control flow diagram 200, and the difference lies in the instructions related to the pointer p in each basic block.
  • In step 101, the compiler traverses all the basic blocks of the control flow graph 300 and obtains all possible address spaces of the pointer in each basic block.
  • After traversing all the basic blocks in the control flow graph 300, all possible targets of the pointer p are obtained as follows: in the first basic block 301, the variable a is assigned to p; in the second basic block 302, the variable c receives the value fetched from the address pointed to by p, that address being the value of variable a; and in the fourth basic block 304, an additional offset b is applied to the pointer p.
  • In step 102, the compiler can, following the control flow graph, successively determine whether all possible address spaces of the pointers in all predecessor basic blocks of each basic block can be deduced to a single address space.
  • In this example, the pointer p points into the address space where the variable a is stored, for example the on-chip memory.
  • Although the specific address is shifted by the offset b in the fourth basic block 304, it still lies in the same address space, so all possible addresses of the pointer in the predecessor basic blocks of the fifth basic block are deduced to a single address space, namely the on-chip memory; step 103 is then executed and the compiler sets the pointer p to access the on-chip memory.
  • In other words, the pointer p is updated to point to the on-chip memory. Following the loop of the control flow graph 300 back to the entry of the second basic block 302, the iteration continues until the address space accessed by p no longer changes. In this example the result does not change further, and p can access the on-chip memory.
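  • To reflect the situation of FIG. 3, the transfer rule only needs to state that applying an offset to a pointer does not change its address space; a hypothetical extension of the sketch above could look like this.

        def transfer_with_offsets(block, pointer, incoming):
            """Like transfer(), but an offset applied to the pointer keeps its current address space."""
            space = incoming
            for op, dst, operand in block.instructions:
                if dst != pointer:
                    continue
                if op == "assign":
                    space = operand              # operand names the address space of the assigned source
                elif op == "offset":
                    pass                         # p = p + offset: still inside the same address space
            return space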
  • the conditional instructions of the control flow graph also include function calls.
  • A function call invokes a subroutine: when a function call is encountered, execution jumps to the subroutine, and after the subroutine finishes it returns to the main program to execute the next instruction.
  • Another embodiment of the present disclosure is an address space derivation method suitable for function calls, and the flowchart is shown in FIG. 4.
  • In step 401, the compiler traverses the multiple basic blocks to obtain all possible address spaces of the pointer.
  • This embodiment traverses all the basic blocks in the control flow graph and, for the same pointer, obtains all of its possible address spaces.
  • In step 402, the compiler determines whether the pointer is involved in a function call. If a function call is involved, step 403 is executed and the compiler judges whether the called function is read-only. If it is not read-only, the effect of the call cannot be established at compile time, so step 404 is executed and the compiler sets the pointer to general address access.
  • If the function is judged read-only in step 403 (a read-only function does not change the address space being accessed), or if step 402 determines that no function call is involved, then step 405 is executed and the compiler judges whether all the possible address spaces can be deduced to a single address space. If not, step 404 is executed and the compiler sets the pointer to general address access; if so, step 406 is executed and the compiler sets the pointer to access that address space.
  • In a variant, when the compiler determines in step 402 that the pointer is involved in a function call, step 403 may be skipped and step 404 executed directly, setting the pointer to general address access.
  • The compiler can repeat steps 401 to 406 until the address space pointed to by the pointer no longer changes, and can then generate memory access instructions according to that address space, so that access operations on data such as images, voice or text are carried out according to the memory access instructions. A sketch of these extra checks follows.
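  • The following is a hedged Python sketch of the checks of FIG. 4, layered on the derivation above; calls and is_read_only are illustrative stand-ins, since the patent does not fix a particular compiler interface for querying call sites.

        def space_with_calls(cfg, pointer, exit_block, calls, is_read_only):
            """Steps 401-406: a call to a function that is not read-only forces general address access."""
            for callee in calls.get(pointer, []):        # functions whose calls involve this pointer
                if not is_read_only(callee):             # step 403 fails
                    return GENERIC                       # step 404: effect unknown at compile time
            # steps 405-406: otherwise fall back to the plain control-flow-graph derivation
            return derive_pointer_space(cfg, pointer)[exit_block]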
  • FIG. 5 is another example of a control flow diagram to illustrate the flow chart of this embodiment.
  • This control flow graph 500 also includes five basic blocks, and the connection relationship of these basic blocks is the same as that of the control flow graph 300, with the only difference that the fifth basic block 305 becomes a function call 505.
  • In step 401, the compiler traverses the multiple basic blocks to obtain all possible address spaces of the pointer.
  • In the first basic block 501, the variable a is assigned to the pointer p. Assuming that the variable a is stored in the memory of a specific chip, the pointer p should access the memory of that chip.
  • The control flow through the second basic block 502, the third basic block 503 and the fourth basic block 504 does not change the address space accessed by p, so all possible address spaces of p are the memory of that specific chip.
  • In step 402, the compiler determines whether the pointer is involved in a function call. Since the control flow graph 500 contains the function call 505, step 403 is executed and the compiler judges whether the called function is read-only. Assuming the function call 505 is not read-only, step 404 is executed and the compiler sets the pointer p to general address access. In other words, at the exit of the function call 505 the pointer p is updated to general address access; following the loop of the control flow graph 500 back to the entry of the second basic block 502, the iteration continues until the address space accessed by p no longer changes. In this example, p is set to general address access at compile time.
  • In the above embodiments, address space derivation makes explicit those pointers that can be deduced to access a single address space, which reduces program running time and optimizes program performance.
  • In addition, the present disclosure provides a method that requires no additional fields or parameters: it only requires the address space of a variable to be declared explicitly when the variable is defined, and the access operation is then completed through a general (universal) address access mechanism. In more detail, because artificial intelligence chips have no memory management mechanism, they cannot directly obtain specific memory information when they encounter a general address access.
  • Another embodiment of the present disclosure is a general address access method for declaring the address space where the variable is located, which adopts a way of simulating hardware by software to simplify the complexity of hardware design.
  • the application scenario of this embodiment is an artificial intelligence chip that includes a first memory and a second memory.
  • the storage space of the first memory is defined by the first address to the second address
  • the storage space of the second memory is defined by the third address to the fourth address.
  • the first memory can be an off-chip memory
  • the second memory can be an on-chip memory.
  • the addresses of off-chip memory and on-chip memory are arranged consecutively.
  • the first memory has a total of 128 storage spaces, addressed by the 128 addresses from the first address addr0 to the second address addr127.
  • the second memory also has 128 storage spaces, addressed by the 128 addresses from the third address addr128 to the fourth address addr255; that is to say, although the first memory and the second memory are not in the same place, the third address addr128 is the second address addr127 plus one.
  • the pointer in the basic block has been set as a general address access through the process shown in FIG. 1 or FIG. 4. The flowchart of this embodiment is shown in FIG. 6.
  • In step 601, the compiler determines whether the universal address falls between the first address and the second address. Since the addresses of the first memory and the second memory are arranged consecutively, it suffices to judge whether the universal address is smaller than the third address to know whether it falls between the first address and the second address. If it falls between the first address and the second address, step 602 is executed and the compiler sets the first variable to true; if it does not, step 603 is executed and the compiler sets the first variable to false.
  • In step 604, the compiler determines whether the first variable is true. If the first variable is true, the address held by the pointer lies in the first memory, so step 605 is executed and the compiler sets the pointer to access the first memory. If the first variable is false, the address lies in the second memory, so step 606 is executed and the compiler sets the pointer to access the second memory.
  • the compiler can generate a memory access instruction according to the setting, so as to implement various data access operations such as image, voice, or text according to the memory access instruction.
  • In other words, the value held by the pointer is compared against the address ranges of the memories to determine which memory the address lies in, and the address space the pointer should access is thereby established at the compilation stage.
  • setp.lt is the judgment instruction for the less-than comparison;
  • s is the predicate register (that is, the aforementioned first variable);
  • %addr is the value of the pointer p, which is the address;
  • 0xXXXXX represents the value 128;
  • @s is a guard (judgment) on the predicate s being true, and the complementary guard tests whether s is false;
  • ld.offchip loads from the off-chip memory;
  • ld.onchip loads from the on-chip memory.
  • Instruction (1) means: determine whether the value of the pointer p is less than 128; if so, set s to 1, which is true, and if not, set s to 0, which is false.
  • Instruction (2) means: if s is true, load data from the off-chip memory.
  • Instruction (3) means: if s is false, load data from the on-chip memory. Whether data is loaded from off-chip or on-chip memory, its address is the value %addr of the pointer p.
  • This embodiment does not need to add extra fields or parameters: simply by judging whether the value in the predicate register is true or false, the information about which memory the address pointed to by the pointer p lies in can be obtained at the compilation stage, and an access mechanism for general addresses is thereby established.
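  • The effect of instructions (1) to (3) can be paraphrased by the following behavioural sketch; the boundary value 128 and the two memory arrays come from the running example, and this Python model is not the chip's actual instruction set.

        ONCHIP_BASE = 128                              # addr0..addr127 off-chip, addr128..addr255 on-chip

        def generic_load(addr, offchip, onchip):
            """s = (addr < 128); if s, load from off-chip memory, otherwise load from on-chip memory."""
            s = addr < ONCHIP_BASE                     # instruction (1): setp.lt into the predicate s
            if s:
                return offchip[addr]                   # instruction (2): guarded off-chip load
            return onchip[addr - ONCHIP_BASE]          # instruction (3): guarded on-chip load

        offchip = list(range(128))
        onchip = [10 * x for x in range(128)]
        assert generic_load(5, offchip, onchip) == 5       # address in the off-chip range
        assert generic_load(130, offchip, onchip) == 20    # address in the on-chip range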
  • When more than two memories are involved, the method of the present disclosure can likewise be used to find the corresponding address space.
  • Another embodiment of the present disclosure is a method for determining general address accesses among three memories.
  • multiple basic blocks in a control flow graph can access the first memory, the second memory, and the third memory.
  • the storage space of the first memory is defined by the first address to the second address
  • the storage space of the second memory is defined by the third address to the fourth address
  • the storage space of the third memory is defined by the fifth address to the sixth address.
  • the first memory and the second memory can be different off-chip memories
  • the third memory is on-chip memory.
  • the addresses of these memories are arranged sequentially.
  • the first memory has 128 storage spaces, represented by the 128 addresses from the first address addr0 to the second address addr127; the second memory also has 128 storage spaces, which follow on consecutively and are represented by the 128 addresses from the third address addr128 to the fourth address addr255; and the third memory also has 128 storage spaces, which follow on consecutively and are represented by the 128 addresses from the fifth address addr256 to the sixth address addr383.
  • In other words, the third address addr128 is the second address addr127 plus one, and the fifth address addr256 is the fourth address addr255 plus one.
  • In step 701, the compiler determines whether the universal address falls between the first address and the second address, that is, whether the universal address is smaller than the third address. If it falls between the first address and the second address, step 702 is executed and the compiler sets the first variable to true; if it does not, step 703 is executed and the compiler sets the first variable to false.
  • Then step 704 is executed: the compiler determines whether the universal address falls between the fifth address and the sixth address, that is, whether the universal address is greater than the fourth address. If it falls between the fifth address and the sixth address, step 705 is executed and the compiler sets the second variable to true; if it does not, step 706 is executed and the compiler sets the second variable to false.
  • In step 707, the compiler determines whether the first variable is true. If the first variable is true, the address held by the pointer lies in the first memory, so step 708 is executed and the compiler sets the pointer to access the first memory. If the first variable is false, step 709 is executed and the compiler judges whether the second variable is true. If the second variable is true, the address lies in the third memory, so step 710 is executed and the compiler sets the pointer to access the third memory.
  • In step 711, the compiler judges whether the first variable and the second variable are both false; if so, the address lies in neither the first memory nor the third memory, so step 712 is executed and the compiler sets the pointer to access the second memory.
  • Since the address must lie in the first, second or third memory, at least one of the judgments in step 707, step 709 and step 711 will be answered yes. In other words, step 711 should never find that the first variable or the second variable is not false; if such a situation does occur, the process returns to step 707 and the compiler re-evaluates whether the first variable and the second variable are true or false.
  • the compiler can generate a memory access instruction according to the setting, so as to implement various data access operations such as image, voice, or text according to the memory access instruction.
  • In other words, the value held by the pointer is compared against the address ranges, the memory in which the address lies is thereby known, and the value can then be accessed at that address.
  • the instruction (4) represents: judge whether the value %addr of the pointer p is less than 128; if yes, set the value of the s predicate register (i.e., the first variable) to 1, and if not, set it to 0.
  • the command (5) represents: judge whether the value %addr of the pointer p is less than 256 (0xYYYY); if yes, set the value of the t predicate register (that is, the second variable) to 1, and if not, set it to 0.
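  • Following the flowchart of FIG. 7, the three-memory case can be paraphrased as below; this is a behavioural sketch only, the boundaries are those of the running example, and the actual predicate instructions may arrange the comparisons differently.

        def classify_three_memories(addr):
            """Steps 701-712: decide which memory a general address falls in."""
            first_var = addr <= 127                 # steps 701-703: between addr0 and addr127?
            second_var = addr >= 256                # steps 704-706: between addr256 and addr383?
            if first_var:                           # steps 707-708
                return "first memory"
            if second_var:                          # steps 709-710
                return "third memory"
            return "second memory"                  # steps 711-712: both variables are false

        assert classify_three_memories(100) == "first memory"
        assert classify_three_memories(200) == "second memory"
        assert classify_three_memories(300) == "third memory"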
  • the instruction (8) involves an operation on predicates, but not all compilers support operating on predicates directly. If the compiler does not support it, predicate assignment can be used to achieve an equivalent effect.
  • Another embodiment of the present disclosure uses predicate assignment to implement the determination, among three memories, of the specific address space behind a general address.
  • FIG. 8 is a flowchart showing this embodiment, in which steps 801 to 810 correspond to steps 701 to 710 in FIG. 7 respectively, and will not be repeated.
  • When step 807 determines that the first variable is true, step 811 is executed after step 808 and the compiler sets the third variable to false. Similarly, when step 809 judges the second variable to be true, step 811 is executed after step 810 and the compiler sets the third variable to false. When step 809 determines that the second variable is not true, step 812 is executed and the compiler sets the third variable to true. After step 811 or step 812, step 813 is executed and the compiler determines whether the third variable is true. If it is true, both the first and the second variable are false, so step 814 is executed and the compiler sets the pointer to access the second memory. If it is false, the flow has already passed through step 808 or step 810 and the pointer has been set to access the first memory or the third memory, so the process ends at step 815.
  • The image or voice data is then computed according to these settings.
  • Command (9) means: set the third variable u to true. Command (10) means: if s is not false, set the third variable to false. Command (11) means: if t is not false, set the third variable to false. Instruction (12) means: if the third variable is true, load from the second memory.
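  • For compilers without direct predicate operations, commands (9) to (12) can be paraphrased by the following hedged sketch, where s and t are the first and second variables and u is the third variable.

        def load_second_memory(s, t):
            """Commands (9)-(12): derive the 'load the second memory' decision via the third variable u."""
            u = True           # command (9): initialise the third variable to true
            if s:              # command (10): if s is not false, clear u
                u = False
            if t:              # command (11): if t is not false, clear u
                u = False
            return u           # command (12): if u is true, the second memory is loaded

        assert load_second_memory(False, False) is True    # neither range matched: second memory
        assert load_second_memory(True, False) is False    # another memory was already selected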
  • FIG. 9 shows a schematic diagram of the internal structure of such a computing device 900.
  • the computing device 900 has a total of sixteen processor cores (processor core 0 to processor core 15) for performing matrix calculation tasks, and every four processor cores form a processing unit group, that is, a cluster.
  • processor core 0 to processor core 3 form a first cluster 902
  • processor core 4 to processor core 7 form a second cluster 904
  • processor core 8 to processor core 11 form a third cluster 906.
  • the processor core 12 to the processor core 15 form a fourth cluster 908.
  • the computing device 900 basically uses a cluster as the unit for performing computing tasks.
  • the computing device 900 also includes a storage unit core 910 and a shared storage unit 912.
  • the storage unit core 910 is mainly used to control data exchange and serves as a communication channel between the computing device 900 and the off-chip memory.
  • the shared storage unit 912 is an on-chip memory for temporarily storing the calculated intermediate values of the clusters 902, 904, 906, and 908.
  • the processor core 0 to the processor core 15 are used to execute the methods of the foregoing embodiments, specifically including but not limited to the processes of FIG. 1, FIG. 4, FIG. 6, FIG. 7 and FIG. 8.
  • FIG. 10 is a structural diagram showing an integrated circuit device 1000 according to an embodiment of the present disclosure. As shown in FIG. 10, the integrated circuit device 1000 includes a computing device 900, a universal interconnect interface 1004, and other processing devices 1006.
  • the universal interconnect interface 1004 can be used to transmit data and control commands between the computing device 900 and other processing devices 1006.
  • the computing device 900 may obtain required input data from other processing devices 1006 via the universal interconnect interface 1004, and write the required input data to the shared storage unit 912 on the computing device 900 chip.
  • the computing device 900 can obtain control instructions from other processing devices 1006 via the universal interconnect interface 1004, and write them into the on-chip control buffer of the computing device 900.
  • the other processing device 1006 may be one or more types of general-purpose and/or special-purpose processors, such as a central processing unit, a graphics processor or an artificial intelligence processor; their number is not limited but is determined according to actual needs.
  • the other processing device 1006 serves as an interface between the computing device 900 and external data and control, performs basic control including but not limited to data transfer, and completes the starting and stopping of the computing device 900.
  • the other processing device 1006 can also cooperate with the computing device 900 to complete computing tasks.
  • the integrated circuit device 1000 further includes an off-chip memory 1008, which can be connected to the computing device 900 and other processing devices 1006, respectively.
  • the off-chip memory 1008 is used to store data of the computing device 900 and the other processing device 1006, and is especially useful for data to be computed that cannot be held entirely in the internal storage of the computing device 900 or the other processing device 1006.
  • the integrated circuit device 1000 can be used as a system on chip (SOC) for mobile phones, robots, drones, video capture and other equipment, thereby effectively reducing the core area of the control part, increasing processing speed and reducing overall power consumption.
  • the universal interconnect interface 1004 of the integrated circuit device 1000 is connected to certain components of the equipment; such components can be, for example, a camera, a monitor, a mouse, a keyboard, a network card or a Wi-Fi interface.
  • the present disclosure also discloses a chip or integrated circuit chip, which includes the integrated circuit device 1000.
  • the present disclosure also discloses a chip packaging structure, which includes the above-mentioned chip.
  • the board 1100 may also include other supporting components.
  • the supporting components include a storage device 1104, an interface device 1106, and a control device 1108.
  • the storage device 1104 is connected to the chip 1102 in the chip packaging structure through a bus 1114 for storing data.
  • the storage device 1104 may include multiple groups of storage units 1110. Each group of storage units 1110 may be the aforementioned off-chip memory.
  • the interface device 1106 is electrically connected to the chip 1102 in the chip packaging structure.
  • the interface device 1106 is used to implement data transmission between the chip 1102 and an external device 1112 (for example, a server or a computer).
  • the interface device 1106 is a standard PCIe interface, and the data to be processed is transferred from the server to the chip 1102 through the standard PCIe interface to realize data transfer.
  • the calculation result of the chip 1102 is also transmitted back to the external device 1112 by the interface device 1106.
  • the control device 1108 is electrically connected to the chip 1102 to monitor the state of the chip 1102. Specifically, the chip 1102 and the control device 1108 may be electrically connected through an SPI interface.
  • the control device 1108 may include a single-chip microcomputer ("MCU", Micro Controller Unit).
  • electronic equipment or devices may include data processing devices, robots, computers, printers, scanners, tablets, smart terminals, mobile phones, driving recorders, navigators, sensors, camera modules, servers, cloud servers, cameras, video cameras, projectors, watches, headphones, mobile storage, wearable devices, transportation means, household appliances, and/or medical equipment.
  • the transportation means include airplanes, ships, and/or vehicles;
  • the household appliances include televisions, air conditioners, microwaves, refrigerators, rice cookers, humidifiers, washing machines, electric lights, gas stoves, and range hoods;
  • medical equipment includes nuclear magnetic resonance instruments, B-mode ultrasound scanners, and/or electrocardiographs.
  • Another embodiment of the present disclosure is a computer-readable storage medium on which is stored computer program code that performs accesses using a universal address; when the computer program code is run by a processor, the method described in each of the foregoing embodiments is executed.
  • In summary, the present disclosure makes the address space behind a general address explicit at the compilation stage and determines which memory is accessed, which simplifies hardware complexity, facilitates programming, and guarantees program performance.
  • the artificial intelligence chip performs matrix calculations based on the compiled opcodes to complete the calculation tasks of input data (such as image or voice data). Due to the preprocessing of the general address in the present invention, the calculation process will be more streamlined and efficient.
  • Clause A1 A method for deriving an address based on a control flow graph, wherein the control flow graph includes a plurality of basic blocks, the plurality of basic blocks include at least one instruction, and a pointer contained in the instruction carries an address, the method comprising: traversing the plurality of basic blocks to obtain all possible address spaces of the pointer; judging whether all the possible address spaces can be deduced to a single address space; and if so, setting the pointer to access the address space.
  • Clause A2 The method according to clause A1, further comprising: if not, setting the pointer to universal address access.
  • Clause A3 The method according to Clause A2, further comprising: judging whether the pointer involves a function call; and if it involves a function call, setting the pointer to a general address access.
  • Clause A4 The method according to any one of the preceding clauses, wherein the plurality of basic blocks access a first memory and a second memory, the storage space of the first memory is defined by the first address to the second address, and the storage space of the second memory is defined by the third address to the fourth address.
  • Clause A5 The method according to clause A4, further comprising: judging whether the universal address falls between the first address and the second address; if it falls between the first address and the second address, setting the first variable to true; if it does not fall between the first address and the second address, setting the first variable to false; judging whether the first variable is true; and if the first variable is true, setting the pointer to access the first memory.
  • Clause A6 The method according to clause A5, wherein if the first variable is false, the pointer is set to access the second memory.
  • Clause A7 The method according to clause A5, wherein the step of judging the universal address comprises: judging whether the universal address is smaller than the third address.
  • Clause A8 The method according to clause A4, wherein the plurality of basic blocks also access a third memory, and the storage space of the third memory is defined by the fifth address to the sixth address, and the method further includes: Determine whether the universal address falls between the fifth address and the sixth address; if the universal address falls between the fifth address and the sixth address, set the second variable to be true; Determine whether the second variable is true; and if the second variable is true, set the pointer to access the third memory.
  • Clause A9 The method of clause A8, wherein the third address is the second address plus one, and the fifth address is the fourth address plus one.
  • Clause A10 The method according to clause A9, wherein the step of judging whether the universal address falls between the fifth address and the sixth address comprises: judging whether the universal address is greater than the fourth address .
  • Clause A11 The method according to clause A8, further comprising: if the universal address does not fall between the fifth address and the sixth address, then setting the second variable to false; If both the first variable and the second variable are false, then the pointer is set to access the second memory.
  • Clause A12. The method according to clause A8, further comprising: setting a third variable as true; if it is judged that the first variable is true, then setting the third variable as false; if it is judged that the second variable is true , Then set the third variable to be false; determine whether the third variable is true; and if the third variable is true, set the pointer to access the second memory.
  • Clause A17 A computer-readable storage medium on which is stored computer program code that performs accesses in a system using a universal address, wherein, when the computer program code is run by a processor, it executes the method described in any one of clauses A1-A16.
  • Clause A18 A computing device including a processor core that executes the method described in any one of clauses A1-16.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Devices For Executing Special Programs (AREA)

Abstract

Provided are an address deduction method employing a control flow graph, a device, and a readable storage medium. A computing device is included in an integrated circuit device. The integrated circuit device comprises a universal interconnect interface and other processing devices. The computing device interacts with said other processing devices to jointly complete a computing operation specified by a user. The integrated circuit device may further comprise a storage device. The storage device is respectively connected to the computing device and said other processing devices for storing data of the computing device and said other processing devices.

Description

Method, device and readable storage medium for deriving an address based on a control flow graph
Cross-references to related applications
This application claims priority to the Chinese patent application filed on June 16, 2020, with application number 2020105506994, titled "Method, device and readable storage medium for deriving an address based on a control flow graph", the entire contents of which are hereby incorporated by reference.
Technical field
The present disclosure generally relates to the field of computers. More specifically, the present disclosure relates to a method, an apparatus, and a readable storage medium for deriving an address based on a control flow graph.
Background
Traditional general-purpose processors have an automatic storage management mechanism: to access data, one simply uses load or store instructions to bring the data into on-chip registers for processing, and after processing the result is stored back to memory through the registers, which speeds up data processing on off-chip memory.
For special-purpose chips such as artificial intelligence chips, however, automatic storage management is not adopted, out of concern for performance, area, power consumption and other factors; instead, the on-chip space is managed explicitly by instructions. When accessing data, an additional field or parameter is often needed to set the address space operated on by a memory access instruction, which increases programming difficulty. A more effective address derivation scheme is therefore urgently needed.
Summary of the invention
In order to at least partially solve the technical problems mentioned in the background, the solution of the present disclosure provides a method, a device and a readable storage medium for deriving a pointer address based on a control flow graph.
In one aspect, the present disclosure discloses a method for deriving an address based on a control flow graph. The control flow graph includes a plurality of basic blocks, the plurality of basic blocks include at least one instruction, and a pointer contained in the instruction carries an address. The method includes: traversing the plurality of basic blocks to obtain all possible address spaces of the pointer; determining whether all the possible address spaces can be deduced to a single address space; and if so, setting the pointer to access the address space.
In another aspect, the present disclosure discloses a computer-readable storage medium on which is stored computer program code that performs accesses in a system using a universal address; when the computer program code is run by a processor, the foregoing method is executed.
In another aspect, the present disclosure discloses a computing device including a processor core, the processor core executing the foregoing method.
The technical solution of the present disclosure targets special-purpose processors such as artificial intelligence chips: it simplifies their hardware complexity, facilitates programming, and further guarantees program performance by means of pointer derivation.
Description of the drawings
By reading the following detailed description with reference to the accompanying drawings, the above and other objects, features and advantages of the exemplary embodiments of the present disclosure will become easier to understand. In the accompanying drawings, several embodiments of the present disclosure are shown in an exemplary rather than restrictive manner, and the same or corresponding reference numerals indicate the same or corresponding parts, in which:
FIG. 1 is a flowchart showing an embodiment of the present disclosure;
FIG. 2 is a control flow diagram showing an example of an embodiment of the present disclosure;
FIG. 3 is a control flow diagram showing another example of an embodiment of the present disclosure;
FIG. 4 is a flowchart showing another embodiment of the present disclosure;
FIG. 5 is a control flow diagram showing another example of an embodiment of the present disclosure;
FIG. 6 is a flowchart showing another embodiment of the present disclosure;
FIG. 7 is a flowchart showing another embodiment of the present disclosure;
FIG. 8 is a flowchart showing another embodiment of the present disclosure;
FIG. 9 is a schematic diagram showing a computing device according to another embodiment of the present disclosure;
FIG. 10 is a structural diagram showing an integrated circuit device according to another embodiment of the present disclosure; and
FIG. 11 is a structural diagram showing a board card according to another embodiment of the present disclosure.
Detailed description
The technical solutions in the embodiments of the present disclosure will be described clearly and completely below in conjunction with the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the present disclosure, not all of them. Based on the embodiments of the present disclosure, all other embodiments obtained by those skilled in the art without creative work fall within the protection scope of the present disclosure.
It should be understood that the terms "first", "second", "third" and "fourth" in the claims, specification and drawings of the present disclosure are used to distinguish different objects, not to describe a specific order. The terms "include" and "comprise" used in the specification and claims of the present disclosure indicate the presence of the described features, wholes, steps, operations, elements and/or components, but do not exclude the presence or addition of one or more other features, wholes, steps, operations, elements, components and/or collections thereof.
It should also be understood that the terminology used in this specification is for the purpose of describing particular embodiments only and is not intended to limit the present disclosure. As used in the specification and claims of the present disclosure, the singular forms "a", "an" and "the" are intended to include the plural forms unless the context clearly indicates otherwise. It should further be understood that the term "and/or" used in the specification and claims refers to any and all possible combinations of one or more of the associated listed items, and includes these combinations.
As used in this specification and the claims, the term "if" may be interpreted as "when", "once", "in response to determining" or "in response to detecting", depending on the context.
The specific embodiments of the present disclosure are described in detail below with reference to the accompanying drawings.
为了使计算机能够实现特定操作,程序员必须将需解决的问题的思路、方法和手段通过计算机能够理解的形式输入至计算机,使得计算机能够根据指令依序执行。这种人和计算机之间沟通的渠道称为编程。编程语言分为三大类:机器语言、高级语言及汇编语言。In order to enable the computer to implement a specific operation, the programmer must input the ideas, methods, and means of the problem to be solved into the computer in a form that the computer can understand, so that the computer can execute the instructions in sequence. This channel of communication between people and computers is called programming. Programming languages are divided into three categories: machine language, high-level language and assembly language.
机器语言是机器能直接识别的操作码,每一操作码在计算机内部都有相应的电路来完成它,一般是通过一系列的0和1的指令去直接控制计算机各组件的电位,来完成预期的任务。使用机器语言编写的程序,由于每条指令都对应计算机一个特定的基本动作,所以程序占用内存少、执行效率高,但缺点是编程工作量大、易出错、难以解读、依赖具体的计算机结构,因而程序的通用性、移植性不佳。Machine language is an operation code that the machine can directly recognize. Each operation code has a corresponding circuit inside the computer to complete it. Generally, a series of 0 and 1 instructions are used to directly control the potential of each component of the computer to complete the expectation. Task. Programs written in machine language, because each instruction corresponds to a specific basic action of the computer, the program occupies less memory and has high execution efficiency. However, the disadvantage is that the programming workload is large, error-prone, difficult to interpret, and depends on specific computer structures. Therefore, the program's versatility and portability are not good.
A high-level language is independent of the computer's hardware structure and instruction set. It is based on human logic and grammar, so it is more expressive, conveniently represents data operations and program control structures, describes algorithms intuitively, and is easy to learn and master. Currently popular programming languages such as Java, C++ and Python are all high-level languages. Since high-level languages are written from a human point of view and are relatively indirect for the computer, the operation codes generated after compilation are usually longer than machine-language program code and execute more slowly. Moreover, high-level languages "cannot see" the computer's hardware structure and therefore cannot directly control the system software that accesses hardware resources. For this reason, some high-level languages use assembly language as an external procedure or function.
Assembly language sits between machine language and high-level languages. Compared with machine language it is easier for programmers to understand and to program in; compared with high-level languages it is more directly tied to the machine, achieving high speed and efficiency. Now that high-level languages are highly developed, assembly language is usually used at the lowest level, for program optimization or hardware manipulation.
Code written in high-level languages and assembly language must be converted into machine code by a compiler before it can drive the computer.
When an artificial intelligence chip runs, it needs to access memory heavily and move data between memories to carry out its computation tasks. For example, after image or speech information is converted into a matrix, the matrix data is copied from off-chip memory to on-chip memory for computation.
The present disclosure is directed to application-specific integrated circuits (ASICs), in particular artificial intelligence chips. On the premise that no extra fields or parameters are needed at the coding stage to define the address space being accessed, memory addresses are deduced at compile time and universal address accesses are realized, so as to streamline computing resources and shorten computation time.
One embodiment of the present disclosure is a method for deducing addresses based on a control flow graph; more specifically, the pointer deduction over the control flow graph is realized with a fixed-point algorithm. A control flow graph (CFG) is an abstract data structure used in compilers. It represents all the paths a program may execute and reflects, in flowchart form, the possible flow among all basic blocks within a procedure.
A control flow graph consists of nodes and the relationships between them. A node, also called a basic block (BB), is a maximal sequence of statements that the program executes in order; each basic block has exactly one entry and one exit, and during execution it is entered at its entry and left at its exit. The defining property of a basic block is that once its first instruction is executed, all instructions in the block are executed in order.
Each basic block contains at least one instruction, and an instruction in a basic block may use a pointer to refer to a particular register or memory. A pointer is a variable that holds an address in a particular address space. Through the pointer, the programmer can load data into the location at the specific address the pointer refers to, or fetch the data stored at that address.
Control flow graphs frequently contain conditional constructs such as predicates, jumps and loops. A predicate refers to a delegate that includes a method function used to decide whether a condition is met. A jump is a decision instruction that causes the flow to branch: one instruction is executed when the condition is satisfied and another when it is not. A loop keeps executing the same instructions under a given condition and does not stop until the condition is satisfied.
When the system contains multiple memories and the control flow graph includes conditional instructions such as, but not limited to, those described above, memory accesses become uncertain unless the concrete situation is compared against the conditions. When it cannot be determined which memory a pointer corresponds to, the pointer is generally set to universal address access. By deducing the address space a pointer points to, the compiler of this embodiment makes explicit the address space accessed by a pointer that was originally set to a universal address. Once this is determined at compile time, unnecessary execution steps can be eliminated, optimizing program performance.
FIG. 1 is a flowchart illustrating this embodiment.
In step 101, the compiler traverses the multiple basic blocks to obtain all possible address spaces of a pointer. This embodiment traverses all basic blocks in the control flow graph and, for each basic block, obtains the address spaces pointed to by the pointers used by its instructions. For one and the same pointer, all of its possible address spaces are collected.
During traversal, the data flow may be visited in reverse post-order. Reverse post-order is obtained by first traversing in post-order and then reversing the result; it tends to converge earlier. This embodiment does not restrict the traversal order, but reverse post-order traversal is preferred.
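As a concrete illustration of the preferred traversal order, the following minimal Python sketch computes a reverse post-order over the basic blocks of a control flow graph. The basic block objects and their successors field are hypothetical names used only for this example and are not part of the disclosed method.

def reverse_post_order(entry):
    # Return the basic blocks in reverse post-order, starting from the entry block.
    visited, post_order = set(), []

    def dfs(block):
        visited.add(block)
        for succ in block.successors:   # hypothetical successor list of a basic block
            if succ not in visited:
                dfs(succ)
        post_order.append(block)        # a block is appended only after all its successors

    dfs(entry)
    post_order.reverse()                # reversing the post-order yields the traversal order
    return post_order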
In step 102, the compiler determines whether all these possible address spaces can be deduced to a single address space. The compiler obtains the pointer variables of the pointer in the basic blocks; this step decides whether, after flowing through the control flow graph, the pointer can be reduced to a single address space.
Optionally, following the control flow graph, the compiler may obtain in turn all possible address spaces of the pointers used by instructions in all predecessor basic blocks of each basic block, and determine whether those address spaces can be deduced to a single address space. In a control flow graph, when one basic block is executed before another basic block, it may be regarded as a predecessor basic block of that other block. For example, the third basic block and the fourth basic block in FIG. 2 are both predecessor basic blocks of the fifth basic block.
Particular care is needed because some variables that appear different may in fact be aliases, that is, different variables that all access the same address space. Before the judgement in this step, the compiler may perform alias analysis to find address variables that are aliases, so as to determine whether the address of the variable the pointer refers to is unique. If the compiler does not support alias analysis, this embodiment can simulate the alias-analysis process by examining the information about the pointed-to variable recorded in the intermediate representation (IR) of each instruction.
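Where alias analysis is available (or simulated over the IR as described above), its result can feed the uniqueness check roughly as in the sketch below; alias_root and space_of are hypothetical helpers standing in for the compiler's alias information and address-space lookup, used only for illustration.

def unique_pointee_space(pointer_vars, alias_root, space_of):
    roots = {alias_root(v) for v in pointer_vars}   # collapse aliased variables to one representative
    spaces = {space_of(r) for r in roots}
    if len(spaces) == 1:
        return spaces.pop()                         # the pointed-to address space is unique
    return None                                     # not unique: cannot deduce a single space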
If the deduction yields a single address space, step 103 is executed: the compiler sets the pointer to access that address space.
If the deduction does not yield a single address space, step 104 is executed: the compiler sets the pointer to universal address access. Failing to deduce a single address space means the pointer may take several forms in the control flow graph and would access different address spaces under different conditions, so it is kept as a universal address access.
Steps 101 to 104 above are repeated until the address space accessed by each pointer no longer changes, and the address space each pointer finally points to is recorded. After these steps, the compiler can generate memory access instructions according to the address space each pointer finally points to, so that access operations on various data such as images, speech or text are carried out according to those memory access instructions.
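The fixed-point iteration of steps 101 to 104 can be pictured with the short Python sketch below. It is only a schematic model under assumed names: UNIVERSAL marks universal address access, block.predecessors is a hypothetical accessor for the graph structure, and block.assigned_space_of(p) stands for the address space a block assigns to the pointer (None when it assigns none). The compiler would then bind the pointer's accesses in each block to the single space computed there, or keep them universal.

UNIVERSAL = "universal"

def merge(spaces):
    # Step 102: if every observed space agrees, the pointer is deduced to that
    # single address space; otherwise it stays a universal address access.
    spaces = {s for s in spaces if s is not None}
    if len(spaces) == 1:
        return spaces.pop()     # step 103: bind the pointer to the single space
    return UNIVERSAL            # step 104: keep universal address access

def deduce_address_spaces(blocks, pointer):
    # out_space[b] = address space the pointer holds at the exit of block b.
    out_space = {b: None for b in blocks}
    changed = True
    while changed:                                        # repeat until a fixed point is reached
        changed = False
        for block in blocks:                              # ideally visited in reverse post-order
            incoming = [out_space[p] for p in block.predecessors]
            local = block.assigned_space_of(pointer)      # space assigned inside this block, if any
            new_out = local if local is not None else merge(incoming)
            if new_out != out_space[block]:
                out_space[block] = new_out
                changed = True
    return out_space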
FIG. 2 shows an example control flow graph used to explain the flowchart of this embodiment. The control flow graph 200 includes five basic blocks: a first basic block 201, a second basic block 202, a third basic block 203, a fourth basic block 204 and a fifth basic block 205. The first basic block 201 is the entry of the control flow graph 200, and its exit connects to the entry of the second basic block 202. The exit of the second basic block 202 connects to the entries of both the third basic block 203 and the fourth basic block 204, which means there is a decision at the exit of the second basic block 202, for example a jump: when a particular condition is met the flow jumps to the third basic block 203, otherwise it jumps to the fourth basic block 204. The exits of the third basic block 203 and the fourth basic block 204 both connect to the entry of the fifth basic block 205. The exit of the fifth basic block 205 is the exit of the entire control flow graph 200 and also connects back to the entry of the second basic block 202.
In step 101, the compiler traverses all basic blocks of the control flow graph 200 and obtains all possible address spaces of the pointers in each basic block. For convenience, only one pointer p in the control flow graph 200 is taken as an example. After traversing all basic blocks of the control flow graph 200, all possible address spaces of the pointer p are obtained as follows: in the first basic block 201, the pointer p has no explicit target, so it is a universal address access; in the second basic block 202 and the third basic block 203, a is assigned to the pointer p; in the fourth basic block 204, b is assigned to the pointer p; in the fifth basic block 205, the variable c is the value fetched from the address the pointer p points to. In summary, the pointer p may hold variable a or b, which directly affects the value of variable c.
In step 102, the compiler may, following the control flow graph, determine in turn whether all possible address spaces of the pointers in all predecessor basic blocks of each basic block can be deduced to a single address space. In one case, if variable a and variable b correspond to the same address space, for example both are in on-chip memory, step 103 is executed and the compiler sets the pointer p to access that specific address space, namely the on-chip memory. At the exit of the fifth basic block 205 the pointer p points to on-chip memory; this setting follows the loop of the control flow graph 200 back to the entry of the second basic block 202, where the pointer p is updated to access on-chip memory, and the flow of FIG. 1 continues to iterate over the control flow graph 200 until the address space pointed to by the pointer p no longer changes.
When the address space pointed to by the pointer p no longer changes, the compiler can generate memory access instructions according to that address space, so that access operations on various data such as images, speech or text are performed according to those instructions.
In another case, if variable a and variable b correspond to different address spaces, for example variable a is stored in on-chip memory while variable b is stored in off-chip memory, they do not map to the same address space, so step 104 is executed and the compiler sets the pointer p to universal address access. At the exit of the fifth basic block 205, that is, after the multiple basic blocks merge, the pointer p is updated to universal address access; this setting follows the loop of the control flow graph 200 back to the entry of the second basic block 202, and the iteration continues until the address space accessed by the pointer p no longer changes.
FIG. 3 shows another example control flow graph, likewise used to explain the flowchart of this embodiment. The control flow graph 300 also includes five basic blocks, connected in the same way as in the control flow graph 200; the difference lies in the instructions involving the pointer p in each basic block.
In step 101, the compiler traverses all basic blocks of the control flow graph 300 and obtains all possible address spaces of the pointers in each basic block. For the pointer p, after traversing all basic blocks of the control flow graph 300, all of its possible targets are as follows: in the first basic block 301, variable a is assigned to the pointer p; in the second basic block 302, variable c is the value fetched from the address the pointer p points to, that address being the value of variable a; in the fourth basic block 304, the pointer p is advanced by an offset b.
In step 102, the compiler may, following the control flow graph, determine in turn whether all possible address spaces of the pointers in all predecessor basic blocks of each basic block can be deduced to a single address space. In this example the pointer p always points into the address space storing variable a, for example on-chip memory. Although in the fourth basic block 304 the concrete address gains an offset b, it still lies in the same address space, so all possible addresses of the pointer in the predecessor basic blocks of the fifth basic block are deduced to a single address space, namely the on-chip memory. Step 103 is then executed and the compiler sets the pointer p to access the on-chip memory. At the exit of the fifth basic block 305 the pointer p is updated to point to on-chip memory; this setting follows the loop of the control flow graph 300 back to the entry of the second basic block 302 and iterates until the address space accessed by the pointer p no longer changes. In this example the pointer p does not change again, and it can be determined to access on-chip memory.
The conditional instructions of a control flow graph also include function calls. A function call invokes a subroutine: when a function call is encountered, execution jumps into the subroutine, and after the subroutine finishes, execution returns to the main program to execute the next instruction. Another embodiment of the present disclosure is an address-space deduction method that accommodates function calls; its flowchart is shown in FIG. 4.
In step 401, the compiler traverses the multiple basic blocks to obtain all possible address spaces of a pointer. This embodiment traverses all basic blocks in the control flow graph and, for one and the same pointer, obtains all of its possible address spaces.
In step 402, the compiler determines whether the pointer is involved in a function call. If a function call is involved, step 403 is executed and the compiler determines whether the function is read-only. If it is not a read-only function, the result of the call cannot be confirmed at compile time, so step 404 is executed and the compiler sets the pointer to universal address access.
If the function is determined in step 403 to be read-only (a read-only function does not change the address space being accessed), or if step 402 determines that no function call is involved, step 405 is executed and the compiler determines whether all the possible address spaces can be deduced to a single address space. If not, step 404 is executed and the compiler sets the pointer to universal address access. If so, step 406 is executed and the compiler sets the pointer to access that address space.
In another embodiment, when step 402 determines that the pointer is involved in a function call, step 403 may be skipped and step 404 executed directly, with the compiler setting the pointer to universal address access.
The compiler may repeat steps 401 to 406 until the address space the pointer points to no longer changes, after which it can generate memory access instructions according to that address space, so that access operations on various data such as images, speech or text are performed according to those instructions.
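For the function-call handling of steps 401 to 406, a correspondingly hedged sketch is shown below. It reuses the merge helper, the UNIVERSAL marker and the deduce_address_spaces routine from the earlier sketch; calls_involving and is_read_only are illustrative names rather than an actual compiler API.

def deduce_with_calls(blocks, pointer):
    for block in blocks:
        for call in block.calls_involving(pointer):   # step 402: the pointer flows into a call
            if not call.is_read_only():               # step 403: read-only callees are harmless
                return {b: UNIVERSAL for b in blocks} # step 404: force universal address access
    return deduce_address_spaces(blocks, pointer)     # steps 405/406: fall back to the FIG. 1 deduction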
FIG. 5 shows another example control flow graph used to explain the flowchart of this embodiment. The control flow graph 500 also includes five basic blocks, connected in the same way as in the control flow graph 300, the only difference being that the fifth basic block 305 is replaced by a function call 505.
In step 401, the compiler traverses the multiple basic blocks and obtains all possible address spaces of the pointer. In the first basic block 501, variable a is assigned to the pointer p; assuming variable a is stored in the memory of a specific chip, the pointer p should access that chip's memory. The control flow through the second basic block 502, the third basic block 503 and the fourth basic block 504 does not change the address space accessed by the pointer p, so all possible address spaces of the pointer p amount to the memory of that specific chip.
In step 402, the compiler determines whether the pointer is involved in a function call. Since the control flow graph 500 involves the function call 505, step 403 is executed and the compiler determines whether the function is read-only. Assuming the function call 505 is not a read-only function, step 404 is executed and the compiler sets the pointer p to universal address access. In other words, at the exit of the function call 505 the pointer p is updated to universal address access; this setting follows the loop of the control flow graph 500 back to the entry of the second basic block 502 and iterates until the address space accessed by the pointer p no longer changes. In this example, the pointer p is set to universal address access at compile time.
In the foregoing embodiments, the address-space deduction performed at the compile stage makes explicit every pointer that can be deduced to access a single address space, which reduces program running time and thus optimizes program performance.
When a pointer is set to universal address access, the present disclosure further provides a method that needs no additional fields or parameters: the address space a variable resides in is declared explicitly only when the variable is defined, and its access operations are completed by the universal address access mechanism. More specifically, since an artificial intelligence chip has no memory management mechanism, a universal address access cannot directly obtain the information of the specific memory involved. Another embodiment of the present disclosure is a universal address access method in which the address space of a variable is declared; it uses software to emulate hardware, simplifying hardware design complexity.
The application scenario of this embodiment is an artificial intelligence chip that includes a first memory and a second memory. The storage space of the first memory is defined from a first address to a second address, and the storage space of the second memory is defined from a third address to a fourth address, where the first memory may be off-chip memory and the second memory may be on-chip memory. Moreover, the addresses of the off-chip and on-chip memories are numbered contiguously. For example, the first memory has 128 storage locations, addressed by the 128 addresses between the first address addr0 and the second address addr127; the second memory also has 128 storage locations, addressed consecutively by the 128 addresses between the third address addr128 and the fourth address addr255. In other words, although the first memory and the second memory are not in the same place, the third address addr128 is the second address addr127 plus one. In this embodiment, the pointers in the basic blocks have already been set to universal address access through the flow of FIG. 1 or FIG. 4. The flowchart of this embodiment is shown in FIG. 6.
In step 601, the compiler determines whether the universal address falls between the first address and the second address. Since the addresses of the first memory and the second memory are numbered contiguously, it suffices to check whether the universal address is smaller than the third address to know whether it falls between the first address and the second address. If it falls between the first address and the second address, step 602 is executed and the compiler sets a first variable to true. If it does not, step 603 is executed and the compiler sets the first variable to false.
Then, in step 604, the compiler determines whether the first variable is true. If the first variable is true, the address held by the pointer is in the first memory, so step 605 is executed and the compiler sets the pointer to access the first memory. If the first variable is false, the address held by the pointer is in the second memory, so step 606 is executed and the compiler sets the pointer to access the second memory.
After these steps, the compiler can generate memory access instructions according to the setting, so that access operations on various data such as images, speech or text are performed according to those instructions.
This embodiment compares the pointer's value against the memories' address ranges to learn which memory the address lies in, and thereby determines at the compile stage the address space the pointer should access.
The above flow can be implemented by writing code with predicates; one workable piece of code is shown below:
setp.lt s %addr 0xXXXXX       (1)
@s ld.offchip                 (2)
!@s ld.onchip                 (3)
Here, setp.lt is a less-than comparison instruction; s is the predicate register (the aforementioned first variable); %addr is the value held by the pointer p, that is, the address; 0xXXXXX represents the value 128; @s tests whether s is true; !@s tests whether s is false; ld.offchip loads from off-chip memory; and ld.onchip loads from on-chip memory.
Instruction (1) means: determine whether the value of the pointer p is less than 128; if so, set s to 1, that is, true; if not, set s to 0, that is, false. Instruction (2) means: if s is true, load the data from off-chip memory. Instruction (3) means: if s is false, load the data from on-chip memory. Whether the data is loaded from off-chip or on-chip memory, its address is the pointer p's value %addr.
Although this example uses a load instruction for illustration, the present invention is not limited to this type of instruction; the above flow applies to any instruction that needs to access memory.
This embodiment requires no additional fields or parameters: simply by testing whether the value in the predicate register is true or false, the information about which memory the address pointed to by the pointer p lies in can be obtained at the compile stage, establishing the universal address access mechanism.
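The effect of instructions (1) to (3) can be modelled with the small Python sketch below. The boundary constants mirror the 128-entry example above and are assumptions for illustration only; the actual boundary is whatever value the compiler encodes as 0xXXXXX.

OFFCHIP_START, OFFCHIP_END = 0, 127   # first memory (off-chip): addr0 .. addr127
ONCHIP_START, ONCHIP_END = 128, 255   # second memory (on-chip): addr128 .. addr255

def resolve_universal_address(addr):
    s = addr < ONCHIP_START           # instruction (1): setp.lt s %addr 128
    if s:
        return ("off-chip", addr)     # instruction (2): @s ld.offchip
    return ("on-chip", addr)          # instruction (3): !@s ld.onchip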
When the system has more than two memories, the method of the present invention can likewise be used to find the corresponding address space. Another embodiment of the present invention concretizes universal address access across three memories. In this embodiment, the multiple basic blocks of a control flow graph may access a first memory, a second memory and a third memory. The storage space of the first memory is defined from a first address to a second address, that of the second memory from a third address to a fourth address, and that of the third memory from a fifth address to a sixth address, where the first and second memories may be different off-chip memories and the third memory is on-chip memory. Likewise, the addresses of these memories are numbered contiguously. For example, the first memory has 128 storage locations, represented by the 128 addresses between the first address addr0 and the second address addr127; the second memory also has 128 storage locations, represented consecutively by the 128 addresses between the third address addr128 and the fourth address addr255; and the third memory also has 128 storage locations, represented consecutively by the 128 addresses between the fifth address addr256 and the sixth address addr383. In other words, the third address addr128 is the second address addr127 plus one, and the fifth address addr256 is the fourth address addr255 plus one. The flowchart of this embodiment is shown in FIG. 7.
In step 701, the compiler determines whether the universal address falls between the first address and the second address, that is, whether the universal address is smaller than the third address. If it falls between the first address and the second address, step 702 is executed and the compiler sets a first variable to true. If it does not, step 703 is executed and the compiler sets the first variable to false.
Step 704 is then executed: the compiler determines whether the universal address falls between the fifth address and the sixth address, that is, whether the universal address is greater than the fourth address. If it falls between the fifth address and the sixth address, step 705 is executed and the compiler sets a second variable to true. If it does not, step 706 is executed and the compiler sets the second variable to false.
Then, in step 707, the compiler determines whether the first variable is true. If the first variable is true, the address held by the pointer lies in the first memory, so step 708 is executed and the compiler sets the pointer to access the first memory. If the first variable is false, step 709 is executed and the compiler determines whether the second variable is true. If the second variable is true, the address held by the pointer lies in the third memory, so step 710 is executed and the compiler sets the pointer to access the third memory. If the second variable is false, step 711 is executed and the compiler determines whether both the first variable and the second variable are false; if so, the address lies in neither the first memory nor the third memory, so step 712 is executed and the compiler sets the pointer to access the second memory.
Logically, since the address must lie in the first, second or third memory, at least one of the decisions in steps 707, 709 and 711 will be answered yes. In other words, in step 711 there should never be a case where the first variable or the second variable is not false; if that does occur, the flow returns to step 707 and the compiler re-evaluates whether the first and second variables are true or false.
After these steps, the compiler can generate memory access instructions according to the setting, so that access operations on various data such as images, speech or text are performed according to those instructions.
As can be seen from the flow above, this embodiment compares the pointer's value against the address ranges to learn which memory the address lies in, and then accesses the value according to the address.
The flow of FIG. 7 can be implemented by the following code:
setp.lt s %addr 0xXXXXX       (4)
setp.gt t %addr 0xYYYYY       (5)
@s ld.chip1                   (6)
@t ld.chip                    (7)
!@s&!@t ld.chip2              (8)
Instruction (4) means: determine whether the value %addr of the pointer p is less than 128; if so, set the s predicate register (the first variable) to 1, otherwise set it to 0. Instruction (5) means: determine whether the value %addr of the pointer p is greater than 255 (0xYYYYY); if so, set the t predicate register (the second variable) to 1, otherwise set it to 0. Instruction (6) means: if s is true (s=1), load the data from the first memory. Instruction (7) means: if t is true (t=1), load the data from the third memory. Instruction (8) means: if s is false (s=0) and t is false (t=0), load the data from the second memory.
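Similarly, the selection performed by instructions (4) to (8) can be modelled with the following Python sketch, again using the contiguous 128-entry layout described above; the boundary values and memory labels are assumptions for illustration only.

def resolve_three_memories(addr):
    s = addr < 128                      # instruction (4): setp.lt s %addr 128
    t = addr > 255                      # instruction (5): setp.gt t %addr 255
    if s:
        return ("first memory", addr)   # instruction (6): @s ld.chip1
    if t:
        return ("third memory", addr)   # instruction (7): @t ld.chip
    return ("second memory", addr)      # instruction (8): !@s&!@t ld.chip2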
Instruction (8) involves an operation on predicates, but not every compiler supports predicate operations. Where the compiler cannot support it, predicate assignment can be used to achieve an equivalent effect. Another embodiment of the present invention uses predicate assignment to determine, among three memories, the specific address space of a universal address. FIG. 8 is a flowchart of this embodiment, in which steps 801 to 810 correspond to steps 701 to 710 of FIG. 7 respectively and are not described again.
When the first variable is determined to be true in step 807, step 811 is executed after step 808 and the compiler sets a third variable to false. Likewise, when the second variable is determined to be true in step 809, step 811 is executed after step 810 and the compiler sets the third variable to false. When the second variable is determined not to be true in step 809, step 812 is executed and the compiler sets the third variable to true. After step 811 or step 812, step 813 is executed and the compiler determines whether the third variable is true. If it is true, both the first and second variables are false, so step 814 is executed and the compiler sets the pointer to access the second memory. If it is false, the flow has passed through step 808 or step 810 and the pointer has already been set to access the first memory or the third memory, so the flow ends at step 815.
After these steps, once compilation is complete, image or speech data is computed according to the setting.
When instruction (8) is instead expressed with predicate assignment, it can be completed with the following four instructions:
u = 1                         (9)
@!s u = 0                     (10)
@!t u = 0                     (11)
@u ld.chip2                   (12)
Instruction (9) means: set the third variable u to true. Instruction (10) means: if s is not false, set the third variable to false. Instruction (11) means: if t is not false, set the third variable to false. Instruction (12) means: if the third variable is true, load from the second memory.
The three memories in this embodiment are only an example; those skilled in the art can, without inventive effort, apply the present invention to scenarios with more than three memories, and such scenarios all fall within the scope disclosed by the present invention.
Another embodiment of the present invention is a computing device of an artificial intelligence chip. FIG. 9 is a schematic diagram of the internal structure of such a computing device 900. The computing device 900 has sixteen processor cores (processor core 0 to processor core 15) for executing matrix computation tasks, and every four processor cores form a processing unit group, that is, a cluster. In more detail, processor core 0 to processor core 3 form a first cluster 902, processor core 4 to processor core 7 form a second cluster 904, processor core 8 to processor core 11 form a third cluster 906, and processor core 12 to processor core 15 form a fourth cluster 908. The computing device 900 essentially executes computation tasks in units of clusters.
The computing device 900 further includes a storage unit core 910 and a shared storage unit 912. The storage unit core 910 is mainly used to control data exchange and serves as the channel through which the computing device 900 communicates with off-chip memory. The shared storage unit 912 is an on-chip memory for temporarily storing the intermediate computation values of the clusters 902, 904, 906 and 908.
Processor core 0 to processor core 15 are used to execute the methods of the foregoing embodiments, specifically including but not limited to the flows of FIG. 1, FIG. 4, FIG. 6, FIG. 7 and FIG. 8.
FIG. 10 is a structural diagram of an integrated circuit device 1000 according to an embodiment of the present disclosure. As shown in FIG. 10, the integrated circuit device 1000 includes the computing device 900, a universal interconnect interface 1004 and another processing device 1006.
The universal interconnect interface 1004 can be used to transmit data and control instructions between the computing device 900 and the other processing device 1006. For example, the computing device 900 may obtain required input data from the other processing device 1006 via the universal interconnect interface 1004 and write it into the shared storage unit 912 on the computing device 900. Further, the computing device 900 may obtain control instructions from the other processing device 1006 via the universal interconnect interface 1004 and write them into an on-chip control cache of the computing device 900.
The other processing device 1006 may be one or more types of general-purpose and/or special-purpose processors such as a central processing unit, a graphics processor or an artificial intelligence processor; its number is not limited and is determined according to actual needs. The other processing device 1006 serves as the interface between the computing device 900 and external data and control; it performs basic control including but not limited to data transfer and starting and stopping the computing device 900. The other processing device 1006 may also cooperate with the computing device 900 to jointly complete computation tasks.
The integrated circuit device 1000 further includes an off-chip memory 1008, which can be connected to the computing device 900 and the other processing device 1006 respectively. The off-chip memory 1008 is used to store data of the computing device 900 and the other processing device 1006, and is especially suitable when the data to be computed cannot be held entirely in the internal storage of the computing device 900 or the other processing device 1006.
Depending on the application scenario, the integrated circuit device 1000 can serve as a system on chip (SoC) for devices such as mobile phones, robots, drones and video capture equipment, effectively reducing the core area of the control portion, increasing processing speed and lowering overall power consumption. In this case, the universal interconnect interface 1004 of the integrated circuit device 1000 is connected to certain components of the device, such as a camera, a display, a mouse, a keyboard, a network card or a WiFi interface.
The present disclosure also discloses a chip or integrated circuit chip that includes the integrated circuit device 1000, and further discloses a chip package structure that includes the above chip.
Another embodiment of the present disclosure is a board card that includes the above chip package structure. Referring to FIG. 11, in addition to a plurality of the above chips 1102, the board card 1100 may include other supporting components, including a storage device 1104, an interface apparatus 1106 and a control device 1108.
The storage device 1104 is connected to the chips 1102 in the chip package structure through a bus 1114 and is used for storing data. The storage device 1104 may include multiple groups of storage units 1110, and each group of storage units 1110 may be the aforementioned off-chip memory.
The interface apparatus 1106 is electrically connected to the chips 1102 in the chip package structure and is used to implement data transmission between the chips 1102 and an external device 1112 (for example a server or a computer). In this embodiment, the interface apparatus 1106 is a standard PCIe interface; the data to be processed is transferred from the server to the chips 1102 through the standard PCIe interface to realize the data transfer, and the computation results of the chips 1102 are transmitted back to the external device 1112 by the interface apparatus 1106.
The control device 1108 is electrically connected to the chips 1102 in order to monitor their state. Specifically, the chips 1102 and the control device 1108 may be electrically connected through an SPI interface. The control device 1108 may include a micro controller unit (MCU).
Another embodiment of the present disclosure is an electronic device or apparatus that includes the above board card 1100. Depending on the application scenario, the electronic device or apparatus may include a data processing apparatus, robot, computer, printer, scanner, tablet, smart terminal, mobile phone, dashboard camera, navigator, sensor, webcam, server, cloud server, camera, video camera, projector, watch, earphones, mobile storage, wearable device, means of transport, household appliance and/or medical device. The means of transport include aircraft, ships and/or vehicles; the household appliances include televisions, air conditioners, microwave ovens, refrigerators, rice cookers, humidifiers, washing machines, electric lamps, gas stoves and range hoods; the medical devices include nuclear magnetic resonance instruments, B-mode ultrasound scanners and/or electrocardiographs.
Although this embodiment is described with an artificial intelligence chip as an example, those skilled in the art will understand that these methods can also be implemented with a general-purpose processor.
Another embodiment of the present disclosure is a computer-readable storage medium on which computer program code that performs accesses using universal addresses is stored; when the computer program code is run by a processor, the methods described in the foregoing embodiments are executed.
As demonstrated by the foregoing embodiments, the present invention concretizes the address space of a universal address at the compile stage and determines the memory to be accessed, which simplifies hardware complexity, eases programming and guarantees program performance. Based on the compiled operation codes, the artificial intelligence chip performs matrix computations to complete the computation tasks on input data such as images or speech. Because of the preprocessing of universal addresses in the present invention, the computation process becomes leaner and more efficient.
The foregoing may be better understood in light of the following clauses:
Clause A1. A method for deducing an address based on a control flow graph, the control flow graph including a plurality of basic blocks, the plurality of basic blocks including pointers, the pointers carrying addresses, the method comprising: traversing the plurality of basic blocks to obtain all possible address spaces of a pointer; determining whether all the possible address spaces are deduced to a single address space; and if so, setting the pointer to access that address space.
Clause A2. The method of clause A1, further comprising: if not, setting the pointer to universal address access.
Clause A3. The method of clause A2, further comprising: determining whether the pointer is involved in a function call; and if a function call is involved, setting the pointer to universal address access.
Clause A4. The method of clause A2 or 3, wherein the plurality of basic blocks access a first memory and a second memory, the storage space of the first memory being defined from a first address to a second address and the storage space of the second memory being defined from a third address to a fourth address, the method further comprising: determining whether the universal address falls between the first address and the second address; if it falls between the first address and the second address, setting a first variable to true; if it does not fall between the first address and the second address, setting the first variable to false; determining whether the first variable is true; and if the first variable is true, setting the pointer to access the first memory.
Clause A5. The method of clause A4, wherein the third address is the second address plus one.
Clause A6. The method of clause A5, wherein if the first variable is false, the pointer is set to access the second memory.
Clause A7. The method of clause A5, wherein the step of determining the universal address comprises: determining whether the universal address is smaller than the third address.
Clause A8. The method of clause A4, wherein the plurality of basic blocks further access a third memory, the storage space of the third memory being defined from a fifth address to a sixth address, the method further comprising: determining whether the universal address falls between the fifth address and the sixth address; if the universal address falls between the fifth address and the sixth address, setting a second variable to true; determining whether the second variable is true; and if the second variable is true, setting the pointer to access the third memory.
Clause A9. The method of clause A8, wherein the third address is the second address plus one, and the fifth address is the fourth address plus one.
Clause A10. The method of clause A9, wherein the step of determining whether the universal address falls between the fifth address and the sixth address comprises: determining whether the universal address is greater than the fourth address.
Clause A11. The method of clause A8, further comprising: if the universal address does not fall between the fifth address and the sixth address, setting the second variable to false; wherein, if both the first variable and the second variable are false, the pointer is set to access the second memory.
Clause A12. The method of clause A8, further comprising: setting a third variable to true; if the first variable is determined to be true, setting the third variable to false; if the second variable is determined to be true, setting the third variable to false; determining whether the third variable is true; and if the third variable is true, setting the pointer to access the second memory.
Clause A13. The method of clause A1, wherein the determining step comprises: obtaining pointer variables of the pointer in all the basic blocks; and determining whether the pointer variables all correspond to the address space.
Clause A14. The method of clause A1, wherein the setting step is performed after the plurality of basic blocks merge.
Clause A15. The method of clause A1, wherein when the control flow graph includes an iterative algorithm, the determining step is performed after the address of the pointer no longer changes.
Clause A16. The method of clause A1, wherein the control flow graph is jump control or loop control.
Clause A17. A computer-readable storage medium on which computer program code that performs accesses in a system using universal addresses is stored, wherein when the computer program code is run by a processor, the method of any one of clauses A1-16 is executed.
Clause A18. A computing device comprising a processor core, the processor core executing the method of any one of clauses A1-16.
The embodiments of the present disclosure have been described in detail above, and specific examples have been used herein to explain the principles and implementations of the disclosure. The description of the above embodiments is only intended to help in understanding the methods and core ideas of the disclosure. At the same time, persons of ordinary skill in the art, following the ideas of the disclosure, may make changes to the specific implementations and the scope of application. In summary, the contents of this specification should not be construed as limiting the present disclosure.

Claims (18)

  1. A method for deducing an address based on a control flow graph, the control flow graph comprising a plurality of basic blocks, the plurality of basic blocks comprising at least one instruction, a pointer contained in the instruction carrying an address, the method comprising:
    traversing the plurality of basic blocks to obtain all possible address spaces of the pointer;
    determining whether all the possible address spaces are deduced to a single address space; and
    if so, setting the pointer to access the address space.
  2. The method of claim 1, further comprising:
    if not, setting the pointer to universal address access.
  3. The method of claim 2, further comprising:
    determining whether the pointer is involved in a function call; and
    if a function call is involved, setting the pointer to universal address access.
  4. The method according to claim 2 or 3, wherein the plurality of basic blocks access a first memory and a second memory, a storage space of the first memory is defined from a first address to a second address, and a storage space of the second memory is defined from a third address to a fourth address, the method further comprising:
    determining whether the universal address falls between the first address and the second address;
    if the universal address falls between the first address and the second address, setting a first variable to true;
    if the universal address does not fall between the first address and the second address, setting the first variable to false;
    determining whether the first variable is true; and
    if the first variable is true, setting the pointer to access the first memory.
  5. The method according to claim 4, wherein the third address is the second address plus one.
  6. The method according to claim 5, wherein, if the first variable is false, the pointer is set to access the second memory.
  7. The method according to claim 5, wherein the step of determining the universal address comprises:
    determining whether the universal address is less than the third address.
  8. The method according to claim 4, wherein the plurality of basic blocks further access a third memory, a storage space of the third memory is defined from a fifth address to a sixth address, and the method further comprises:
    determining whether the universal address falls between the fifth address and the sixth address;
    if the universal address falls between the fifth address and the sixth address, setting a second variable to true;
    determining whether the second variable is true; and
    if the second variable is true, setting the pointer to access the third memory.
  9. The method according to claim 8, wherein the third address is the second address plus one, and the fifth address is the fourth address plus one.
  10. The method according to claim 9, wherein the step of determining whether the universal address falls between the fifth address and the sixth address comprises:
    determining whether the universal address is greater than the fourth address.
  11. The method according to claim 8, further comprising:
    if the universal address does not fall between the fifth address and the sixth address, setting the second variable to false, wherein, if both the first variable and the second variable are false, the pointer is set to access the second memory.
  12. The method according to claim 8, further comprising:
    setting a third variable to true;
    if the first variable is determined to be true, setting the third variable to false;
    if the second variable is determined to be true, setting the third variable to false;
    determining whether the third variable is true; and
    if the third variable is true, setting the pointer to access the second memory.
  13. The method according to claim 1, wherein the determining step comprises:
    obtaining pointer variables of the pointer in all of the basic blocks; and
    determining whether the pointer variables all correspond to the address space.
  14. The method according to claim 1, wherein the setting step is performed after the plurality of basic blocks converge.
  15. The method according to claim 1, wherein the traversing step, the determining step, and the setting step are repeatedly performed until the address of the pointer no longer changes.
  16. The method according to claim 1, wherein the control flow graph is a jump control or a loop control.
  17. A computer-readable storage medium having stored thereon computer program code for accessing in a system using a universal address, wherein, when the computer program code is run by a processor, the method according to any one of claims 1-16 is performed.
  18. A computing device, comprising a processor core, wherein the processor core performs the method according to any one of claims 1-16.
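Read as an algorithm, independent claim 1 together with claims 2, 3, 13 and 15 describes a whole-graph propagation: collect the address space the pointer carries in every basic block, rewrite the pointer only when the candidates collapse to a single space, and otherwise fall back to universal address access. The following is a minimal sketch of that shape only; PointerUse, kUniversal and deduceAddressSpace are invented names, and the data structures are assumptions rather than the application's implementation.

```cpp
#include <set>
#include <vector>

// Hypothetical record of how one pointer appears in a single basic block.
struct PointerUse {
    int addressSpace;   // address space the pointer carries in that block
    bool involvesCall;  // true if the pointer crosses a function call
};

constexpr int kUniversal = 0;  // assumed identifier of the universal address space

// Deduce one address space for a pointer from its uses across all basic
// blocks of the control flow graph (in the spirit of claims 1, 2, 3 and 13).
int deduceAddressSpace(const std::vector<PointerUse>& usesInAllBlocks) {
    std::set<int> candidates;
    for (const PointerUse& use : usesInAllBlocks) {
        // Claim 3: any involvement in a function call forces universal access.
        if (use.involvesCall) {
            return kUniversal;
        }
        candidates.insert(use.addressSpace);
    }
    // Claim 13: only when every basic block agrees on the same space can the
    // pointer be set to access that space directly (claim 1).
    if (candidates.size() == 1) {
        return *candidates.begin();
    }
    // Claim 2: otherwise the pointer is set to universal address access.
    return kUniversal;
}
```

Claim 15's repetition of the traversing, determining, and setting steps until the pointer's address no longer changes corresponds to running such a pass to a fixed point, which is one way pointers carried around the loop control of claim 16 could stabilise.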
PCT/CN2021/096379 2020-06-16 2021-05-27 Address deduction method employing control flow graph, device, and readable storage medium WO2021254123A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010550699.4A CN113805938A (en) 2020-06-16 2020-06-16 Method and device for deriving address based on control flow graph and readable storage medium
CN202010550699.4 2020-06-16

Publications (1)

Publication Number Publication Date
WO2021254123A1 (en)

Family

ID=78943357

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/096379 WO2021254123A1 (en) 2020-06-16 2021-05-27 Address deduction method employing control flow graph, device, and readable storage medium

Country Status (2)

Country Link
CN (1) CN113805938A (en)
WO (1) WO2021254123A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103412823A (en) * 2013-08-07 2013-11-27 格科微电子(上海)有限公司 Chip architecture based on ultra-wide buses and data access method of chip architecture
CN106648818A (en) * 2016-12-16 2017-05-10 华东师范大学 Generation system of object code control flow diagram
CN110389929A (en) * 2018-04-16 2019-10-29 格科微电子(上海)有限公司 System on chip framework based on distributed memory
US20200050562A1 (en) * 2015-10-07 2020-02-13 Rambus Inc. Interface for memory readout from a memory component in the event of fault

Also Published As

Publication number Publication date
CN113805938A (en) 2021-12-17

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 21827144

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 21827144

Country of ref document: EP

Kind code of ref document: A1