WO2021254123A1

WO2021254123A1 - Address deduction method employing control flow graph, device, and readable storage medium

Info

Publication number: WO2021254123A1
Application number: PCT/CN2021/096379
Authority: WO
Inventors: 石雯
Original assignee: 中科寒武纪科技股份有限公司
Priority date: 2020-06-16
Filing date: 2021-05-27
Publication date: 2021-12-23
Also published as: CN113805938A

Abstract

Provided are an address deduction method employing a control flow graph, a device, and a readable storage medium. A computing device is included in an integrated circuit device. The integrated circuit device comprises a universal interconnect interface and other processing devices. The computing device interacts with said other processing devices to jointly complete a computing operation specified by a user. The integrated circuit device may further comprise a storage device. The storage device is respectively connected to the computing device and said other processing devices for storing data of the computing device and said other processing devices.

Description

Method, device and readable storage medium for deriving address based on control flow graph

Cross-references to related applications

This application claims the priority of a Chinese patent application filed on June 16, 2020, with application number 2020105506994, titled "Method, device and readable storage medium for deriving an address based on a control flow graph", the full text of which is hereby Introduced as a reference.

Technical field

The present disclosure generally relates to the field of computers. More specifically, the present disclosure relates to a method, an apparatus, and a readable storage medium for deriving an address based on a control flow graph.

Background technique

Traditional general-purpose processors have an automatic storage management mechanism. For accessing data, simply use load or store instructions to load the data from the register to the on-chip register for processing, and the result after the processing is completed It is then stored back into the memory through the register to speed up the execution speed of data processing on the off-chip memory.

However, for special chips such as artificial intelligence, due to factors such as performance, area, power consumption, etc., automatic storage management mechanisms are not used, but the on-chip space is explicitly managed by instructions. When accessing data, additional The domain or parameter sets the address space operated by the memory access instruction, thereby increasing the programming difficulty. Therefore, a more effective address derivation scheme is urgently needed.

Summary of the invention

In order to at least partially solve the technical problems mentioned in the background art, the solutions of the present disclosure provide a method, a device and a readable storage medium for deriving a pointer address based on a control flow graph.

In one aspect, the present disclosure discloses a method for deriving an address based on a control flow graph. The control flow graph includes a plurality of basic blocks, the plurality of basic blocks include at least one instruction, and a pointer contained in the instruction carries an address. . The method includes: traversing the multiple basic blocks to obtain all possible address spaces of the pointer; judging whether all the possible address spaces are deduced to a single address space; if so, setting the pointer to access the address space.

In another aspect, the present disclosure discloses a computer-readable storage medium on which is stored computer program code that uses a universal address to access in the system, and when the computer program code is run by a processor, the foregoing method is executed.

In another aspect, the present disclosure discloses a computing device including a processor core, and the processor core executes the aforementioned method.

The solution technology of the present disclosure is aimed at a dedicated processor such as artificial intelligence, simplifies its hardware complexity, facilitates programming, and further ensures the performance of the program with the help of pointer derivation.

Description of the drawings

By reading the following detailed description with reference to the accompanying drawings, the above and other objects, features, and advantages of the exemplary embodiments of the present disclosure will become easier to understand. In the accompanying drawings, several embodiments of the present disclosure are shown in an exemplary but not restrictive manner, and the same or corresponding reference numerals indicate the same or corresponding parts of which:

FIG. 1 is a flowchart showing an embodiment of the present disclosure;

FIG. 2 is a control flow diagram showing an example of an embodiment of the present disclosure;

FIG. 3 is a control flow diagram showing another example of an embodiment of the present disclosure;

FIG. 4 is a flowchart showing another embodiment of the present disclosure;

FIG. 5 is a control flow diagram showing another example of an embodiment of the present disclosure;

FIG. 6 is a flowchart showing another embodiment of the present disclosure;

FIG. 7 is a flowchart showing another embodiment of the present disclosure;

FIG. 8 is a flowchart showing another embodiment of the present disclosure;

FIG. 9 is a schematic diagram showing a computing device according to another embodiment of the present disclosure;

FIG. 10 is a structural diagram showing an integrated circuit device according to another embodiment of the present disclosure; and

FIG. 11 is a structural diagram showing a board card according to another embodiment of the present disclosure.

detailed description

The technical solutions in the embodiments of the present disclosure will be clearly and completely described below in conjunction with the accompanying drawings in the embodiments of the present disclosure. Obviously, the described embodiments are part of the embodiments of the present disclosure, not all of the embodiments. Based on the embodiments in the present disclosure, all other embodiments obtained by those skilled in the art without creative work shall fall within the protection scope of the present disclosure.

It should be understood that the terms "first", "second", "third" and "fourth" in the claims, specification and drawings of the present disclosure are used to distinguish different objects, rather than to describe a specific order . The terms "comprising" and "comprising" used in the specification and claims of the present disclosure indicate the existence of the described features, wholes, steps, operations, elements and/or components, but do not exclude one or more other features, wholes The existence or addition of, steps, operations, elements, components, and/or their collections.

It should also be understood that the terms used in this specification of the present disclosure are only for the purpose of describing specific embodiments, and are not intended to limit the present disclosure. As used in the specification and claims of the present disclosure, unless the context clearly indicates otherwise, the singular forms "a", "an" and "the" are intended to include plural forms. It should also be further understood that the term "and/or" used in the specification and claims of the present disclosure refers to any combination of one or more of the items listed in association and all possible combinations, and includes these combinations.

As used in this specification and claims, the term "if" can be interpreted as "when" or "once" or "in response to determination" or "in response to detection" depending on the context.

The specific embodiments of the present disclosure will be described in detail below with reference to the accompanying drawings.

In order to enable the computer to implement a specific operation, the programmer must input the ideas, methods, and means of the problem to be solved into the computer in a form that the computer can understand, so that the computer can execute the instructions in sequence. This channel of communication between people and computers is called programming. Programming languages are divided into three categories: machine language, high-level language and assembly language.

Machine language is an operation code that the machine can directly recognize. Each operation code has a corresponding circuit inside the computer to complete it. Generally, a series of 0 and 1 instructions are used to directly control the potential of each component of the computer to complete the expectation. Task. Programs written in machine language, because each instruction corresponds to a specific basic action of the computer, the program occupies less memory and has high execution efficiency. However, the disadvantage is that the programming workload is large, error-prone, difficult to interpret, and depends on specific computer structures. Therefore, the program's versatility and portability are not good.

The high-level language has nothing to do with the hardware structure and instruction system of the computer. It is based on human logic and grammar, so it has stronger expressive ability, and it is also convenient to express data operations and program control structures, and can intuitively describe various algorithms. And it is easy to learn and master. Currently popular programming languages such as java, c++, python, etc. are all high-level languages. Since high-level languages are programmed from a human perspective and are relatively indirect to computers, the opcodes generated after compilation are often longer than machine language program codes, and the execution speed is also slower. Not only that, high-level languages "can't see" the computer's hardware structure, and therefore cannot directly control the system software that accesses hardware resources. For this reason, some high-level languages will use assembly language as an external procedure or function of the high-level language.

Assembly language is between machine language and high-level language. Compared with machine language, it is easier for programmers to understand and program. Compared with high-level language, it has more direct machine relevance and achieves the characteristics of high speed and efficiency. In today's highly developed high-level language, assembly language is usually used at the bottom level for program optimization or hardware operation.

The code written in high-level language and assembly language needs to be converted into machine code by a compiler to drive the computer.

When the artificial intelligence chip is running, it needs to access a large amount of memory and move data between the memories to perform calculation tasks. For example, after the image or voice information is converted into a matrix, the data of the matrix will be copied from the off-chip memory to the on-chip memory for calculation.

This disclosure is aimed at application-specific integrated circuits (ASICs), especially artificial intelligence chips. In the code phase, no additional fields or parameters are required to define the access address space. The memory address is derived at compile time and the general address is realized. To achieve the purpose of streamlining computing resources and shortening computing time.

An embodiment of the present disclosure is a method for deriving an address based on a control flow graph, and in more detail, a method for deriving an address based on a control flow graph is implemented by a fixed point algorithm. The Control Flow Graph (CFG) is an abstract data structure used in the compiler, which represents all the paths that a program may execute, and reflects the possible flow of all basic blocks in the process in the form of a flowchart.

The control flow graph is composed of nodes and the relationships between nodes. A node is also called a basic block (basic block, BB), which is a sequence of statements executed in the maximum order in the program. Each basic block has only one entry and exit. During execution, it enters from its entry and exits from its exit. The characteristic of the basic block is that as long as the first instruction is executed, all the instructions in the basic block will be executed in order.

Each basic block contains at least one instruction, and the instruction in the basic block may use a pointer to point to a specific scratchpad or memory. A pointer is a variable used to store an address in a specific address space. Through the pointer, the programmer can load data into the space of the specific address pointed to by the pointer, or fetch the data in the specific address pointed to by the pointer.

Control flow graphs often have conditions such as predicates, jumps, and loops. A predicate refers to a delegate, which includes a method function used to determine whether the condition is met. Jump refers to a judgment instruction that causes the process to branch. When a condition is met, one instruction is executed, and another instruction is executed when the condition is not met. The loop is to continuously execute the same instruction under certain conditions, and will not stop until the condition is met.

When there are multiple memories in the system, the appearance of the control flow graph includes but is not limited to the aforementioned conditional instructions. Unless the specific situation and conditions are compared, the memory access will be uncertain. When it is not certain which memory the pointer corresponds to, the pointer is generally set to be accessed by a general address. By deriving the pointer to the address space, the compiler of this embodiment clarifies the address space accessed by the pointer originally set as a general address. Once it is determined at compile time, unnecessary execution steps can be reduced to optimize the program. Performance.

Fig. 1 is a flowchart showing this embodiment.

In step 101, the compiler traverses multiple basic blocks to obtain all possible address spaces of the pointer. This embodiment traverses all the basic blocks in the control flow diagram, and obtains the address space pointed to by the pointer used by the instruction in each basic block. For the same pointer, get all its possible address spaces.

When traversing, the order of the data stream can be reversed. Take the reverse order as an example, which is obtained by traversing in a later order, and then inverting the result. The reverse order can converge earlier. This embodiment does not limit the order of traversal, but it is better to use reverse post-order traversal.

In step 102, the compiler determines whether all these possible address spaces are deduced to a single address space. The compiler obtains the pointer variable of the pointer in the basic block. This step is to determine whether these pointers can be deduced to a single address space after undergoing the flow of the control flow graph.

Optionally, the compiler can sequentially obtain all possible address spaces of pointers used by instructions in all predecessor basic blocks of each basic block according to the control flow graph, and determine all possible addresses of pointers used by instructions in these predecessor basic blocks Whether the space can be deduced to a single address space. Among them, in the control flow diagram, when a basic block is executed before another basic block, the basic block can be regarded as a predecessor basic block of another basic block. For example, the third basic block and the fourth basic block in FIG. 2 are both predecessor basic blocks of the fifth basic block.

It is important to note that although some variables are different on the surface, these variables may belong to aliases, that is, different variables but all access the same address space. Before making judgments in this step, the compiler can perform alias analysis to find out The address variable of the alias is used to determine whether the address of the variable pointed to by the pointer is unique. If the compiler does not have the function of alias analysis, this embodiment can simulate the process of alias analysis, and judge the variable information pointed to by the pointer recorded on the Intermediate Representation (IR) of each instruction.

If it is deduced to a single address space, step 103 is executed, and the compiler sets this pointer to access the address space.

However, if it is not deduced to a single address space, step 104 is executed, and the compiler sets the pointer to general address access. Not deduced to a single address space means that this pointer may have multiple possibilities in the control flow graph, and it will access different address spaces under different conditions, so it remains as a general address access.

Repeat the above steps 101 to 104 until the address space accessed by each pointer remains unchanged, and record the address space eventually pointed to by each pointer. After performing the foregoing steps, the compiler can generate a memory access instruction according to the address space settings that the pointer ultimately points to, so as to implement various data access operations such as images, voices, or texts according to the memory access instructions.

FIG. 2 is an example of a control flow diagram to illustrate the flow chart of this embodiment. This control flow diagram 200 includes five basic blocks, namely a first basic block 201, a second basic block 202, a third basic block 203, a fourth basic block 204, and a fifth basic block 205. The first basic block 201 is the entrance of the control flow diagram 200, the exit of the first basic block 201 is connected to the entrance of the second basic block 202, and the exit of the second basic block 202 is connected to the third basic block 203 and the fourth basic block at the same time. The entry of block 204, which means that there is a judgment formula at the exit of the second basic block 202, for example, jump, jump to the third basic block 203 if a specific condition is met, and jump to the fourth basic block 204 if the specific condition is not met, The exit of the third basic block 203 and the exit of the fourth basic block 204 are both connected to the entrance of the fifth basic block 205. The exit of the fifth basic block 205 is the exit of the entire control flow diagram 200, and is connected back to the second basic block. Entrance to block 202.

In step 101, the compiler traverses all the basic blocks of the control flow graph 200 to obtain all possible address spaces of the pointers in each basic block. For the convenience of description, only one pointer p in the control flow graph 200 is used as an example here. After traversing all the basic blocks in the control flow graph 200, for the pointer p, all possible address spaces to obtain it are as follows: In the first basic block 201, the pointer p does not point to it explicitly, so it is a general address access; In the basic block 202 and the third basic block 203, a is assigned to the pointer p; in the fourth basic block 204, b is assigned to the pointer p; in the fifth basic block 205, the variable c is pointed to by the self-pointer p The value retrieved from the address. In summary, the pointer p may be variable a or b, which directly affects the value of variable c.

In step 102, the compiler can sequentially determine whether all possible address spaces of pointers in all predecessor basic blocks of each basic block are deduced to a single address space according to the control flow graph. In one case, if the address spaces corresponding to the variable a and the variable b are the same, for example, both are in the on-chip memory, step 103 is executed, and the compiler sets the pointer p to access a specific address space, that is, the on-chip memory. At the exit of the fifth basic block 205, the pointer p points to the on-chip memory. This setting follows the loop of the control flow graph 200 and returns to the entry of the second basic block 202. At this time, the pointer p is updated to access the on-chip memory, and then continues according to The control flow graph 200 executes the process of FIG. 1 to iterate until the address space pointed to by the pointer p no longer changes.

When the address space pointed to by the pointer p no longer changes, the compiler can generate a memory access instruction according to the address space pointed to by the pointer, so as to implement access to various data such as images, voice, or text according to the memory access instruction operate.

In another case, if the address spaces corresponding to variable a and variable b are different, for example, variable a is stored in on-chip memory, and variable b is stored in off-chip memory, and does not correspond to the same address space, then step 104 is executed. The compiler sets the pointer p for general address access. At the exit of the fifth basic block 205, that is, after the multiple basic blocks are merged, the pointer p is updated to a universal address access. This setting follows the loop of the control flow graph 200 and returns to the entry of the second basic block 202 , Continue to iterate until the address space accessed by the pointer p no longer changes.

FIG. 3 is another example of a control flow diagram, and is also used to illustrate a flowchart of this embodiment. The control flow diagram 300 also includes five basic blocks, and the connection relationship of these basic blocks is the same as that of the control flow diagram 200, and the difference lies in the instructions related to the pointer p in each basic block.

In step 101, the compiler traverses all the basic blocks of the control flow graph 300, and obtains all possible address spaces of the pointers in each basic block. For the pointer p, after traversing all the basic blocks in the control flow graph 300, all possible points to it are obtained as follows: in the first basic block 301, the variable a is assigned to the pointer p; in the second basic block 302, the variable c It is the value fetched from the address pointed to by the pointer p, and the address is the value of the variable a; in the fourth basic block 304, the pointer p points to an additional offset b.

In step 102, the compiler can sequentially determine whether all possible address spaces of pointers in all predecessor basic blocks of each basic block are deduced to a single address space according to the control flow graph. In this example, the pointer p points to the address space where the variable a is stored, for example, they are all in the on-chip memory. Although the specific address has an offset b in the fourth basic block 304, it is still located in the same address space, so All possible addresses of the pointer in the predecessor basic block of the fifth basic block are deduced to a single address space, that is, the on-chip memory, then step 103 is executed, and the compiler sets the pointer p to access the on-chip memory. At the exit of the fifth basic block 305, the pointer p is updated to point to the on-chip memory. This setting follows the loop of the control flow graph 300 and returns to the entry of the second basic block 302 to iterate until the address space accessed by the pointer p is no longer Until the change occurs. In this example, the pointer p will no longer change, and the on-chip memory can be accessed.

The conditional instructions of the control flow graph also include function calls. Function call is to call a subroutine. When a function call is encountered, it will jump to the subroutine for execution. After the subroutine is executed, it will return to the main program to execute the next instruction. Another embodiment of the present disclosure is an address space derivation method suitable for function calls, and the flowchart is shown in FIG. 4.

In step 401, the compiler traverses multiple basic blocks to obtain all possible address spaces of the pointer. This embodiment traverses all the basic blocks in the control flow graph, and obtains all its possible address spaces for the same pointer.

In step 402, the compiler determines whether the pointer involves a function call. If a function call is involved, step 403 is executed, and the compiler judges whether the function is read-only. If it is not a read-only function, the result of the function call cannot be confirmed during the compiling stage, so step 404 is executed, and the compiler sets the pointer to general address access.

In step 403, if the function is judged to be a read-only function, since the read-only function does not change the address space accessed, or in step 402, it is judged that no function call is involved, then step 405 is executed, and the compiler judges all these possible Whether the address space of is deduced to a single address space. If it is not deduced to a single address space, step 404 is executed, and the compiler sets the pointer to general address access. If it is deduced to a single address space, step 406 is executed, and the compiler sets this pointer to access the address space.

In another embodiment, when step 402 is executed, when the compiler determines that the pointer involves a function call, step 403 may not be executed, and step 404 may be executed directly, and the compiler sets the pointer to general address access.

The compiler can repeat the above steps 401 to 406 until the address space pointed to by the pointer remains unchanged, and then the compiler can generate a memory access instruction according to the address space pointed to by the pointer, so as to implement image, voice, or video access according to the memory access instruction. Access operations of various data such as text.

FIG. 5 is another example of a control flow diagram to illustrate the flow chart of this embodiment. This control flow graph 500 also includes five basic blocks, and the connection relationship of these basic blocks is the same as that of the control flow graph 300, with the only difference that the fifth basic block 305 becomes a function call 505.

In step 401, the compiler traverses multiple basic blocks to obtain all possible address spaces of the pointer. In the first basic block 501, the variable a is assigned to the pointer p. Assuming that the variable a is stored in the memory of a specific chip, the pointer p should access the memory of the specific chip. The control flow through the second basic block 502, the third basic block 503, and the fourth basic block 504 does not change the address space accessed by the pointer p. Therefore, all possible address spaces of the pointer p are the memory of the specific chip.

In step 402, the compiler determines whether the pointer involves a function call. Since this control flow graph 500 involves a function call 505, step 403 is executed, and the compiler judges whether the function is read-only. Assuming that the function call 505 is not a read-only function, step 404 is executed, and the compiler sets the pointer p as a general address access. In other words, at the exit of the function call 505, the pointer p is updated to the general address access. This setting follows the loop of the control flow graph 500 and returns to the entry of the second basic block 502 to iterate until the address space accessed by the pointer p is no longer Changes. In this example, the pointer p will be set to general address access at compile time.

In the foregoing embodiment, in the compilation stage, the address space derivation is used to clarify the pointer that can be deduced to access a single address space, which reduces the program running time and optimizes the performance of the program.

When the pointer is set to a general address access, the present disclosure also provides a method that does not need to set additional fields or parameters. It only needs to explicitly declare the address space where the variable is located when the variable is defined. The access operation is determined by the general address. Access mechanism to complete the method. In more detail, because artificial intelligence chips do not have a memory management mechanism, they cannot directly obtain specific memory information when encountering general address access. Another embodiment of the present disclosure is a general address access method for declaring the address space where the variable is located, which adopts a way of simulating hardware by software to simplify the complexity of hardware design.

The application scenario of this embodiment is an artificial intelligence chip that includes a first memory and a second memory. The storage space of the first memory is defined by the first address to the second address, and the storage space of the second memory is defined by the first address to the second address. Defined by the third address to the fourth address, where the first memory can be an off-chip memory, and the second memory can be an on-chip memory. Furthermore, the addresses of off-chip memory and on-chip memory are arranged consecutively. For example, the first memory has a total of 128 storage spaces, which are respectively pointed to by 128 addresses between the first address addr0 and the second address addr127. The second memory also has 128 storage spaces, followed by 128 addresses between the third address addr128 and the fourth address addr255, that is to say, although the first memory and the second memory are not in the same place, the third address addr128 is the second address addr127 plus one. In this embodiment, the pointer in the basic block has been set as a general address access through the process shown in FIG. 1 or FIG. 4. The flowchart of this embodiment is shown in FIG. 6.

In step 601, the compiler determines whether the universal address falls between the first address and the second address. Since the addresses of the first memory and the second memory are arranged consecutively, as long as it is judged whether the general address is smaller than the third address, it can be known whether the general address falls between the first address and the second address. If it falls between the first address and the second address, step 602 is executed, and the compiler sets the first variable to be true. If it does not fall between the first address and the second address, step 603 is executed, and the compiler sets the first variable to false.

Then in step 604, the compiler determines whether the first variable is true. If the first variable is true, it means that the address of the pointer is in the first memory, so step 605 is executed, and the compiler sets the pointer to access the first memory. If the first variable is false, it means that the address of the pointer is in the second memory, so step 606 is executed, and the compiler sets the pointer to access the second memory.

After performing the foregoing steps, the compiler can generate a memory access instruction according to the setting, so as to implement various data access operations such as image, voice, or text according to the memory access instruction.

In this embodiment, the variable value of the pointer is calculated with the address range of the memory to obtain information about which memory the address is located in, and then the address space that the pointer should access is determined in the compilation stage.

In the execution of the aforementioned process, you can use predicate to write code to achieve, one of the operable codes is as follows:

setp.lt s%addr 0xXXXXX (1)

@s ld.offchip (2)

! @sld.onchip (3)

Among them, setp.lt is the judgment instruction for the less than operation, s is the predicate register (that is, the aforementioned first variable), %addr is the value of the pointer p, which is the address, 0xXXXXX represents the value 128, @s is the judgment formula, and judgment s Is it true? @s is also a judgment formula to determine whether s is false, ld.offchip is loaded from off-chip memory, and ld.onchip is loaded from on-chip memory.

The command (1) represents: to determine whether the value of the pointer p is less than 128, if yes, set s to 1, which is true, if not, set s to 0, which is false. The instruction (2) means: if s is true, load data from off-chip memory. The instruction (3) represents: if s is false, load data from the on-chip memory. Regardless of loading data from off-chip memory or on-chip memory, its address is the value %addr of the pointer p.

Although this example is described with a load instruction, the present invention is not limited to the type of instruction, and any instruction that needs to access the memory is applicable to the above process.

This embodiment does not need to add additional fields or parameters. Only by judging whether the value in the predicate register is true or false, the information on which memory the address pointed to by the pointer p is located can be obtained at the compilation stage, and the general address information can be established Access mechanism.

When the system memory is more than two, the method of the present invention can also be used to find the corresponding address space. Another embodiment of the present invention is a method for specifying general address access in three memories. In this embodiment, multiple basic blocks in a control flow graph can access the first memory, the second memory, and the third memory. , The storage space of the first memory is defined by the first address to the second address, the storage space of the second memory is defined by the third address to the fourth address, and the storage space of the third memory is defined by the fifth address to the first address. Defined by six addresses, where the first memory and the second memory can be different off-chip memories, and the third memory is on-chip memory. Similarly, the addresses of these memories are arranged sequentially. For example, the first memory has 128 storage spaces, which are represented by 128 addresses between the first address addr0 and the second address addr127, and the second memory also has 128 storage spaces. Space, the connection is represented by 128 addresses between the third address addr128 and the fourth address addr255, the third memory also has 128 storage spaces, and the connection is represented by 128 addresses between the fifth address addr256 and the sixth address addr383, In other words, the third address addr128 is the second address addr127 plus one, and the fifth address addr256 is the fourth address addr255 plus one. The flowchart of this embodiment is shown in FIG. 7.

In step 701, the compiler determines whether the universal address falls between the first address and the second address. That is, it is judged whether the general address is smaller than the third address. If it falls between the first address and the second address, step 702 is executed, and the compiler sets the first variable to be true. If it does not fall between the first address and the second address, step 703 is executed, and the compiler sets the first variable to false.

Next, step 704 is executed, and the compiler determines whether the universal address falls between the fifth address and the sixth address. That is, it is judged whether the general address is greater than the fourth address. If it falls between the fifth address and the sixth address, step 705 is executed, and the compiler sets the second variable as true. If it does not fall between the fifth address and the sixth address, step 706 is executed, and the compiler sets the second variable to false.

Then in step 707, the compiler determines whether the first variable is true. If the first variable is true, it indicates that the address of the pointer is located in the first memory, so step 708 is executed, and the compiler sets the pointer to access the first memory. If the first variable is false, step 709 is executed, and the compiler judges whether the second variable is true. If the second variable is true, it means that the address of the pointer is located in the third memory, so step 710 is executed, and the compiler sets the pointer to access the third memory. If the second variable is false, then in step 711, the compiler judges whether the first variable and the second variable are both false, if yes, it means that the address is not in the first memory and the third memory, then step 712 is executed, the compiler The pointer is set to access the second memory.

Logically speaking, since the address must be in the first memory, the second memory, or the third memory, at least one of the judgment steps of step 707, step 709, and step 711 will be judged as yes. In other words, it should not be judged as yes in step 711. There is a situation where the first variable or the second variable is not false. If it does occur, the process will return to step 707, and the compiler will re-judge whether the first variable and the second variable are true or false.

It can be seen from the above process that in this embodiment, the variable value of the pointer is calculated with the address range, and then the memory where the address is located can be known, and then the value can be accessed according to the address.

The process of Figure 7 can be implemented by the following code:

setp.lt s%addr 0xXXXXX (4)

setp.gt t%addr 0xYYYYY (5)

@sld.chip1 (6)

@tld.chip (7)

! @s&! @tld.chip2 (8)

The instruction (4) represents: judge whether the value %addr of the pointer p is less than 128, if yes, set the value of the s predicate register (ie, the first variable) to 1, if not, set it to 0. The command (5) represents: judge whether the value %addr of the pointer p is less than 256 (0xYYYYY), if yes, set the value of the t predicate register (that is, the second variable) to 1, if not, set it to 0 . The instruction (6) represents: if s is true (ie, s=1), load data from the first memory. The instruction (7) represents: if t is true (that is, t=1), load data from the third memory. The instruction (8) represents: if s is false (ie, s=0) and t is false (ie, t=0), load data from the second memory.

The instruction (8) involves the operation of the predicate, but not all compilers can support the operation of the predicate. If the compiler cannot support it, the predicate assignment can be used to achieve the equivalent effect. Another embodiment of the present invention is to use predicate assignment to realize a method for judging the specific address space of the general address in three memories. FIG. 8 is a flowchart showing this embodiment, in which steps 801 to 810 correspond to steps 701 to 710 in FIG. 7 respectively, and will not be repeated.

When it is determined in step 807 that the first variable is true, step 811 is executed after step 808, and the compiler sets the third variable to false. Similarly, when the second variable is judged to be true in step 809, step 811 is executed after step 810, and the compiler sets the third variable to false. When it is determined in step 809 that the second variable is not true, step 812 is executed, and the compiler sets the third variable to be true. After step 811 and step 812, step 813 is executed, and the compiler determines whether the third variable is true. If it is true, it means that both the first and second variables are false, and step 814 is executed, and the compiler sets the pointer to access the second memory. If it is false, it means that the process has passed step 808 or step 810, and the pointer has been set to access the first memory or the second memory, so the process ends in step 815.

After performing the foregoing steps, after the compilation is completed, the image or voice data is calculated according to the settings.

When the instruction (8) is changed to the predicate assignment method, it can be completed with the following four instructions:

u = 1 (9)

@! s u = 0 (10)

@! t u = 0 (11)

@u ld.chip1 (12)

The command (9) represents: set the third variable u to be true; the command (10) represents: if s is not false, then the third variable is set to false; the command (11) represents: if t If it is not false, set the third variable to false; the instruction (12) means: if the third variable is true, load the second memory.

The three memories in this embodiment are only examples, and those skilled in the art can apply the present invention to scenarios with more than three memories without creative work, and these scenarios are all within the scope disclosed by the present invention.

Another embodiment of the present invention is an artificial intelligence chip computing device. FIG. 9 shows a schematic diagram of the internal structure of such a computing device 900. The computing device 900 has a total of sixteen processor cores (processor core 0 to processor core 15) for performing matrix calculation tasks, and every four processor cores form a processing unit group, that is, a cluster. In more detail, processor core 0 to processor core 3 form a first cluster 902, processor core 4 to processor core 7 form a second cluster 904, and processor core 8 to processor core 11 form a third cluster 906. The processor core 12 to the processor core 15 form a fourth cluster 908. The computing device 130 basically uses a cluster as a unit to perform computing tasks.

The computing device 900 also includes a storage unit core 910 and a shared storage unit 912. The storage unit core 910 is mainly used to control data exchange and serves as a communication channel between the computing device 900 and the off-chip memory. The shared storage unit 912 is an on-chip memory for temporarily storing the calculated intermediate values of the

clusters

902, 904, 906, and 908.

The processor core 0 to the processor core 15 are used to execute the methods of the foregoing embodiments, specifically including but not limited to the processes of FIG. 1, FIG. 4, FIG. 6, FIG. 7 and FIG. 8.

FIG. 10 is a structural diagram showing an integrated circuit device 1000 according to an embodiment of the present disclosure. As shown in FIG. 10, the integrated circuit device 1000 includes a computing device 900, a universal interconnect interface 1004, and other processing devices 1006.

The universal interconnect interface 1004 can be used to transmit data and control commands between the computing device 900 and other processing devices 1006. For example, the computing device 900 may obtain required input data from other processing devices 1006 via the universal interconnect interface 1004, and write the required input data to the shared storage unit 912 on the computing device 900 chip. Further, the computing device 900 can obtain control instructions from other processing devices 1006 via the universal interconnect interface 1004, and write them into the on-chip control buffer of the computing device 900.

The other processing device 1006 may be one or more types of processors in general and/or special-purpose processors such as central processing unit, graphics processor, artificial intelligence processor, etc., the number of which is not limited but determined according to actual needs. . The other processing device 1006 serves as an interface between the computing device 900 and external data and control, performs basic control including but not limited to data transfer, and completes the starting and stopping of the computing device 900. The other processing device 1006 can also cooperate with the computing device 900 to complete computing tasks.

The integrated circuit device 1000 further includes an off-chip memory 1008, which can be connected to the computing device 900 and other processing devices 1006, respectively. The off-chip memory 1008 is used to store data of the computing device 900 and other processing devices 1006, and is especially suitable for data that cannot be fully stored in the internal storage of the computing device 900 or other processing devices 1006 for the data that needs to be calculated.

According to different application scenarios, the integrated circuit device 1000 can be used as a system on chip (SOC) for mobile phones, robots, drones, video capture and other equipment, thereby effectively reducing the core area of the control part, increasing processing speed and reducing overall power consumption . In this case, the universal interconnect interface 1004 of the integrated circuit device 1000 is connected to certain components of the device. Some components here can be, for example, a camera, a monitor, a mouse, a keyboard, a network card or a wifi interface.

The present disclosure also discloses a chip or integrated circuit chip, which includes the integrated circuit device 1000. The present disclosure also discloses a chip packaging structure, which includes the above-mentioned chip.

Another embodiment of the present disclosure is a board card, which includes the chip packaging structure described above. Referring to FIG. 11, in addition to a plurality of the aforementioned chips 1102, the board 1100 may also include other supporting components. The supporting components include a storage device 1104, an interface device 1106, and a control device 1108.

The storage device 1104 is connected to the chip 1102 in the chip packaging structure through a bus 1114 for storing data. The storage device 1104 may include multiple groups of storage units 1110. Each group of storage units 1110 may be the aforementioned off-chip memory.

The interface device 1106 is electrically connected to the chip 1102 in the chip packaging structure. The interface device 1106 is used to implement data transmission between the chip 1102 and an external device 1112 (for example, a server or a computer). In this embodiment, the interface device 1106 is a standard PCIe interface, and the data to be processed is transferred from the server to the chip 1102 through the standard PCIe interface to realize data transfer. The calculation result of the chip 1102 is also transmitted back to the external device 1112 by the interface device 1106.

The control device 1108 is electrically connected to the chip 1102 to monitor the state of the chip 1102. Specifically, the chip 1102 and the control device 1108 may be electrically connected through an SPI interface. The control device 1108 may include a single-chip microcomputer ("MCU", Micro Controller Unit).

Another embodiment of the present disclosure is an electronic device or device, which includes the board 1100 described above. According to different application scenarios, electronic equipment or devices can include data processing devices, robots, computers, printers, scanners, tablets, smart terminals, mobile phones, driving recorders, navigators, sensors, cameras, servers, cloud servers, and cameras , Cameras, projectors, watches, headphones, mobile storage, wearable devices, vehicles, household appliances, and/or medical equipment. The transportation means include airplanes, ships, and/or vehicles; the household appliances include televisions, air conditioners, microwaves, refrigerators, rice cookers, humidifiers, washing machines, electric lights, gas stoves, and range hoods; the medical equipment includes nuclear magnetic resonance, B-ultrasound and/or electrocardiograph.

Although this embodiment is described with an artificial intelligence chip as an example, those skilled in the art can understand that these methods can also be implemented by a general-purpose processor.

Another embodiment of the present disclosure is a computer-readable storage medium on which is stored a computer program code that uses a universal address for access, and when the computer program code is run by a processor, the method described in each of the foregoing embodiments is executed .

Through the demonstration of the foregoing embodiments, the present invention embodies the address space of the general address in the compilation stage, determines the access to the memory, simplifies the hardware complexity, facilitates programming, and ensures the performance of the program. The artificial intelligence chip performs matrix calculations based on the compiled opcodes to complete the calculation tasks of input data (such as image or voice data). Due to the preprocessing of the general address in the present invention, the calculation process will be more streamlined and efficient.

The foregoing can be better understood according to the following clauses:

Clause A1. A method for deriving an address based on a control flow graph, the control flow graph including a plurality of basic blocks, the plurality of basic blocks including pointers, the pointers carrying addresses. The method includes: traversing the multiple basic blocks to obtain all possible address spaces of the pointer; judging whether all the possible address spaces are deduced to a single address space; if so, setting the pointer to access the address space.

Clause A2. According to the method described in Clause A1, the method further includes: if not, setting the pointer to universal address access.

Clause A3. The method according to Clause A2, further comprising: judging whether the pointer involves a function call; and if it involves a function call, setting the pointer to a general address access.

Clause A4. According to the method described in clause A2 or 3, the plurality of basic blocks access the first memory and the second memory, and the storage space of the first memory is defined by the first address to the second address, and The storage space of the second memory is defined by the third address to the fourth address, and the method further includes: determining whether the universal address falls between the first address and the second address; Between the first address and the second address, set the first variable to be true; if it does not fall between the first address and the second address, set the first variable to be false; judge all Whether the first variable is true; and if the first variable is true, set the pointer to access the first memory.

Clause A5. The method of clause A4, wherein the third address is the second address plus one.

Clause A6. The method according to clause A5, wherein if the first variable is false, the pointer is set to access the second memory.

Clause A7. The method according to clause A5, wherein the step of judging the universal address comprises: judging whether the universal address is smaller than the third address.

Clause A8. The method according to clause A4, wherein the plurality of basic blocks also access a third memory, and the storage space of the third memory is defined by the fifth address to the sixth address, and the method further includes: Determine whether the universal address falls between the fifth address and the sixth address; if the universal address falls between the fifth address and the sixth address, set the second variable to be true; Determine whether the second variable is true; and if the second variable is true, set the pointer to access the third memory.

Clause A9. The method of clause A8, wherein the third address is the second address plus one, and the fifth address is the fourth address plus one.

Clause A10. The method according to clause A9, wherein the step of judging whether the universal address falls between the fifth address and the sixth address comprises: judging whether the universal address is greater than the fourth address .

Clause A11. The method according to clause A8, further comprising: if the universal address does not fall between the fifth address and the sixth address, then setting the second variable to false; If both the first variable and the second variable are false, then the pointer is set to access the second memory.

Clause A12. The method according to clause A8, further comprising: setting a third variable as true; if it is judged that the first variable is true, then setting the third variable as false; if it is judged that the second variable is true , Then set the third variable to be false; determine whether the third variable is true; and if the third variable is true, set the pointer to access the second memory.

Clause A13. The method according to clause A1, wherein the determining step includes: obtaining pointer variables of the pointer in all basic blocks; determining whether the pointer variables all correspond to the address space.

Clause A14. The method according to clause A1, wherein the setting step is performed after the plurality of basic blocks are merged.

Clause A15. The method according to clause A1, wherein when the control flow graph includes an iterative algorithm, the judging step is performed after the address of the pointer does not change.

Clause A16. The method according to clause A1, wherein the control flow graph is jump control or loop control.

Clause A17. A computer-readable storage medium on which is stored computer program code that uses a universal address to access in the system. When the computer program code is run by a processor, it executes any of the items in Clause A1-16. The method described.

Clause A18. A computing device including a processor core that executes the method described in any one of clauses A1-16.

The embodiments of the disclosure are described in detail above, and specific examples are used in this article to illustrate the principles and implementation of the disclosure. The descriptions of the above embodiments are only used to help understand the methods and core ideas of the disclosure; at the same time, for Those skilled in the art, based on the ideas of this disclosure, will have changes in the specific implementation and scope of application. In summary, the content of this specification should not be construed as limiting this disclosure.

Claims

A method for deriving an address based on a control flow graph, the control flow graph including a plurality of basic blocks, the plurality of basic blocks including at least one instruction, and a pointer contained in the instruction carrying an address, the method including:

Traverse the multiple basic blocks to obtain all possible address spaces of the pointer;

Determine whether all the possible address spaces are deduced to a single address space; and

If so, set the pointer to access the address space.
The method according to claim 1, the method further comprising:

If not, set the pointer to general address access.
The method according to claim 2, further comprising:

Determine whether the pointer involves a function call; and

If it involves function calls, set the pointer to general address access.
According to the method of claim 2 or 3, the plurality of basic blocks access a first memory and a second memory, the storage space of the first memory is defined by a first address to a second address, and the second The storage space of the memory is defined by the third address to the fourth address, and the method further includes:

Determine whether the universal address falls between the first address and the second address;

If it falls between the first address and the second address, set the first variable to be true;

If it does not fall between the first address and the second address, set the first variable to false;

Determine whether the first variable is true; and

If the first variable is true, the pointer is set to access the first memory.
The method of claim 4, wherein the third address is the second address plus one.
The method according to claim 5, wherein if the first variable is false, the pointer is set to access the second memory.
The method according to claim 5, wherein said step of determining said universal address comprises:

It is determined whether the general address is smaller than the third address.
The method according to claim 4, wherein the plurality of basic blocks also access a third memory, and the storage space of the third memory is defined by the fifth address to the sixth address, and the method further comprises:

Judging whether the universal address falls between the fifth address and the sixth address;

If the general address falls between the fifth address and the sixth address, then set the second variable to be true;

Determine whether the second variable is true; and

If the second variable is true, the pointer is set to access the third memory.
8. The method of claim 8, wherein the third address is the second address plus one, and the fifth address is the fourth address plus one.
The method according to claim 9, wherein said step of determining whether said universal address falls between said fifth address and said sixth address comprises:

It is determined whether the universal address is greater than the fourth address.
The method according to claim 8, further comprising:

If the general address does not fall between the fifth address and the sixth address, then the second variable is set to false; where, if the first variable and the second variable are both false, Then, the pointer is set to access the second memory.
The method according to claim 8, further comprising:

Set the third variable to be true;

If it is judged that the first variable is true, then the third variable is set to false;

If it is judged that the second variable is true, then the third variable is set to false;

Determine whether the third variable is true; and

If the third variable is true, the pointer is set to access the second memory.
The method according to claim 1, wherein said determining step comprises:

Obtain the pointer variables of the pointer in all basic blocks;

It is determined whether the pointer variables all correspond to the address space.
The method according to claim 1, wherein the setting step is performed after the plurality of basic blocks are merged.
The method according to claim 1, wherein the traversing step, the judging step, and the setting step are repeatedly executed until the address of the pointer does not change.
The method according to claim 1, wherein the control flow graph is jump control or loop control.
A computer-readable storage medium, on which is stored computer program code for accessing in a system using a universal address, and when the computer program code is run by a processor, the computer program code described in any one of claims 1-16 is executed method.
A computing device comprising a processor core, and the processor core executes the method according to any one of claims 1-16.