CN113672237A - Program compiling method and device for preventing memory boundary crossing - Google Patents


Info

Publication number
CN113672237A
Authority
CN
China
Prior art keywords
address
segment
memory
program
space
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111033647.0A
Other languages
Chinese (zh)
Other versions
CN113672237B (en)
Inventor
刘晓建
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd
Priority to CN202111033647.0A
Publication of CN113672237A
Application granted
Publication of CN113672237B
Status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00: Arrangements for software engineering
    • G06F8/40: Transformation of program code
    • G06F8/41: Compilation
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00: Arrangements for software engineering
    • G06F8/30: Creation or generation of source code
    • G06F8/33: Intelligent editors
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00: Arrangements for software engineering
    • G06F8/40: Transformation of program code
    • G06F8/41: Compilation
    • G06F8/44: Encoding
    • G06F8/441: Register allocation; Assignment of physical memory space to logical memory space

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Storage Device Security (AREA)

Abstract

One or more embodiments of the present specification provide a program compiling method and apparatus for preventing memory boundary crossing. The method includes: in the process of compiling a source program into a target program, generating address checking machine code for memory access code in the source program, where the memory allocation space corresponding to the target program is located in one continuous virtual address segment, the segment base address of the virtual address segment comprises a segment number located in the high-order bits and a plurality of zero-value logic bits located in the low-order bits, and the segment length of the virtual address segment is not less than the maximum address space that can be represented by the zero-value logic bits. The address checking machine code is used to: compare the high-order data of a linear address corresponding to a target memory space to be accessed by the memory access code with the segment number contained in the segment base address, where the linear address is the sum of the segment base address and the segment offset of the target memory space relative to the segment base address, and the high-order data and the segment number have the same number of bits; and, in the case where the comparison result is that they are the same, allow the target memory space in the memory to be accessed according to the linear address.

Description

Program compiling method and device for preventing memory boundary crossing
Technical Field
One or more embodiments of the present disclosure relate to the field of computer technologies, and in particular, to a method and an apparatus for program compilation to prevent memory boundary crossing.
Background
To prevent memory boundary crossing caused by a program accessing an address outside its virtual address space, a program generally performs a boundary crossing check on the address to be accessed before accessing the memory, and accesses the memory only after confirming that the address does not cross the boundary.
Disclosure of Invention
In view of this, one or more embodiments of the present disclosure provide a program compiling method and apparatus for preventing a memory boundary crossing.
To achieve the above object, one or more embodiments of the present disclosure provide the following technical solutions:
according to a first aspect of one or more embodiments of the present specification, there is provided a program compiling method for preventing a memory boundary crossing, including:
in the process of compiling a source program into a target program, generating address checking machine code for memory access code in the source program, wherein the memory allocation space corresponding to the target program is located in one continuous virtual address segment, the segment base address of the virtual address segment comprises a segment number located in the high-order bits and a plurality of zero-value logic bits located in the low-order bits, and the segment length of the virtual address segment is not less than the maximum address space that can be represented by the plurality of zero-value logic bits; the address checking machine code is used to:
for a target memory space to be accessed by the memory access code, compare the high-order data of the linear address corresponding to the target memory space with the segment number contained in the segment base address, wherein the linear address is the sum of the segment base address and the segment offset of the target memory space relative to the segment base address, and the high-order data and the segment number have the same number of bits;
and, in the case where the comparison result is that they are the same, allow the target memory space in the memory to be accessed according to the linear address.
According to a second aspect of one or more embodiments of the present specification, there is provided a program compiling apparatus for preventing memory boundary crossing, including:
an address check machine code compiling unit, configured to generate address checking machine code for memory access code in a source program in the process of compiling the source program into a target program, wherein the memory allocation space corresponding to the target program is located in one continuous virtual address segment, the segment base address of the virtual address segment comprises a segment number located in the high-order bits and a plurality of zero-value logic bits located in the low-order bits, and the segment length of the virtual address segment is not less than the maximum address space that can be represented by the zero-value logic bits; the address checking machine code is used to:
for a target memory space to be accessed by the memory access code, compare the high-order data of the linear address corresponding to the target memory space with the segment number contained in the segment base address, wherein the linear address is the sum of the segment base address and the segment offset of the target memory space relative to the segment base address, and the high-order data and the segment number have the same number of bits;
and, in the case where the comparison result is that they are the same, allow the target memory space in the memory to be accessed according to the linear address.
According to a third aspect of one or more embodiments of the present specification, there is provided an electronic apparatus including:
a processor;
a memory for storing processor-executable instructions;
wherein the processor implements the method of any of the first aspects by executing the executable instructions.
According to a fourth aspect of one or more embodiments of the present description, there is provided a computer-readable storage medium having stored thereon computer instructions which, when executed by a processor, implement the steps of the method according to any one of the first aspect.
Drawings
Fig. 1 is an application scenario diagram of a program compiling method for preventing a memory boundary violation according to an exemplary embodiment.
Fig. 2 is a flowchart of a program compiling method for preventing a memory boundary violation according to an exemplary embodiment.
Fig. 3 is a schematic structural diagram of an apparatus according to an exemplary embodiment.
Fig. 4 is a block diagram of a program compiling apparatus for preventing a memory boundary violation according to an exemplary embodiment.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with one or more embodiments of the present specification. Rather, they are merely examples of apparatus and methods consistent with certain aspects of one or more embodiments of the specification, as detailed in the claims which follow.
It should be noted that: in other embodiments, the steps of the corresponding methods are not necessarily performed in the order shown and described herein. In some other embodiments, the method may include more or fewer steps than those described herein. Moreover, a single step described in this specification may be broken down into multiple steps for description in other embodiments; multiple steps described in this specification may be combined into a single step in other embodiments.
Memory boundary crossing is one of the major errors in a software system: the address targeted by a memory access exceeds the legal address range pre-allocated to the program. By access type, memory boundary crossing is divided into read boundary crossing and write boundary crossing. Read boundary crossing means that the program reads data outside its legal address range. If the memory address being read is invalid, the program may crash; if the address is valid, no problem appears at the time of reading, but because the data read is arbitrary, unpredictable results may follow. Write boundary crossing is similar: the program writes data into memory space outside its legal address range. This may crash the program, or, when the memory space belongs to the legal address range of another program, it amounts to modifying that program's information without authorization, which can induce system operation failures or security risks. As an example of memory boundary crossing, suppose a program defines an array a that can hold 16 elements. Accesses to a[16] and a[20] are then out of bounds, as are accesses to a[-10] and a[-16]; only indices in [0, 15] access the legal address range of the array. From the perspective of a human-oriented programming language, a[x] denotes the element of the array with index x; from the perspective of a computer-oriented programming language, a[x] is regarded as the logical address at which that element is located.
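The array example can be restated as a short Python sketch. The helper name `in_bounds` is hypothetical and is used only for illustration. The check is written out explicitly because Python lists accept negative indices as positions counted from the end, so an index such as -10 would not be rejected by the language itself.

```python
def in_bounds(index: int, length: int) -> bool:
    """Return True when index addresses an element of an array with `length` elements."""
    return 0 <= index < length


ARRAY_LENGTH = 16  # the 16-element array a from the example

for index in (0, 15, 16, 20, -10, -16):
    status = "legal" if in_bounds(index, ARRAY_LENGTH) else "out of bounds"
    print(f"a[{index}]: {status}")
```

Only the indices 0 and 15 in this loop are reported as legal, which matches the range [0, 15] given in the text.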
Because the time at which a memory boundary crossing occurs is random, its symptoms are random, and its consequences are often unpredictable and severe, developers have difficulty finding the root cause of a boundary crossing error, which makes such errors hard to locate and repair. In the related art, a preventive approach is usually adopted. For example, before program A executes any memory access instruction, the address to be accessed by the instruction is first checked against the legal address range of program A. If the address is within that range, the memory is accessed according to the address; if it is outside that range, the memory access is stopped, so that a program crash or security risk caused by out-of-bounds access is avoided.
Specifically, the boundary crossing check and the determination of the access address in the related art are described below, taking as an example a program A that runs in a virtual address space pre-allocated by an operating system or a memory allocation program, where the virtual address space comprises only one continuous virtual address segment. Assume that the segment base address of the virtual address segment is Q, stored in the memory space with address m, and that the segment length of the virtual address segment is L, stored in the memory space with address n. The legal address range of program A is then limited to [Q, Q + L]. Any memory access instruction issued by program A must undergo a boundary crossing check. Let the formal address carried by the instruction be X.
When the access type of the memory access instruction is direct access, the access address (also called the effective address) that the instruction finally needs to access is X itself. The legal address range of program A must first be determined, and then whether X falls within that range is judged. As mentioned above, determining the legal address range of program A requires at least the segment base address Q and the segment length L of the virtual address segment. Therefore, in executing the boundary crossing check, the CPU first reads the segment base address Q at address m and the segment length L at address n in the memory, and then judges whether X satisfies Q ≤ X ≤ Q + L. If so, the memory access instruction is considered not to cause a memory boundary crossing error, and the memory is accessed directly according to X.
When the access type of the memory access instruction is base address plus index access, the effective address of the instruction is X + Q. Under this access type, the formal address X is the logical address to be accessed by program A, and its value reflects the offset relative to the program start address as seen from the program. When the start address of program A is aligned with the segment base address of the virtual address segment, the boundary crossing check logic need not compute the effective address X + Q first. It can first judge whether the memory access instruction would cause a memory boundary crossing error by comparing the segment length L (in effect, the offset, relative to the segment base address Q, of the highest memory space that may be accessed) with the logical address X; then, if no error would occur (that is, if X ≤ L), compute the effective address X + Q; and finally access the memory according to X + Q. Comparing the segment length L with the logical address X necessarily requires the CPU to read address n in the memory to obtain L, and computing the effective address X + Q necessarily requires the CPU to read address m in the memory to obtain Q.
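The related-art check described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not an implementation from the source: the class `CountingMemory`, the addresses `ADDR_M` and `ADDR_N`, and both function names are hypothetical, and the values of Q and L are borrowed from the Fig. 1 scenario. The sketch counts memory reads to show that each access type needs two of them.

```python
class CountingMemory:
    """Toy memory that counts how many times it is read."""

    def __init__(self, cells):
        self.cells = dict(cells)
        self.reads = 0

    def read(self, address):
        self.reads += 1
        return self.cells[address]


ADDR_M = 0x10  # hypothetical address m that holds the segment base Q
ADDR_N = 0x14  # hypothetical address n that holds the segment length L


def check_direct(memory, x):
    """Direct access: X is the effective address; require Q <= X <= Q + L."""
    seg_base = memory.read(ADDR_M)
    seg_len = memory.read(ADDR_N)
    if not (seg_base <= x <= seg_base + seg_len):
        raise MemoryError(f"address {x:#x} is out of bounds")
    return x


def check_base_plus_index(memory, x):
    """Base plus index: X is an offset; require X <= L, then form X + Q."""
    seg_len = memory.read(ADDR_N)
    # Assumption: offsets are treated as unsigned, so negative values are rejected.
    if not (0 <= x <= seg_len):
        raise MemoryError(f"offset {x:#x} is out of bounds")
    seg_base = memory.read(ADDR_M)
    return x + seg_base


memory = CountingMemory({ADDR_M: 0x020000, ADDR_N: 0xFFFF})
print(hex(check_direct(memory, 0x021234)), memory.reads)
print(hex(check_base_plus_index(memory, 0x1234)), memory.reads)
```

In both functions the check depends on a magnitude comparison and on two values fetched from memory, which is the cost the specification sets out to remove.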
It follows from the above that, in the related art, regardless of whether the access type of the memory access instruction is direct access or base address plus index access, and regardless of whether the boundary crossing check logic optimizes the execution order of address checking and address generation, one complete boundary crossing check and effective address determination for a memory access instruction necessarily involves at least two memory accesses. When the legal address range of the program comprises a plurality of discontinuous virtual address segments, the memory must be accessed even more times to obtain the segment base address and segment length of each virtual address segment, so as to judge in turn whether the access address is within the legal address range formed by those segments. The boundary crossing check also involves magnitude comparison logic. Both the memory accesses and the magnitude comparisons greatly reduce execution efficiency and affect the performance of memory access instructions. In addition, although the segment length and the segment base address can be pre-stored in two different registers, so that they can be read from the registers without accessing the memory, this improvement does not change the premise that, in the related art, both the segment length and the segment base address must be queried to complete the boundary crossing check and effective address determination, nor does it eliminate the magnitude comparison that the related art necessarily involves.
Therefore, the present specification provides a program compiling method for preventing memory boundary crossing, which is applied to a compiler or an interpreter and which generates address checking machine code when compiling memory access code. The address checking machine code performs a boundary crossing check on the linear address to be accessed by the memory access code and allows the linear address to be accessed only if the check passes. In this process, the segment length and the segment base address need not both be queried, and no magnitude comparison is involved; only a comparison of whether the values involved are equal is needed, which in a specific implementation requires only an exclusive-or operation, whose execution efficiency is far higher than that of the arithmetic operations required by a magnitude comparison. Under the specific memory allocation principle, the scheme of the present specification uses the segment number of the virtual address segment to complete both the boundary crossing check and the effective address determination, which greatly improves the efficiency of preventing memory boundary crossing errors by means of a boundary crossing check.
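The equality-only check can be illustrated with a minimal Python sketch. The function names and the constant `ZERO_BITS` are hypothetical, and the sketch assumes a segment whose length is exactly 2^16 addresses, as in the Fig. 1 scenario. An exclusive-or of the linear address and the segment base address leaves a nonzero value in the high-order bits exactly when the two segment numbers differ; a conventional range comparison is included only to cross-check the result.

```python
ZERO_BITS = 16  # assumed number of low-order zero-value bits in the segment base


def passes_segment_check(linear_address: int, segment_base: int,
                         zero_bits: int = ZERO_BITS) -> bool:
    """Equality-only check: the high-order bits must equal the segment number."""
    return (linear_address ^ segment_base) >> zero_bits == 0


def passes_range_check(linear_address: int, segment_base: int,
                       zero_bits: int = ZERO_BITS) -> bool:
    """Conventional magnitude comparison over the same segment, for cross-checking."""
    return segment_base <= linear_address < segment_base + (1 << zero_bits)


SEGMENT_BASE = 0x020000  # segment number 02H followed by 16 zero bits

for address in (0x01FFFF, 0x020000, 0x021234, 0x02FFFF, 0x030000):
    xor_result = passes_segment_check(address, SEGMENT_BASE)
    assert xor_result == passes_range_check(address, SEGMENT_BASE)
    print(f"{address:06X}: {'allowed' if xor_result else 'rejected'}")
```

Because the low-order bits of the segment base address are all zero, the check needs only the base address itself; no separate segment length is read.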
The program compiling method for preventing memory boundary crossing is applied to a compiler or an interpreter. The memory access instruction referred to in the embodiments of the present specification is computer language that, when executed, generates and accesses a specific memory access address; its form may include an instruction written in a high-level language, an assembly instruction, a pseudo-instruction, machine code, and the like. The memory access code referred to in the embodiments of the present specification is code that has the same function as a memory access instruction but is not in machine code form and is readable by the user; it is source code that must be compiled into machine code and then executed by the processor to finally implement the memory access function, and may be written, for example, in bytecode or in programming languages such as C++, Python, and Java. The machine code referred to in the embodiments of the present specification is processor-oriented machine language, which can be directly executed by the processor and is not readable by the user.
In the embodiments of the present specification, the memory is divided in a linear address space into a plurality of address segments, each with internally consecutive addresses. The address segments do not overlap one another, and the segment base address of each address segment comprises a segment number located in the high-order bits and a plurality of zero-value logic bits located in the low-order bits. Fig. 1 is an application scenario diagram of a program compiling method for preventing memory boundary crossing provided in an exemplary embodiment. Taking Fig. 1 as an example, a memory address in the figure is represented by a 6-digit hexadecimal number (equivalent to a 24-bit binary number), so the memory has a 24-bit address space. Assuming that the memory space corresponding to one physical address is 1 byte (8 bits), the memory size in the figure is 1 B × 2^24 = 16 MB. As shown in the figure, the memory is divided into a plurality of continuous address segments: the address segment corresponding to virtual address space 00:0000H to 00:FFFFH is used for loading kernel-mode programs, including system programs and the operating system; the address segment corresponding to virtual address space 01:0000H to 01:FFFFH is used for loading user-mode application program 1; the address segment corresponding to virtual address space 02:0000H to 02:FFFFH is used for loading user-mode application program 2; the address segment corresponding to virtual address space 03:0000H to 03:FFFFH is used for loading user-mode application program 3; and so on.
It can be seen that, in the memory allocation scenario shown in Fig. 1, the memory is divided equally into a plurality of address segments of the same segment length. The upper 2 hexadecimal digits (8 binary bits) of the first address (i.e., the segment base address) of each address segment represent the segment number, and the lower 4 hexadecimal digits (16 binary bits) are zero-value logic bits, each binary bit of which is 0. The segment length of each address segment equals the maximum address space that can be represented by 4 hexadecimal digits, i.e., 16^4 = 65536. It should be noted that, although the segment lengths of the different address segments are all the same in the application scenario shown in Fig. 1, the segment lengths may be set to other sizes according to the needs of the actual scenario (in which case the number of bits in the segment numbers of different address segments may differ); however, regardless of the segment length, the logic bits of each segment base address other than the segment number must still all be zero-value logic bits.
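The arithmetic of the Fig. 1 layout can be worked through in a few lines of Python. All constant and function names below are hypothetical; the digit counts and the SS:OOOOH address notation are taken from the description above.

```python
BITS_PER_HEX_DIGIT = 4
ADDRESS_HEX_DIGITS = 6         # 6-digit hexadecimal addresses, as in Fig. 1
SEGMENT_NUMBER_HEX_DIGITS = 2  # high-order digits that hold the segment number
ZERO_HEX_DIGITS = ADDRESS_HEX_DIGITS - SEGMENT_NUMBER_HEX_DIGITS  # low-order zero digits

ADDRESS_BITS = ADDRESS_HEX_DIGITS * BITS_PER_HEX_DIGIT  # 24
ZERO_BITS = ZERO_HEX_DIGITS * BITS_PER_HEX_DIGIT        # 16
MEMORY_SIZE_BYTES = 2 ** ADDRESS_BITS                   # one byte per address
SEGMENT_LENGTH = 16 ** ZERO_HEX_DIGITS                  # 65536 addresses per segment
SEGMENT_COUNT = MEMORY_SIZE_BYTES // SEGMENT_LENGTH     # 256 equal segments


def segment_base(segment_number: int) -> int:
    """Segment base address: the segment number followed by zero-value low bits."""
    return segment_number << ZERO_BITS


def format_address(address: int) -> str:
    """Render an address in the SS:OOOOH notation used in the text."""
    return f"{address >> ZERO_BITS:02X}:{address & (SEGMENT_LENGTH - 1):04X}H"


print(f"memory size: {MEMORY_SIZE_BYTES // (1024 * 1024)} MB in {SEGMENT_COUNT} segments")
for number in range(4):
    first = segment_base(number)
    last = first + SEGMENT_LENGTH - 1
    print(f"segment {number}: {format_address(first)} to {format_address(last)}")
```

The loop reproduces the four address ranges listed for the kernel-mode programs and application programs 1 to 3.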
In the application scenario shown in Fig. 1, the virtual address space occupied in the memory by a system program or an application program does not exceed the segment length of the address segment in which it is located. Taking application program 2 as an example, its program start address is 02:0000H, which is aligned with the segment base address of the virtual address segment corresponding to virtual address space 02:0000H to 02:FFFFH, and its allocated virtual address space does not exceed the segment length of that virtual address segment. Within this address segment, application program 2 maintains a code segment containing processor instructions (machine codes) and a data segment containing application data. When application program 2 is loaded into the address segment and executed, the CPU reads the machine codes in the code segment in order from low addresses to high addresses and performs the corresponding processing operations, such as arithmetic operations, instruction jumps, and data accesses. Similarly, when a system program or another application program is loaded into its address segment and needs to be executed, the CPU reads and processes the machine code in the corresponding code segment.
The following describes the program compiling method for preventing the memory boundary crossing in this specification in detail with reference to fig. 1. Referring to fig. 2, fig. 2 is a flowchart of a program compiling method for preventing a memory boundary violation according to an exemplary embodiment. As shown in fig. 2, the method is applied to a compiler or an interpreter, and may include the following steps:
Step 202, in the process of compiling a source program into a target program, generating address checking machine code for memory access code in the source program, where the memory allocation space corresponding to the target program is located in one continuous virtual address segment, the segment base address of the virtual address segment comprises a segment number located in the high-order bits and a plurality of zero-value logic bits located in the low-order bits, and the segment length of the virtual address segment is not less than the maximum address space that can be represented by the plurality of zero-value logic bits.
The address checking machine code is used to perform the following steps:
Step 204, for a target memory space to be accessed by the memory access code, comparing the high-order data of the linear address corresponding to the target memory space with the segment number contained in the segment base address, where the linear address is the sum of the segment base address and the segment offset of the target memory space relative to the segment base address, and the high-order data and the segment number have the same number of bits.
Step 206, in the case where the comparison result is that they are the same, allowing the target memory space in the memory to be accessed according to the linear address.
The step 202 occurs in the compiling stage of the address checking machine code, and the steps 204 and 206 occur in the executing stage of the address checking machine code.
The source program referred to in the embodiments of the present specification is also called source code. It is a text file written according to a programming language specification and consists of a series of human-readable computer language instructions. In modern programming languages, source code may take the form of a book or a tape, but the most common format is a text file, typically intended to be compiled into a computer program. The ultimate purpose of computer source code is to have the human-readable text translated into binary machine code that a computer can execute; this process is called compilation and is performed by a compiler or an interpreter. The target program referred to in the embodiments of the present specification is also called an object program. It is the set of machine codes that is generated after the source program is compiled and that can be run directly by a computer; that is, the target program is what results when the source program is processed (assembled, compiled, or interpreted) by a language processing program (an assembler, compiler, or interpreter). The target program consists of machine codes that the computer can recognize and run directly, its file commonly uses .obj as the extension, and it can be loaded into the memory and run after being linked with library functions.
A source program generally includes a large amount of memory access code, which is also converted into corresponding machine code during compilation and thus becomes part of the target program. The traditional boundary crossing check is usually performed by the operating system or a dedicated boundary crossing checking program. That is, when the memory access code is compiled into machine code, the compiler generates, according to the access type and the formal address carried by the memory access code, only the machine code for determining the effective address from the formal address and the access type and for accessing the memory according to the effective address; it generates no additional machine code for the boundary crossing check. Instead, the check is carried out by calling the operating system or the dedicated boundary crossing checking program when the processor reads and executes the machine code that performs the memory access. Although the underlying implementation of the boundary crossing checking program is also machine code, each check involves at least one procedure call, which necessarily reduces the execution efficiency of a target program that has the boundary crossing check mechanism enabled.
In the embodiments of the present specification, in the process of compiling a source program into a target program, in addition to the machine code for "determining an effective address according to a formal address and a memory access type, and accessing the memory according to the effective address", address checking machine code is generated for the memory access code in the source program, and this address checking machine code performs the boundary crossing check. Therefore, when executing the target program, the processor only needs to execute the machine codes in sequence and does not need to call the operating system or a dedicated boundary crossing checking program each time a memory access machine code is executed. The procedure call is thereby eliminated, which effectively improves the execution efficiency of a target program that has the boundary crossing check mechanism.
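The division of labor between the compile stage and the execution stage can be illustrated with a toy Python sketch. Everything here is hypothetical and greatly simplified: the tuple-based operation format, the operation names `load`, `store`, and `check`, and the constants are inventions for illustration and do not represent real machine code or the compiler of the specification. The point shown is only that the check is emitted inline ahead of each memory access, so that execution proceeds sequentially without calling a separate checking program.

```python
SEGMENT_BASE = 0x020000  # assumed segment base address of the target program
ZERO_BITS = 16           # assumed number of zero-value low-order bits
MEMORY_OPS = ("load", "store")


def compile_program(source_ops):
    """Toy compile stage: emit a check operation ahead of every memory access."""
    target_ops = []
    for op, operand in source_ops:
        if op in MEMORY_OPS:
            target_ops.append(("check", operand))
        target_ops.append((op, operand))
    return target_ops


def execute(target_ops):
    """Toy execution stage: run the operations in order, with the check inline."""
    accessed = []
    for op, operand in target_ops:
        if op == "check":
            # Equality test on the high-order bits, done with exclusive-or.
            if (operand ^ SEGMENT_BASE) >> ZERO_BITS != 0:
                raise MemoryError(f"out-of-bounds access to {operand:#08x}")
        elif op in MEMORY_OPS:
            accessed.append(operand)
    return accessed


source = [("load", 0x021234), ("add", 1), ("store", 0x02FFFF)]
target = compile_program(source)
print(target)
print([hex(address) for address in execute(target)])
```

Running the same pipeline on an operation such as `("store", 0x030000)` raises `MemoryError` at the inline check, before the access itself is reached.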
In addition to making the architecture that performs the boundary crossing check more lightweight (by eliminating the procedure call), the embodiments of the present specification also optimize the boundary crossing check logic itself. As described above, when executed by a processor, the address checking machine code generated in the embodiments of the present specification compares, for a target memory space to be accessed by the memory access code, the high-order data of the linear address corresponding to the target memory space with the segment number contained in the segment base address, where the linear address is the sum of the segment base address and the segment offset of the target memory space relative to the segment base address, and the high-order data and the segment number have the same number of bits; if the comparison result is that they are the same, access to the target memory space in the memory according to the linear address is allowed.
To support the improved out-of-bounds checking logic, the memory allocation policy of the target program must first satisfy certain requirements. Specifically, the memory allocation space corresponding to the target program must lie within a contiguous virtual address segment; the segment base address of the virtual address segment consists of a segment number in the high-order bits and a plurality of zero-valued logic bits in the low-order bits; and the segment length of the virtual address segment is not less than the maximum address space that the zero-valued logic bits can represent. When the target program has been compiled and its execution is triggered, the operating system loads the target program into the virtual address segment in the memory, and the processor executes the machine codes in the target program. For example, for a blockchain node in a blockchain system, the source program or the target program may be a smart contract deployed in the blockchain system. When a smart contract based on the source program is deployed on the blockchain node, the source program may be a smart contract written in bytecode that needs to be executed in a virtual machine environment. When a smart contract based on the target program is deployed on the blockchain node, the target program is an AOT (Ahead of Time) contract composed of machine codes obtained through compilation, so that the smart contract can be executed directly by the processor after being loaded into the memory.
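The two requirements that the allocation policy places on a virtual address segment can be illustrated with a minimal Python sketch. The 16-bit offset width follows the FIG. 1 example (4 low-order hexadecimal digits); the function name `satisfies_allocation_policy` is hypothetical and is introduced only for illustration.

```python
OFFSET_BITS = 16  # assumption: 4 low-order hexadecimal digits are zero, as in FIG. 1


def satisfies_allocation_policy(segment_base: int, segment_length: int) -> bool:
    """Check the two properties the allocation policy requires of a segment."""
    max_space = 1 << OFFSET_BITS  # address space the zero-valued bits can represent
    low_bits_are_zero = (segment_base & (max_space - 1)) == 0
    long_enough = segment_length >= max_space
    return low_bits_are_zero and long_enough


# Segment base 02:0000H with a 65536-byte segment meets both requirements.
print(satisfies_allocation_policy(0x020000, 0x10000))
```

A base such as 02:0100H, whose low-order bits are not all zero, or a segment shorter than 65536 would be rejected by this sketch.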
When the memory access type of the memory access code is direct memory access, the formal address X carried by the memory access code can be regarded as the linear address (that is, the effective address) that the memory access code needs to access, where the linear address is the sum of the segment base address and the segment offset of the target memory space relative to the segment base address. Because the linear address X is contained directly in the memory access code, it does not need to be obtained by arithmetic operation during execution, and the generated address checking machine code performs the out-of-bounds check on the linear address X directly. Specifically, as shown in FIG. 1, assume that the value of the address X is 02:1234H and that the target program is application program 2 in FIG. 1, whose memory allocation space lies in the virtual address segment with the address range 02:0000H to 02:FFFFH. The segment base address Q of the virtual address segment is 02:0000H, and this value is stored in the second preset storage space in the data segment of the target program; the address of the second preset storage space is m, a fixed value determined at compile time. The upper 2 hexadecimal digits of the segment base address Q are the segment number and the lower 4 hexadecimal digits are zero-valued logic bits, and the numbers of digits of the segment number and of the zero-valued logic bits are known to the processor in advance. Therefore, when executing the address checking machine code, the CPU first reads the segment base address Q from address m in the memory, and then compares the segment number 02H in the upper 2 digits of the segment base address Q with the high-order data 02H taken from the upper 2 digits of the linear address 02:1234H. If the two are the same, the memory access based on the linear address X is considered not to cause a memory out-of-bounds error, direct access to the memory based on the linear address X is allowed, and the subsequent machine code for accessing the target memory space corresponding to the linear address X is executed.
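The direct-access check in the FIG. 1 example can be sketched as follows. This is a minimal Python model, not the generated machine code: the memory is modeled as a dictionary, and the value chosen for the address m (`M = 0x0100`) and the function name `check_direct_access` are assumptions made for illustration.

```python
OFFSET_BITS = 16  # assumption: lower 4 hexadecimal digits hold the segment offset
M = 0x0100        # hypothetical address m of the second preset storage space


def check_direct_access(memory: dict, linear_address: int) -> bool:
    """Return True when the high-order data of the address equals the segment number."""
    segment_base = memory[M]                        # single read of Q from address m
    segment_number = segment_base >> OFFSET_BITS    # 0x02 when Q is 0x020000
    high_order_data = linear_address >> OFFSET_BITS
    return high_order_data == segment_number


memory = {M: 0x020000}  # Q = 02:0000H is pre-stored at address m
print(check_direct_access(memory, 0x021234))
```

An address whose upper 2 hexadecimal digits differ from 02H, such as 03:1234H, fails the comparison in this sketch.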
When the memory access type of the memory access code is base-plus-index access, the formal address X carried by the memory access code is not the effective address that is finally accessed, but a logical address X as seen by the target program; as described above, the real effective address is X + Q. Therefore, in the compiling stage, address calculation machine code is generated in addition to the address checking machine code, in order to calculate the effective address (linear address). As shown in FIG. 1, assume that the logical address X is 1234H. The CPU first executes the address calculation machine code generated by compilation: it reads the segment base address Q pre-stored at address m in the memory and then obtains the effective address by arithmetic operation as X + Q = 1234H + 02:0000H = 02:1234H. The logical address X is written as an immediate in the operand of the address calculation machine code, so no memory access is needed to read it, and the resulting effective address is the linear address, in the linear address space, of the target memory space that the memory access code actually needs to access. The CPU then continues with the address checking machine code. During its execution the segment base address Q does not need to be read from address m again, because Q was already read when the address calculation machine code was executed and is therefore still held in an internal CPU register; the CPU can operate directly on the segment base address Q in that register. It extracts the segment number 02H from the upper 2 hexadecimal digits of the segment base address Q and the high-order data 02H of the calculated linear address and compares whether they are the same. For example, the comparison can be implemented by an XOR operation. If the XOR result is 00H, the two are considered the same, direct access to the memory according to the linear address X + Q is allowed, and the subsequent machine code for accessing the target memory space corresponding to the linear address X + Q is executed. If the XOR result is any value other than 00H, the two are considered different, and an out-of-bounds alarm mechanism is triggered, which stops the program and returns error information indicating the position of the instruction at which the memory out-of-bounds error occurred.
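The calculate-then-check order with an XOR comparison can be sketched in Python as below. The function name `calculate_then_check` and the 16-bit offset width are assumptions taken from the FIG. 1 example; the sketch returns a flag instead of triggering an alarm.

```python
OFFSET_BITS = 16  # assumption: lower 4 hexadecimal digits hold the segment offset


def calculate_then_check(segment_base: int, logical_address: int):
    """Compute X + Q, then compare high-order data with the segment number via XOR."""
    linear_address = segment_base + logical_address  # address calculation step
    difference = (linear_address >> OFFSET_BITS) ^ (segment_base >> OFFSET_BITS)
    return linear_address, difference == 0           # a zero XOR result means "same"


linear, allowed = calculate_then_check(0x020000, 0x1234)
print(hex(linear), allowed)
```

With X = 1:0000H the addition carries into the segment number, the XOR result is nonzero, and the sketch reports the access as not allowed.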
When the memory access type of the memory access code is base-plus-index access, besides the above embodiment of first calculating the linear address and then performing the out-of-bounds check on its high-order data, an embodiment of first performing the out-of-bounds check and then calculating the linear address may also be adopted; in this case, however, it must be ensured that the first address of the target program is aligned with the segment base address of the virtual address segment. The out-of-bounds checking logic of the embodiments of the present specification determines whether the high-order data of the linear address is the same as the segment number in the segment base address. This is equivalent to determining whether, when the index is added to the base, the logical address serving as the index causes a carry into the digits of the segment number in the segment base address serving as the base. When all logic bits of the segment base address other than the segment number are zero-valued logic bits, this is in turn equivalent to determining whether the number of digits of the logical address does not exceed the number of zero-valued logic bits in the segment base address. Therefore, even before the linear address has been calculated for the memory access code, the comparison of the high-order data of the linear address with the segment number contained in the segment base address can be completed based on the properties of the logical address alone.
Specifically, the out-of-bounds checking logic does not need to first calculate the effective address X + Q and then judge whether the calculated linear address causes a memory out-of-bounds error; instead, it can make the judgment by checking the number of digits of the logical address X. This is because, for base-plus-index access in which the base is fixed and equal to the segment base address Q of the virtual address segment corresponding to the memory allocation space of the target program, whether an out-of-bounds access occurs depends entirely on the index, that is, the logical address X. Still taking FIG. 1 as an example, the segment base address Q of the virtual address segment consists of the segment number 02H in the upper 2 hexadecimal digits and the zero-valued logic bits 0000H in the lower 4 hexadecimal digits, and the segment length is 65536, the maximum address space that 4 hexadecimal digits can represent. When the first address of the target program is aligned with the segment base address of the virtual address segment, the logical address X reflects the segment offset, relative to the segment base address Q, of the target memory space that finally needs to be accessed. Obviously, to ensure that no memory out-of-bounds access occurs, the value of the logical address X must not exceed FFFFH. This means that as long as the logical address X has at most 4 hexadecimal digits, the finally determined effective address X + Q is guaranteed to lie within the virtual address space 02:0000H to 02:FFFFH corresponding to the virtual address segment. In this example the logical address X cannot exceed 4 hexadecimal digits, but in general the number of digits that the logical address cannot exceed is likewise determined by the segment base address. For example, the number of bus bits of the memory (24 binary bits in this example) is preset in the CPU. When the CPU executes the address checking machine code, it first accesses the second preset storage space in the data segment of the target program to read the segment base address Q; once it further determines that the segment number has p binary bits, it can determine that the logical address X cannot exceed 24 - p binary bits. Alternatively, the number of binary zero-valued logic bits in the segment base address Q can be used directly as the number of binary bits that the logical address cannot exceed. After the check on the number of bits of the logical address X has confirmed that no memory out-of-bounds error will occur, the address calculation machine code generated by compilation is executed: the segment base address Q held in the register is read (Q was read from the memory into the internal register when the address checking machine code was executed), the linear address X + Q = 02:1234H is calculated, and access to the memory according to the linear address X + Q is allowed, so that the subsequent machine code for accessing the target memory space corresponding to the linear address X + Q is executed.
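The check-then-calculate order can be sketched as follows, using the bit count of the logical address rather than the calculated linear address. The 24-bit address width comes from the example above; the function name `check_then_calculate` and the convention of returning `None` for a rejected access are assumptions made for illustration.

```python
ADDRESS_BITS = 24  # assumption: width of the memory address bus in the FIG. 1 example


def check_then_calculate(segment_base: int, logical_address: int, segment_number_bits: int):
    """Check the bit count of X first, then compute X + Q only if the check passes."""
    allowed_bits = ADDRESS_BITS - segment_number_bits  # 24 - p
    if logical_address < 0 or logical_address.bit_length() > allowed_bits:
        return None  # adding X would carry into the segment number
    return segment_base + logical_address


result = check_then_calculate(0x020000, 0x1234, segment_number_bits=8)
print(hex(result))
```

The sketch relies on the low-order bits of the segment base address being zero; with a nonzero low part the bit-count test alone would no longer rule out a carry.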
Whether the linear address is out of bounds can be judged by comparing the segment number in the high-order bits of the segment base address with the high-order data in the high-order bits of the linear address. This is because, in the embodiment of the present specification, the segment base address of the virtual address segment consists of a segment number in the high-order bits and a plurality of zero-valued logic bits in the low-order bits, the segment length of the virtual address segment is not less than the maximum address space that the zero-valued logic bits can represent, and the memory allocation space of the target program lies within the virtual address segment. Taking FIG. 1 as an example, the segment base address of the virtual address segment in which the target program is located is 02:0000H, which consists of the segment number 02H in the upper 2 hexadecimal digits and the zero-valued logic bits 0000H in the lower 4 hexadecimal digits, and the segment length of the virtual address segment is not less than 65536, the maximum address space that 4 hexadecimal digits can represent. This means that the whole virtual address space 02:0000H to 02:FFFFH is contained in the virtual address segment. For the linear address of the target memory space that is finally accessed to be free of memory out-of-bounds errors, the linear address must fall within the virtual address segment allocated to the target program. Because the segment length is not less than the maximum address space that the zero-valued logic bits can represent, the virtual address space 02:0000H to 02:FFFFH necessarily lies within the virtual address segment; in other words, 02:0000H to 02:FFFFH necessarily belongs to the legal address range of the target program, so it suffices to ensure that the linear address lies in the range 02:0000H to 02:FFFFH. Obviously, whatever the value of the segment offset in the lower 4 digits, as long as the high-order data in the upper 2 digits is 02H, the linear address lies within the virtual address space 02:0000H to 02:FFFFH, that is, the linear address does not cause a memory out-of-bounds error; and 02H is exactly the segment number in the upper 2 digits of the segment base address of the virtual address segment. Therefore, under the memory allocation policy of the embodiment of the present specification, the memory allocation space corresponding to the target program is placed in a contiguous virtual address segment that is guaranteed to have the following properties: the segment base address consists of a segment number in the high-order bits and a plurality of zero-valued logic bits in the low-order bits, and the segment length is not less than the maximum address space that the zero-valued logic bits can represent. As a result, the segment base address simultaneously carries the information of both the segment base address and the segment length. With the out-of-bounds checking logic of the embodiment, when the processor executes the address checking machine code, the out-of-bounds check and the determination of the effective address can both be completed using the segment number of the virtual address segment, which reduces the number of memory accesses or the number of registers occupied. In addition, a magnitude comparison of numerical values is replaced by a comparison of whether two values are equal.
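The claim that one equality comparison can stand in for a conventional range comparison can be checked exhaustively on the FIG. 1 values. In the Python sketch below, the function names and the scan limit of 04:0000H are assumptions chosen for illustration.

```python
OFFSET_BITS = 16                    # lower 4 hexadecimal digits are zero in the base
SEGMENT_BASE = 0x020000             # 02:0000H, as in FIG. 1
SEGMENT_LENGTH = 1 << OFFSET_BITS   # 65536


def in_range(linear_address: int) -> bool:
    """Conventional check: two magnitude comparisons against base and base + length."""
    return SEGMENT_BASE <= linear_address < SEGMENT_BASE + SEGMENT_LENGTH


def same_segment(linear_address: int) -> bool:
    """Check from the text: one equality comparison on the high-order data."""
    return (linear_address >> OFFSET_BITS) == (SEGMENT_BASE >> OFFSET_BITS)


def checks_agree(limit: int = 0x040000) -> bool:
    """Compare both checks on every address below the (hypothetical) scan limit."""
    return all(in_range(a) == same_segment(a) for a in range(limit))


print(checks_agree())
```

The two checks coincide here only because the base has zero low-order bits and the segment length equals the space those bits can represent.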
In an embodiment, the segment length of the virtual address segment may be exactly the maximum address space that the zero-valued logic bits can represent. As mentioned above, to avoid memory out-of-bounds access it is necessary to ensure that the target memory space corresponding to the linear address lies within the virtual address segment, which in the embodiment of the present specification is achieved by ensuring that the segment length of the virtual address segment is not less than the maximum address space that the zero-valued logic bits can represent. However, when the segment length is greater than that maximum address space, the target program may be allocated into the remaining memory space that lies outside the maximum address space representable by the zero-valued logic bits but still belongs to the virtual address segment, which means that the target program may legitimately request that remaining memory space during normal execution. The out-of-bounds checking logic in the address checking machine code, however, only recognizes address accesses inside the maximum address space representable by the zero-valued logic bits as being within bounds. In other words, once the scheme for preventing memory out-of-bounds access of the embodiments of the present specification is applied, a normal access to the remaining memory space is treated as an out-of-bounds access, which affects the normal operation of the target program.
Specifically, the out-of-bounds checking logic in the embodiment of the present specification recognizes only linear addresses whose high-order data equals the segment number, and considers such linear addresses free of memory out-of-bounds errors. The size of the legal linear address space that the target program can access is therefore determined by the number of bits of the low-order data, that is, the bits of the linear address other than the high-order data. The number of bits of the low-order data is the same as the number of zero-valued logic bits in the segment base address of the virtual address segment (because the high-order data has the same number of bits as the segment number, and the total number of bits used to represent a linear address in the memory is predetermined). The size of the legal linear address space is thus numerically equal to the size of the maximum address space that the zero-valued logic bits can represent, namely 2^x, where x is the number of binary zero-valued logic bits. For a virtual address segment whose segment length is greater than this maximum address space, part of the memory space of the virtual address segment cannot be accessed once the scheme of the embodiment is adopted. For example, suppose the linear address space corresponding to the virtual address segment is 01:00H to 02:FFH, so that the segment base address of the virtual address segment is 01:00H, the segment number is 01H, and the segment length is 512, which exceeds the maximum address space 2^8 = 256 that 8 binary zero-valued logic bits can represent. According to the out-of-bounds checking logic of the embodiment of the present specification, the only legal linear address space that is recognized is 01:00H to 01:FFH, and 02:00H to 02:FFH is not recognized, so the target program allocated in the virtual address segment cannot access the full address space of the virtual address segment. Therefore, the segment length of the virtual address segment can be set to the maximum address space that the zero-valued logic bits can represent, so that the target program can fully access the memory allocation space allocated to it. This avoids the problem that the target program cannot fully access its own code segment or data segment, and eventually runs incorrectly, because of misjudgments by the out-of-bounds check during execution, and it also improves memory utilization. In addition, on the basis of the above embodiment, as shown in FIG. 1, the memory may be divided into a plurality of address segments that are contiguous and closely packed in the linear address space, so as to further improve memory utilization.
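The oversized-segment example (01:00H to 02:FFH with 8 zero-valued logic bits) can be reproduced with a short Python sketch that counts which addresses of the segment pass the check. The names `passes_check`, `reachable`, and `rejected` are hypothetical.

```python
OFFSET_BITS = 8        # 8 binary zero-valued logic bits, as in the example
SEGMENT_BASE = 0x0100  # 01:00H
SEGMENT_LENGTH = 512   # the segment spans 01:00H to 02:FFH


def passes_check(linear_address: int) -> bool:
    """Equality comparison between high-order data and the segment number 01H."""
    return (linear_address >> OFFSET_BITS) == (SEGMENT_BASE >> OFFSET_BITS)


segment = range(SEGMENT_BASE, SEGMENT_BASE + SEGMENT_LENGTH)
reachable = [a for a in segment if passes_check(a)]      # addresses the check accepts
rejected = [a for a in segment if not passes_check(a)]   # allocated but unreachable

print(len(reachable), len(rejected))
```

Half of the 512 allocated addresses are rejected in this sketch, which is the inaccessible part of the segment that motivates setting the segment length to exactly 2^x.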
In the embodiment of the present specification, address checking machine code is generated for the memory access code in the source program in the process of compiling the source program into the target program. Because the embodiment adopts a special memory allocation policy, in which the memory allocation space corresponding to the target program lies in a contiguous virtual address segment, the segment base address of the virtual address segment consists of a segment number in the high-order bits and a plurality of zero-valued logic bits in the low-order bits, and the segment length of the virtual address segment is not less than the maximum address space that the zero-valued logic bits can represent, the address checking machine code can complete the out-of-bounds check using a single factor, the segment base address of the virtual address segment. This means that, for the out-of-bounds check and the linear address determination together, the number of memory accesses can be reduced to one, or only one register needs to be occupied to hold the segment number or the segment base address of the virtual address segment, which greatly improves the efficiency of preventing memory out-of-bounds errors in memory access code by means of out-of-bounds checking.
Optionally, the method further includes: generating address calculation machine code for the memory access code, where the address calculation machine code is used to write the linear address, obtained by adding the segment offset carried in the memory access code to the segment base address, into a first preset storage space, so that the address checking machine code obtains the linear address corresponding to the target memory space from the first preset storage space during execution.
As described above, in the embodiment of the present specification, when the memory access type of the memory access code is base-plus-index access, the compiler may further generate address calculation machine code for the memory access code. Specifically, based on the logical address (segment offset) carried in the memory access code and the second preset storage space in which the segment base address is pre-stored, the compiler generates address calculation machine code that invokes the ALU (arithmetic logic unit); the destination of the address calculation machine code is the first preset storage space, and its operands are the logical address in the memory access code, as an immediate, and the second preset storage space, as an address. When the address calculation machine code is executed, the linear address obtained by adding the segment offset carried in the memory access code to the segment base address is therefore written into the first preset storage space, so that the address checking machine code obtains the linear address corresponding to the target memory space from the first preset storage space during execution. In the embodiment of the present specification, the generated address calculation machine code is placed before the address checking machine code in execution order (the linear address is calculated first and then checked). In the out-of-bounds checking logic implemented by judging the number of bits of the logical address, the address calculation machine code may instead be placed after the address checking machine code (the check is performed first and the linear address is calculated afterwards).
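The ordering of address calculation and address checking can be modeled with a toy code generator and interpreter. Everything in the Python sketch below is an assumption made for illustration: the pseudo-instruction names (`LOAD_BASE`, `ADD_IMM`, `CHECK_SEGMENT`), the register names, and the use of a register called `linear` as the first preset storage space do not come from the specification.

```python
OFFSET_BITS = 16  # assumption: lower 4 hexadecimal digits hold the segment offset


def emit_access(logical_address: int, m: int) -> list:
    """Emit pseudo-instructions: address calculation first, address check second."""
    return [
        ("LOAD_BASE", m),              # read Q from the second preset storage space
        ("ADD_IMM", logical_address),  # write X + Q to the first preset storage space
        ("CHECK_SEGMENT",),            # compare high-order data with the segment number
    ]


def run(program: list, memory: dict) -> dict:
    """Interpret the pseudo-instructions against a dictionary-based memory model."""
    regs = {"base": 0, "linear": 0, "ok": False}
    for op, *args in program:
        if op == "LOAD_BASE":
            regs["base"] = memory[args[0]]
        elif op == "ADD_IMM":
            regs["linear"] = regs["base"] + args[0]  # X is an immediate operand
        elif op == "CHECK_SEGMENT":
            regs["ok"] = (regs["linear"] >> OFFSET_BITS) == (regs["base"] >> OFFSET_BITS)
    return regs


state = run(emit_access(0x1234, m=0x0100), {0x0100: 0x020000})
print(hex(state["linear"]), state["ok"])
```

Only one instruction in the emitted sequence reads the memory; the check reuses the base already held in a register, which mirrors the single memory access described above.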
Optionally, the method further includes: generating out-of-bounds alarm machine code for the memory access code, where the out-of-bounds alarm machine code is used to trigger an out-of-bounds alarm mechanism when the high-order data of the linear address differs from the segment number contained in the segment base address.
As mentioned above, the execution logic of the address checking machine code only specifies that access to the memory according to the linear address is allowed when the high-order data of the linear address is the same as the segment number contained in the segment base address; it does not define what happens when they differ. Therefore, in the embodiment of the present specification, out-of-bounds alarm machine code is additionally generated, so that the out-of-bounds alarm mechanism is triggered when the linear address fails the out-of-bounds check after the address checking machine code is executed. For example, the out-of-bounds alarm mechanism includes at least stopping the target program and/or outputting error information. The error information indicates that a memory out-of-bounds error occurred while the program was running, together with the position of the memory access instruction corresponding to the machine code at which the error occurred, so that developers can promptly learn the cause of the error and locate it, which facilitates subsequent debugging and resolution of the memory out-of-bounds error.
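The alarm path can be sketched in Python by raising an exception that carries the instruction position. The exception class `MemoryOutOfBoundsError`, the function `check_or_alarm`, and the integer `instruction_position` are hypothetical names; how the real alarm mechanism stops the program and reports the position is not specified here.

```python
OFFSET_BITS = 16  # assumption: lower 4 hexadecimal digits hold the segment offset


class MemoryOutOfBoundsError(Exception):
    """Raised when the high-order data differs from the segment number."""

    def __init__(self, instruction_position: int, linear_address: int):
        super().__init__(
            f"memory out-of-bounds at instruction {instruction_position}: "
            f"address {linear_address:#08x}"
        )
        self.instruction_position = instruction_position
        self.linear_address = linear_address


def check_or_alarm(segment_base: int, linear_address: int, instruction_position: int) -> int:
    """Return the address if it passes the check; otherwise stop with error information."""
    if (linear_address >> OFFSET_BITS) != (segment_base >> OFFSET_BITS):
        raise MemoryOutOfBoundsError(instruction_position, linear_address)
    return linear_address


print(hex(check_or_alarm(0x020000, 0x021234, instruction_position=7)))
```

Raising an exception models "stop the target program and output error information" in one step; a caller that catches it can read `instruction_position` to locate the faulting access.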
Optionally, the method further includes: generating address access machine code for the memory access code, where the address access machine code is used to access the target memory space in the memory according to the linear address when the execution result of the address checking machine code allows such access. When the memory uses paged storage management, the processor accesses the target memory space in the memory according to the physical address obtained by translating the linear address; when the memory does not use paged storage management, the processor accesses the target memory space in the memory using the linear address as the physical address. Under paged storage management the linear address is not used directly as the physical address of the target memory space; it consists of a page number and a page offset, and a page table must be looked up to determine the final physical address. The processor therefore first translates the linear address into a physical address and then accesses the target memory space based on the translated physical address. If the memory does not use paged storage management, the linear address is equivalent to the physical address of the target memory space, so the processor can access the target memory space directly using the linear address as the physical address.
With paged storage management, each address segment in the memory can be divided into a large number of page blocks, and a page table is then built from the first address of each page block and its permission control information. This makes it convenient to manage read and write permissions of the memory (for example, preventing a user-mode program from modifying the data of its code segment) and allows virtual memory techniques to be applied to swap pages between internal and external storage at page granularity.
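The two cases of address access, with and without paged storage management, can be sketched in Python as below. The 12-bit page offset, the dictionary that maps page numbers to frame numbers, and the function name `to_physical` are assumptions made for illustration; permission control information is omitted.

```python
PAGE_BITS = 12  # assumption: 4 KiB pages; the specification does not fix a page size


def to_physical(linear_address: int, page_table=None) -> int:
    """Translate a linear address that has already passed the out-of-bounds check."""
    if page_table is None:  # no paged storage management: linear equals physical
        return linear_address
    page_number = linear_address >> PAGE_BITS
    page_offset = linear_address & ((1 << PAGE_BITS) - 1)
    frame_number = page_table[page_number]  # a KeyError models an unmapped page
    return (frame_number << PAGE_BITS) | page_offset


# Hypothetical mapping: page 0x21 of the linear space is backed by frame 0x7F.
print(hex(to_physical(0x021234, {0x21: 0x7F})))
```

The segment check and the page translation are independent in this sketch: the check runs on the linear address, and translation is applied only afterwards.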
Optionally, compiling the source program into the target program includes: compiling the source program into the target program in the process of interpreting and executing the source program; or compiling the source program into the target program by AOT compilation.
In the embodiment of the present specification, compiling the source program into the target program may take place during interpreted execution. Interpreted execution here means executing while compiling: for example, after a compiler or an interpreter finishes compiling any piece of code in the source program and generates the corresponding machine code, that machine code is loaded into the virtual address segment so that the processor executes it. Alternatively, compiling the source program into the target program may take place during compiled execution. Compiled execution here means executing after compilation has finished: for example, when the target program has been compiled by AOT and its execution is triggered, the operating system loads the target program into the virtual address segment in the memory, and the processor executes the machine codes in the target program. The program compiling method for preventing memory out-of-bounds access can be applied to the compilation process of interpreted execution as well as to that of compiled execution, and the present specification places no limitation on this.
Optionally, the segment number or the segment base address is pre-stored in a second preset storage space, so that the address checking machine code obtains the segment number included in the segment base address from the second preset storage space during execution.
As described above, the address checking machine code needs to obtain, during execution, the segment number of the virtual address segment allocated to the target program. Therefore, before the target program is loaded into the virtual address segment for execution, the segment base address of the virtual address segment, or the segment number contained in it, is written into the second preset storage space in advance, so that the address checking machine code can obtain the segment number from the second preset storage space during execution. For example, when the segment base address is pre-stored in the second preset storage space, the address checking machine code reads the segment base address from the second preset storage space and extracts the segment number from its high-order data according to the preset number of bits of the segment number (when the segment number itself is stored in the second preset storage space, it can be read directly). The high-order data of the linear address can then be compared with the segment number of the virtual address segment to complete the out-of-bounds checking logic.
Optionally, on the basis of the above, the second preset storage space participates only in the compilation process that generates the address checking machine code. In the embodiment of the present specification, the second preset storage space stores the segment number or the segment base address of the virtual address segment, and the segment number is the information required to judge whether the linear address is out of bounds when the address checking machine code is executed. Therefore, before the target program is loaded into the virtual address segment in the memory and is ready to start executing, the segment base address or the segment number must be stored in the second preset storage space in advance, to ensure that the subsequent address checking machine code can correctly and effectively check whether the linear address is out of bounds. However, when the source program is compiled into the target program, the source program contains not only memory access code but also other functional code that must be compiled into machine code, and the compiled machine code may request storage space in registers and in the memory, which may include the second preset storage space. If the second preset storage space is used (for example, modified) by machine code generated from such other code, the validity of the out-of-bounds checking mechanism during execution of the target program cannot be guaranteed: once other machine code has tampered with the value stored in the second preset storage space, the address checking machine code executed afterwards reads a wrong segment number or segment base address and can no longer actually perform the out-of-bounds check.
Therefore, in the embodiment of the present specification, the compiler or the interpreter is configured so that the second preset storage space participates only in the compiling process for generating the address checking machine code. That is, operation logic that uses the second preset storage space may be defined only when the address checking machine code is generated; when other machine code is generated, the second preset storage space is disabled, or operations on it are limited to read-only. This avoids the above problem of the out-of-bounds checking mechanism failing because the second preset storage space has been tampered with.
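One way a code generator could enforce this restriction is to validate the machine code it emits for ordinary source code. The sketch below is an assumption-laden illustration: the instruction tuple layout `(opcode, destination, sources)`, the storage name `"seg"`, and the function `is_allowed` are all hypothetical.

```python
RESERVED = "seg"  # hypothetical name for the second preset storage space


def is_allowed(instructions, allow_reads=True):
    """Check machine code generated for code other than the address check.

    Each instruction is a tuple (opcode, destination, sources). Writing the
    reserved storage is never allowed; reading it is allowed only when the
    compiler is configured for read-only use (allow_reads=True).
    """
    for opcode, destination, sources in instructions:
        if destination == RESERVED:
            return False
        if not allow_reads and RESERVED in sources:
            return False
    return True


ordinary = [("add", "r1", ("r2", "r3")), ("mov", "r4", ("seg",))]
tampering = [("mov", "seg", ("r1",))]
print(is_allowed(ordinary), is_allowed(ordinary, allow_reads=False), is_allowed(tampering))
```

In a real compiler the same effect is usually obtained earlier, by removing the reserved register or memory slot from the set of storage the allocator may hand out.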
Optionally, the modification authority of the second preset storage space is owned by the operating system. In this embodiment of the present specification, for the address checking machine code to implement the out-of-bounds checking logic correctly and effectively during execution, the target program and other user mode programs must be prevented from modifying the segment number or segment base address in the second preset storage space; otherwise the out-of-bounds check malfunctions and memory out-of-bounds access can no longer be prevented. For example, the instruction that modifies the second preset storage space may be made a privileged instruction, so that the segment number or segment base address in the second preset storage space can be modified only when the CPU is in kernel mode or is executing the privileged instruction. When the operating system needs to transfer control of the CPU to the target program, which is a user mode program, the operating system first issues the privileged instruction so that the CPU sets the second preset storage space to the segment base address or segment number of the virtual address segment in which the user mode program is located, then switches the processor to user mode (or switches the processor mode first and then issues the privileged instruction to modify the second preset storage space), and finally jumps to the program entry of the target program so that the CPU starts to read instructions from the code segment of the target program. Of course, the modification authority of the second preset storage space may instead be opened to the address checking machine code generated during compilation, so that the target program itself sets the second preset storage space to the segment number or segment base address of the virtual address segment in which it is located while running. In that case the operating system does not need to perform this part of program initialization (setting the second preset storage space to the correct segment number or segment base address), which simplifies the initialization flow of the target program. Through the embodiment of the specification, the target program is prevented from privately modifying the segment number in the second preset storage space during execution, which strengthens the effectiveness of the address checking machine code, effectively prevents the target program from accessing addresses to which it has no access right, and improves system security.
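The hand-over sequence described above (privileged write, switch to user mode, jump to the program entry) can be modelled in a few lines. This is a toy simulation under stated assumptions: the class `SegmentStore` and the functions `launch` and `try_write` are hypothetical names, and a boolean flag stands in for the processor mode.

```python
class SegmentStore:
    """Models a second preset storage space that is writable only in kernel mode."""

    def __init__(self):
        self.kernel_mode = True  # stand-in for the processor privilege level
        self._value = 0

    def write(self, value):
        if not self.kernel_mode:
            raise PermissionError("the segment store is writable only in kernel mode")
        self._value = value

    def read(self):
        return self._value


def launch(store, segment_base, entry):
    """Operating-system side: prepare the store, drop privilege, enter the program."""
    store.write(segment_base)   # privileged write while still in kernel mode
    store.kernel_mode = False   # switch the processor to user mode
    return entry(store)         # jump to the program entry


def try_write(store, value):
    """User-program side: attempt to tamper with the store; report whether it worked."""
    try:
        store.write(value)
        return True
    except PermissionError:
        return False


store = SegmentStore()
print(hex(launch(store, 0x42000, lambda s: s.read())), try_write(store, 0))
```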
Optionally, the second preset storage space includes any one of the following: a register, a predefined memory space in the virtual address segment, or a predefined memory space outside the virtual address segment. When the second preset storage space is a predefined memory space in the virtual address segment, it may be, for example, the memory space corresponding to the first address of the target program, or a memory space with a fixed logical address in the data segment as shown in fig. 1, and its rights management mode may be set to read-only so that the target program cannot modify it during execution. When the second preset storage space is a predefined memory space outside the virtual address segment, all access behaviors of the compiled target program are protected by the out-of-bounds check; that is, during execution of the target program, only the address checking machine code generated according to the embodiment of this specification accesses the second preset storage space, and that access is necessarily trustworthy. There is therefore no need to worry that the target program will access or modify the second preset storage space during execution, and its rights management mode may be set to read-write. When the second preset storage space is a register, the register may be an internal register of the processor, an external register acting as a peripheral device, or a virtual register defined in memory.
In this embodiment, the register may be built into the processor and have the same status as the other general registers inside the processor, so that the CPU knows the physical address of the register by default and accesses it at that address whenever an instruction needs the register. When the second preset storage space is an internal register, memory does not have to be accessed to obtain the segment number of the virtual address segment, which further improves the execution efficiency of the out-of-bounds check. The register may also be an external register, in which case the CPU accesses it through an I/O interface; when an external register is used as an external device, the I/O port address corresponding to the external register therefore needs to be declared to the operating system or the CPU. Similarly, the register may be a virtual register defined in memory, which is equivalent to allocating a memory space dedicated to the CPU; the physical address of the virtual register in memory therefore also needs to be declared to the operating system or the CPU.
Optionally, the first address of the target program is aligned with the segment base address of the virtual address segment. In the embodiment of the present specification, because the first address of the target program is aligned with the segment base address of the virtual address segment in which the target program is located, any logical address (segment offset) generated in the target program represents not only the position of a virtual space relative to the first address of the target program (its offset within the program), but also the position of the target memory space to be accessed relative to the segment base address of the virtual address segment (the segment offset of the actual target memory space within the virtual address segment in memory). For example, suppose the segment base address of the virtual address segment is set so that its 4 low-order bits are 0. The logical address of the first address of the target program is in essence its distance from the segment base address, that is, 0, so the logical address of the first address and the zero-valued logic bits in the segment base address are numerically consistent: both are zero-valued logic bits of the same width. Only on this basis is the logical address of the target memory space, as seen from the program's point of view, numerically equal to the low-order data of the linear address of the target memory space that is actually accessed in the virtual address segment. Consequently, when the CPU executes a memory access instruction in the target program, the target memory space determined from the logical address carried in the instruction correctly reflects the memory space that the target program actually needs to access, which ensures that the target program does not access a wrong location during execution.
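The numerical consistency described above can be checked directly: when the low-order bits of the segment base address are zero and the segment offset fits in those bits, adding the offset leaves the segment number untouched and the low-order data of the linear address equals the offset. The sketch below assumes 12 offset bits and a hypothetical base address; `linear_address` and `split` are illustrative names.

```python
OFFSET_BITS = 12                      # assumed number of zero-valued low-order bits
OFFSET_MASK = (1 << OFFSET_BITS) - 1


def linear_address(segment_base: int, offset: int) -> int:
    """Linear address = segment base address + segment offset."""
    return segment_base + offset


def split(address: int):
    """Split an address into (high-order data, low-order data)."""
    return address >> OFFSET_BITS, address & OFFSET_MASK


segment_base = 0x42000                # hypothetical base; low 12 bits are zero
for offset in (0x000, 0x123, 0xFFF):
    address = linear_address(segment_base, offset)
    # For in-range offsets, addition and bitwise OR give the same result.
    print(hex(offset), split(address), address == segment_base | offset)
```

An offset of `0x1000` or more would carry into the high-order data, which is exactly the condition the address checking machine code detects.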
Optionally, the virtual address segment corresponding to the memory allocation space of the target program may be set by using the following policy. First, the virtual address space that the target program may occupy is estimated. Then a new address segment is divided from free memory based on the estimate so that its segment length matches the estimated virtual address space; specifically, the segment length of the newly divided address segment is not less than the estimated virtual address space. The newly divided address segment must also satisfy the conditions that its segment base address contains a segment number in the high-order bits and a plurality of zero-valued logic bits in the low-order bits, and that its segment length is the maximum address space that can be expressed by those zero-valued logic bits. As a result, the segment length can only take discrete values of the form $2^N$, where N is a positive integer. To minimize internal fragmentation, let R be the estimated size of the virtual address space and determine the N that satisfies the following formula:
$$2^{N-1} < R \le 2^{N}$$
After N is determined, $2^N$ is used as the segment length of the virtual address segment corresponding to the memory allocation space of the target program, and memory is searched for a contiguous address space that is in an idle state, whose first address has all of its low N bits equal to 0, and whose length is $2^N$; this address space is used as the virtual address segment corresponding to the target program. Under this memory allocation policy, the low N bits of the segment base address serve as the zero-valued logic bits, and the remaining high-order bits serve as the segment number of the virtual address segment. The target program is loaded into the virtual address segment divided under this policy when it needs to be executed. The segment number and the number of zero-valued logic bits set under this policy may differ from those of previously divided address segments, but this does not affect the memory out-of-bounds prevention scheme of this specification: when executed, the address checking machine code still judges whether the linear address may cause a memory out-of-bounds error by comparing the segment number contained in the segment base address with the high-order data of the linear address that has the same number of bits as the segment number. In this embodiment, because the target program can be allocated a virtual address segment whose segment length is adjusted dynamically, internal fragmentation in memory is reduced as much as possible and memory utilization is improved.
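The allocation policy above can be sketched as follows, taking N to be the smallest positive integer for which the segment length $2^N$ is not less than the estimated size R, as the text requires. The representation of free memory as half-open `(start, end)` ranges and the names `segment_bits` and `find_segment` are assumptions made for the illustration.

```python
def segment_bits(estimated_size: int) -> int:
    """Smallest positive N such that 2**N is not less than the estimated size R."""
    if estimated_size < 1:
        raise ValueError("estimated size must be positive")
    return max(1, (estimated_size - 1).bit_length())


def find_segment(free_ranges, estimated_size):
    """Find an idle, 2**N-aligned address range of length 2**N.

    free_ranges is a list of half-open (start, end) ranges of free addresses.
    Returns (segment_base, N), or None when no suitable range exists.
    """
    n = segment_bits(estimated_size)
    length = 1 << n
    for start, end in free_ranges:
        base = (start + length - 1) & ~(length - 1)  # round up so the low N bits are zero
        if base + length <= end:
            return base, n
    return None


result = find_segment([(0x1800, 0x2800), (0x5000, 0x9000)], 5000)
print(result if result is None else (hex(result[0]), result[1]))
```

Rounding the start of each free range up to a multiple of $2^N$ is what guarantees that the low N bits of the chosen segment base address are zero.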
Optionally, the method further includes: when a memory allocation program receives a dynamic memory request from the target program, the memory allocation program allocates to the target program a virtual address space that is in an idle state and satisfies the requested size contained in the memory request, and requests the operating system to allocate physical memory for that virtual address space. In this embodiment, the memory allocation space of the target program is limited to one continuous virtual address segment, and the operating system loads the target program into the corresponding virtual address segment in memory when the target program needs to be executed. After loading, the target program generally does not occupy all of the memory space in the virtual address segment. For example, when the target program has just been loaded, it occupies only the static memory defined at compile time, since every variable of a specified type has its static memory space reserved during compilation. During execution, however, the target program may need additional memory. For example, the memory required for an array of undefined size cannot be determined at compile time, so the target program may request more dynamic memory as it runs.
In this embodiment of the present specification, when the target program needs additional dynamic memory, it sends a dynamic memory request to a memory allocation program. The memory allocation program may be the operating system, or a program that is loaded and run in memory specifically to manage memory allocation for the target program. The memory allocation program knows in advance which virtual address segment is allocated to the target program, and it knows which memory space in that virtual address segment is currently occupied by the target program and which is in an idle state. In response to the request, it can therefore allocate to the target program a virtual address space (dynamic memory) within the virtual address segment that is in an idle state and satisfies the requested size contained in the memory request, and request the operating system to allocate physical memory for that virtual address space, so that the target program can obtain and use the dynamic memory allocated to it. The memory allocation program then updates its record of the memory space currently occupied by the target program and the memory space in an idle state in the virtual address segment.
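A memory allocation program of this kind can be sketched as a simple bump allocator confined to the virtual address segment. This is a minimal illustration, not the allocator of the specification: the class name `SegmentAllocator` is hypothetical, freeing memory is not modelled, and the request to the operating system for physical memory is only marked by a comment.

```python
class SegmentAllocator:
    """Hands out dynamic memory only from inside one virtual address segment."""

    def __init__(self, segment_base: int, segment_length: int, static_size: int):
        self.end = segment_base + segment_length
        # Static memory defined at compile time occupies the start of the segment.
        self.next_free = segment_base + static_size

    def allocate(self, size: int):
        """Return the start of an idle range of `size` bytes, or None if none fits."""
        if size <= 0 or self.next_free + size > self.end:
            return None
        start = self.next_free
        self.next_free += size  # update the record of occupied and idle space
        # A real allocator would now ask the operating system to back
        # [start, start + size) with physical memory.
        return start


allocator = SegmentAllocator(segment_base=0x42000, segment_length=0x1000, static_size=0x100)
first = allocator.allocate(0x200)
too_big = allocator.allocate(0xE00)
print(hex(first), too_big)
```

Because every address the allocator returns lies inside the segment, dynamically allocated memory passes the same high-order-bit comparison as static memory.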
Optionally, the access permission of the code in the code segment and of the read-only data in the data segment of the target program is set to read-only through the protection mechanism of the page table, so as to prevent the target program from modifying the code or the read-only data during execution. In the embodiment of the present specification, the memory protection mechanism of page-based storage management may be enabled; for example, the rights management mode of the pages that hold the code in the code segment and the read-only data in the data segment of the target program is set to read-only. This prevents the target program from invalidating the compiled address checking machine code by modifying the code or the read-only data, thereby increasing the security of the system.
Fig. 3 is a schematic structural diagram of a device according to an exemplary embodiment. Referring to fig. 3, at the hardware level, the device includes a processor 302, an internal bus 304, a network interface 306, a memory 308, and a non-volatile storage 310, and may also include hardware required for other services. One or more embodiments of the present description may be implemented in software, for example by the processor 302 reading a corresponding computer program from the non-volatile storage 310 into the memory 308 and then executing it. Of course, in addition to a software implementation, one or more embodiments of this specification do not exclude other implementations, such as logic devices or a combination of software and hardware; that is, the execution subject of the following processing flow is not limited to logic units and may also be hardware or logic devices.
Fig. 4 is a block diagram of a program compiling apparatus for preventing memory out-of-bounds access according to an exemplary embodiment. Referring to fig. 4, the apparatus may be applied to the device shown in fig. 3; for example, the apparatus includes a software or hardware compiler, or a software or hardware interpreter, to implement the technical solution of the present specification. The apparatus comprises:
an address check machine code compiling unit 401, configured to generate an address check machine code for a memory access code in a source program in a process of compiling the source program into a target program, where a memory allocation space corresponding to the target program is located in a continuous virtual address segment, a segment base address of the virtual address segment includes a segment number located at a high level and a plurality of zero-valued logic bits located at a low level, and a segment length of the virtual address segment is not less than a maximum address space that can be represented by the plurality of zero-valued logic bits; the address checking machine code is to:
for a target memory space to be accessed by the memory access code, comparing high-order data of a linear address corresponding to the target memory space with the segment number contained in the segment base address, wherein the linear address is the sum of the segment base address and a segment offset of the target memory space relative to the segment base address, and the high-order data and the segment number have the same number of bits;
and allowing the target memory space in the memory to be accessed according to the linear address under the condition that the comparison results are the same.
Optionally, the apparatus further includes:
an address computation machine code compiling unit 402, configured to generate an address computation machine code for the memory access code, where the address computation machine code is configured to write the linear address obtained by adding the segment offset and the segment base address carried in the memory access code into a first preset storage space, so that the address check machine code obtains the linear address corresponding to the target memory space from the first preset storage space when executing.
Optionally, the apparatus further includes:
a boundary crossing alarm machine code compiling unit 403, configured to generate a boundary crossing alarm machine code for the memory access code, where the boundary crossing alarm machine code is configured to trigger a boundary crossing alarm mechanism if a comparison result between high-order data of the linear address and a segment number included in the segment base address is different.
Optionally, the boundary crossing alarm mechanism at least includes stopping running the target program and/or outputting error information to indicate the position of the instruction where the memory boundary crossing occurs.
Optionally, the address checking machine code compiling unit 401 is specifically configured to:
compiling the source program into the target program in the process of interpreting and executing the source program; or,
compiling the source program into the target program ahead of time (AOT).
Optionally, the segment number or the segment base address is pre-stored in a second preset storage space, so that the address checking machine code obtains the segment number included in the segment base address from the second preset storage space during execution.
Optionally, the second preset storage space includes any one of the following: a register, a predefined memory space in the virtual address segment, or a predefined memory space outside the virtual address segment.
Optionally, the second preset storage space only participates in the compiling process of generating the address checking machine code.
Optionally, the modification authority of the second preset storage space is owned by the operating system.
Optionally, the first address of the target program is aligned with the segment base address of the virtual address segment.
Optionally, the memory access code includes a bytecode.
Optionally, the source program or the target program is a smart contract deployed in a blockchain system.
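How the compiling units 401 to 403 described above might cooperate can be illustrated with a toy instrumentation pass. Everything here is an assumption for the sake of the example: the tuple-based instruction format, the opcode names `calc_address` and `check_address`, the 12 offset bits, and the small interpreter `run` that stands in for the CPU. The alarm is folded into the check step for brevity, whereas the specification describes it as separately generated machine code.

```python
OFFSET_BITS = 12  # assumed number of zero-valued low-order bits in the segment base address


def instrument(source):
    """Expand each memory access into address calculation, address check, and the access."""
    target = []
    for position, instruction in enumerate(source):
        if instruction[0] in ("load", "store"):
            target.append(("calc_address", instruction[1]))  # linear address = base + offset
            target.append(("check_address", position))       # compare high-order data with segment number
        target.append(instruction)
    return target


def run(target, segment_base, memory):
    """Tiny interpreter standing in for the CPU; returns "ok" or an alarm message."""
    linear = None  # models the first preset storage space
    for instruction in target:
        opcode = instruction[0]
        if opcode == "calc_address":
            linear = segment_base + instruction[1]
        elif opcode == "check_address":
            if linear >> OFFSET_BITS != segment_base >> OFFSET_BITS:
                # Alarm: stop the program and report the offending instruction.
                return f"out-of-bounds access at source instruction {instruction[1]}"
        elif opcode == "load":
            memory.get(linear, 0)
        elif opcode == "store":
            memory[linear] = instruction[2]
    return "ok"


program = [("store", 0x10, 7), ("store", 0x2000, 9)]
memory = {}
print(run(instrument(program), 0x42000, memory), memory)
```

The second store uses an offset larger than the segment, so the check stops the program before the write reaches memory.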
The systems, devices, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. A typical implementation device is a computer, which may take the form of a personal computer, laptop computer, cellular telephone, camera phone, smart phone, personal digital assistant, media player, navigation device, email messaging device, game console, tablet computer, wearable device, or a combination of any of these devices.
In a typical configuration, a computer includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include non-permanent memory in a computer-readable medium, in the form of Random Access Memory (RAM) and/or non-volatile memory such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic disk storage, quantum memory, graphene-based storage media or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, a computer readable medium does not include a transitory computer readable medium such as a modulated data signal and a carrier wave.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
The terminology used in the description of the one or more embodiments is for the purpose of describing the particular embodiments only and is not intended to be limiting of the description of the one or more embodiments. As used in one or more embodiments of the present specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used in one or more embodiments of the present description to describe various information, such information should not be limited to these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of one or more embodiments herein. The word "if" as used herein may be interpreted as "at the time of ..." or "when ..." or "in response to a determination", depending on the context.
The above description is only for the purpose of illustrating the preferred embodiments of the one or more embodiments of the present disclosure, and is not intended to limit the scope of the one or more embodiments of the present disclosure, and any modifications, equivalent substitutions, improvements, etc. made within the spirit and principle of the one or more embodiments of the present disclosure should be included in the scope of the one or more embodiments of the present disclosure.

Claims (14)

1. A program compiling method for preventing memory boundary crossing comprises the following steps:
in the process of compiling a source program into a target program, generating an address check machine code for a memory access code in the source program, wherein a memory allocation space corresponding to the target program is located in a continuous virtual address segment, a segment base address of the virtual address segment comprises a segment number located at a high position and a plurality of zero-valued logic bits located at a low position, and the segment length of the virtual address segment is not less than the maximum address space which can be represented by the plurality of zero-valued logic bits; the address checking machine code is to:
for a target memory space to be accessed by the memory access code, comparing high-order data of a linear address corresponding to the target memory space with the segment number contained in the segment base address, wherein the linear address is the sum of the segment base address and a segment offset of the target memory space relative to the segment base address, and the high-order data and the segment number have the same number of bits;
and allowing the target memory space in the memory to be accessed according to the linear address under the condition that the comparison results are the same.
2. The method of claim 1, further comprising:
and generating an address calculation machine code aiming at the memory access code, wherein the address calculation machine code is used for writing the linear address obtained by adding the segment offset and the segment base address carried in the memory access code into a first preset storage space, so that the address check machine code obtains the linear address corresponding to the target memory space from the first preset storage space during execution.
3. The method of claim 1, further comprising:
and generating a boundary crossing alarm machine code aiming at the memory access code, wherein the boundary crossing alarm machine code is used for triggering a boundary crossing alarm mechanism under the condition that the comparison result of the high-order data of the linear address and the segment number contained in the segment base address is different.
4. The method of claim 3, the boundary crossing alarm mechanism comprising at least stopping running the target program and/or outputting error information to indicate the position of the instruction where the memory boundary crossing occurred.
5. The method of claim 1, the compiling a source program into a target program comprising:
compiling the source program into the target program in the process of interpreting and executing the source program; or,
compiling the source program into the target program ahead of time (AOT).
6. The method of claim 1, wherein the segment number or the segment base address is pre-stored in a second preset storage space, so that the address checking machine code obtains the segment number contained in the segment base address from the second preset storage space when executing.
7. The method of claim 6, the second preset storage space comprising any one of: a register, a predefined memory space in the virtual address segment, or a predefined memory space outside the virtual address segment.
8. The method of claim 6, wherein the second preset storage space only participates in the compiling process for generating the address checking machine code.
9. The method of claim 1, the target program's first address being aligned with a segment base address of the virtual address segment.
10. The method of claim 1, the memory access code comprising bytecode.
11. The method of claim 1, the source program or the target program being a smart contract deployed in a blockchain system.
12. A program compiling apparatus for preventing memory boundary crossing, comprising:
an address check machine code compiling unit, configured to generate an address check machine code for a memory access code in a source program in the process of compiling the source program into a target program, wherein a memory allocation space corresponding to the target program is located in a continuous virtual address segment, a segment base address of the virtual address segment comprises a segment number located at a high position and a plurality of zero-valued logic bits located at a low position, and the segment length of the virtual address segment is not less than the maximum address space which can be represented by the plurality of zero-valued logic bits; the address checking machine code is to:
for a target memory space to be accessed by the memory access code, comparing high-order data of a linear address corresponding to the target memory space with the segment number contained in the segment base address, wherein the linear address is the sum of the segment base address and a segment offset of the target memory space relative to the segment base address, and the high-order data and the segment number have the same number of bits;
and allowing the target memory space in the memory to be accessed according to the linear address under the condition that the comparison results are the same.
13. An electronic device, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor implements the method of any one of claims 1-11 by executing the executable instructions.
14. A computer-readable storage medium having stored thereon computer instructions, which, when executed by a processor, carry out the steps of the method according to any one of claims 1-11.
CN202111033647.0A 2021-09-03 2021-09-03 Program compiling method and device for preventing memory boundary crossing Active CN113672237B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111033647.0A CN113672237B (en) 2021-09-03 2021-09-03 Program compiling method and device for preventing memory boundary crossing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111033647.0A CN113672237B (en) 2021-09-03 2021-09-03 Program compiling method and device for preventing memory boundary crossing

Publications (2)

Publication Number Publication Date
CN113672237A (en) 2021-11-19
CN113672237B (en) 2022-03-11

Family

ID=78548431

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111033647.0A Active CN113672237B (en) 2021-09-03 2021-09-03 Program compiling method and device for preventing memory boundary crossing

Country Status (1)

Country Link
CN (1) CN113672237B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115422554A (en) * 2022-10-25 2022-12-02 支付宝(杭州)信息技术有限公司 Request processing method, compiling method and trusted computing system
CN117435440A (en) * 2023-12-20 2024-01-23 麒麟软件有限公司 Dynamic analysis method and system for program heap space

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106682492A (en) * 2015-11-06 2017-05-17 大唐移动通信设备有限公司 Method and device for managing heap corruption
WO2019231194A1 (en) * 2018-05-28 2019-12-05 삼성전자 주식회사 Method and system for detecting memory error
CN111124921A (en) * 2019-12-25 2020-05-08 北京字节跳动网络技术有限公司 Memory out-of-range detection method, device, equipment and storage medium
CN111338794A (en) * 2020-02-18 2020-06-26 苏州洞察云信息技术有限公司 Memory out-of-range monitoring method and device and storage medium
CN112905998A (en) * 2021-02-26 2021-06-04 中国人民解放军国防科技大学 Address-oriented attack protection method and device based on code segment random switching

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
C++初学者555: "堆,栈等概念以及内存泄露,内存溢出,内存越界等问题", 《HTTPS://BLOG.CSDN.NET/QQ_40140790/ARTICLE/DETAILS/100159160》 *
QQ62: "Linux-内存越界", 《HTTPS://BLOG.CSDN.NET/QQ_42139383/ARTICLE/DETAILS/109822884》 *
姬希娜: "Nucleus PLUS的动态内存管理机制研究", 《单片机与嵌入式系统应用》 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115422554A (en) * 2022-10-25 2022-12-02 支付宝(杭州)信息技术有限公司 Request processing method, compiling method and trusted computing system
CN117435440A (en) * 2023-12-20 2024-01-23 麒麟软件有限公司 Dynamic analysis method and system for program heap space
CN117435440B (en) * 2023-12-20 2024-04-05 麒麟软件有限公司 Dynamic analysis method and system for program heap space

Also Published As

Publication number Publication date
CN113672237B (en) 2022-03-11

Similar Documents

Publication Publication Date Title
CN109359487B (en) Extensible security shadow storage and tag management method based on hardware isolation
CN1147775C (en) Guarded memory system and method
CN108205502B (en) Lightweight trusted transaction
JP5914145B2 (en) Memory protection circuit, processing device, and memory protection method
CN113672237B (en) Program compiling method and device for preventing memory boundary crossing
CN113468079B (en) Memory access method and device
CN113485716B (en) Program compiling method and device for preventing memory boundary crossing
WO2019180401A1 (en) An apparatus and method for storing bounded pointers
WO2019237866A1 (en) Method for controlling access at runtime and computing device
JP7291149B2 (en) Controlling protection tag checking on memory accesses
KR20200131855A (en) Random tag setting command for tag protected memory system
JP2021532468A (en) A memory protection unit that uses a memory protection table stored in the memory system
JP2021531583A (en) Binary search procedure for control tables stored in memory system
JP2023526811A (en) Tag check device and method
JP2015158936A (en) Data processor
CN114691532A (en) Memory access method, memory address allocation method and device
US20170262382A1 (en) Processing device, information processing apparatus, and control method of processing device
US9639477B2 (en) Memory corruption prevention system
JP7349437B2 (en) Controlling protection tag checking on memory accesses
JP7269942B2 (en) Multiple guard tag setting instructions
US9043612B2 (en) Protecting visible data during computerized process usage
CN115994348A (en) Control method for program pipeline, processing device and storage medium
US11150887B2 (en) Secure code patching
KR20170139547A (en) Fine memory protection to prevent memory overrun attacks
CN117222990A (en) Techniques for access to memory using capability constraints

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant