CN117331611A - Program running method and device - Google Patents

Publication number
CN117331611A
Authority
CN
China
Prior art keywords
program
computing resource
instruction
determining
conditional jump
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211118557.6A
Other languages
Chinese (zh)
Inventor
许中虎
王淑倩
徐建荣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Priority to PCT/CN2023/100498 (WO2023246625A1)
Publication of CN117331611A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/445 Program loading or initiating
    • G06F9/44505 Configuring for program initiating, e.g. using registry, configuration files
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/445 Program loading or initiating
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Devices For Executing Special Programs (AREA)

Abstract

A program running method and device are provided to address the problem that an existing processing device configures its computing resources with default configuration parameters, so that the computing resources run many different types of programs under the same defaults and cannot adapt to the running characteristics of each program. In this application, the method may be executed by the processing device or by a computing resource in the processing device. Specifically, when it is determined that a program running on the computing resource includes multiple similar subroutines, the running characteristics of the subroutines that have already run or are currently running among the multiple similar subroutines are obtained, where the multiple similar subroutines are subroutines whose running characteristics have a similarity greater than or equal to a first preset value. Configuration parameters of the computing resource are determined according to the running characteristics, and the computing resource is then configured with those configuration parameters.

Description

Program running method and device
The present application claims priority to Chinese patent application No. 202210731665.4, entitled "A program execution method" and filed with the Intellectual Property Office of the People's Republic of China on June 25, 2022, the entire contents of which are incorporated herein by reference.
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a program running method and apparatus.
Background
A processing device includes computing resources, such as a central processing unit (CPU) or the cores of a CPU, that can run many different types of programs. To keep these computing resources general-purpose, the processing device may configure them with default configuration parameters. Taking a CPU as an example, the CPU manufacturer may, during design and manufacturing, set the CPU to different configuration parameters, test it under each setting, and select the best-performing parameters as the default configuration parameters of the CPU, so that the CPU can run various types of programs smoothly under the defaults.
However, when the computing resources in the processing device run many different types of programs under the default configuration parameters, the parameters cannot be adapted to the running characteristics of each type of program; that is, the computing capability of the computing resources cannot be fully utilized.
Disclosure of Invention
The application provides a program running method and device for identifying the running characteristics of subroutines in a program running on a computing resource and adjusting the configuration parameters of the computing resource according to the identified running characteristics, which helps improve the efficiency with which the computing resource executes the program and makes fuller use of its computing capability.
In a first aspect, the present application provides a method of running a program. The method may be executed by a processing device or by a computing resource in the processing device, where the computing resource may specifically be a processor in the processing device or a core of such a processor. The method includes the following steps: when it is determined that a program running on a computing resource in a processing device includes multiple similar subroutines, the running characteristics of the subroutines that have already run or are currently running among the multiple similar subroutines are obtained, where the multiple similar subroutines are subroutines whose running characteristics have a similarity greater than or equal to a first preset value. Configuration parameters of the computing resource are determined according to the running characteristics, and the computing resource is then configured with the determined configuration parameters.
In the above technical solution, when it is determined that the program running on the computing resource includes multiple similar subroutines, the configuration parameters of the computing resource are determined according to the running characteristics of the subroutines that have already run or are currently running, and the computing resource is configured with them. This improves the efficiency with which the computing resource executes the other similar subroutines in the program, and therefore the execution efficiency of the program as a whole.
In one possible implementation, the multiple similar sub-programs are loop programs that are executed multiple times in a program, each of the multiple sub-programs corresponding to one or more loops of the loop program. In the technical scheme, the loop program exists in the program operated by the computing resource, and the configuration parameters of the computing resource adopted for operating the loop program are determined based on the operation characteristics of the loop program in one or more loops, so that the computing resource can execute the loop program more efficiently.
In one possible implementation, the number of instructions of the loop program is greater than a second preset value. In this way, more frequent configuration of parameters in the computing resource is avoided, and the computing power consumption (cost) is reduced.
In one possible implementation manner, when it is determined that the program running in the computing resource includes multiple similar subroutines, it may be specifically determined that the first subroutine is a loop program when the number of times that the first subroutine is repeatedly executed in the program is greater than a third preset value. Thus, a method for determining that a loop program is included in a program is provided, and the accuracy of recognizing the loop program is improved.
In one possible implementation, after the first subroutine has been executed and the conditional jump instruction is executed, it is determined whether record information of the conditional jump instruction exists. If the record information exists, whether the number of times the first subroutine has been repeatedly executed is greater than the third preset value is determined according to that record information; if the record information does not exist, record information of the conditional jump instruction is added. In this technical solution, the conditional jump instruction is executed after the first subroutine, and whether the first subroutine is a loop program is determined from the record information of the conditional jump instruction, which improves the accuracy of identifying the loop program.
In one possible implementation, after the conditional jump instruction is executed, it is first determined whether the conditional jump instruction points to a small loop, where a small loop is a loop in which the number of instructions executed in a single iteration is less than the second preset value. If the conditional jump instruction points to a small loop, the conditional jump instruction is filtered out; if it does not point to a small loop, it is further determined whether record information of the conditional jump instruction exists.
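The text does not say how a jump is recognized as pointing to a small loop. As a minimal sketch, assuming a fixed-width instruction encoding and that the distance between the jump's start and target positions roughly approximates the instructions executed per iteration, the filter could look as follows. The names `INSTRUCTION_BYTES`, `estimated_loop_size`, and `should_filter` are hypothetical and do not come from the application.

```python
from typing import Optional

INSTRUCTION_BYTES = 4  # assumption: every instruction occupies 4 bytes


def estimated_loop_size(start: int, target: int) -> Optional[int]:
    """Estimate the instructions per iteration for a backward jump.

    Returns None for a forward jump, which this sketch does not treat
    as the back-edge of a loop.
    """
    if target > start:
        return None
    return (start - target) // INSTRUCTION_BYTES + 1


def should_filter(start: int, target: int, second_preset: int) -> bool:
    """True when the jump points to a small loop and should be filtered out."""
    size = estimated_loop_size(start, target)
    return size is not None and size < second_preset
```

A jump that is not filtered out would then proceed to the record-information check described above. The static distance is only a rough estimate, since a loop body that calls other code executes more instructions than its address span suggests.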
In one possible implementation, the record information is recorded in a preset section of a cache of the computing resource. The record information can therefore be read quickly from the cache, which speeds up identification of the loop program.
In one possible implementation, determining whether record information of the conditional jump instruction exists specifically includes: determining a jump identifier according to the identification information of the conditional jump instruction, where the jump identifier is the identification information itself or a hash of it; and traversing the pieces of record information in the cache. If a piece of record information includes the jump identifier, it is determined that the cache contains record information of the conditional jump instruction; if none of the pieces of record information includes the jump identifier, it is determined that the cache does not contain record information of the conditional jump instruction.
In one possible implementation, the identification information of the conditional jump instruction is obtained from a branch recording module. The identification information includes a start position and/or a target position, or includes a hash of the start position and/or the target position.
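The lookup described above can be sketched as a linear traversal keyed by a jump identifier. This is an illustrative sketch only: the `JumpRecord` layout, the XOR-based folding in `jump_identifier`, and the 16-bit identifier width are assumptions, since the text only states that the identifier may be the start and/or target position or a hash of them.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class JumpRecord:
    jump_id: int          # jump identifier stored with the record
    instr_count: int = 0  # cumulative instructions when the jump was last executed


def jump_identifier(start: int, target: int, bits: int = 16) -> int:
    """Fold the start and target positions into a short identifier.

    The XOR-and-truncate scheme is an assumption made for illustration.
    """
    return (start ^ (target << 1)) & ((1 << bits) - 1)


def find_record(cache: List[JumpRecord], jump_id: int) -> Optional[JumpRecord]:
    """Traverse the record entries and return the one holding jump_id, if any."""
    for record in cache:
        if record.jump_id == jump_id:
            return record
    return None


def lookup_or_add(cache: List[JumpRecord], start: int, target: int,
                  instr_count: int) -> Tuple[JumpRecord, bool]:
    """Return (record, existed); add a fresh record when none exists."""
    jump_id = jump_identifier(start, target)
    record = find_record(cache, jump_id)
    if record is not None:
        return record, True
    record = JumpRecord(jump_id=jump_id, instr_count=instr_count)
    cache.append(record)
    return record, False
```

Storing a short hash rather than the full positions keeps each record small, at the cost of occasional collisions between unrelated jumps.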
In one possible implementation, when determining according to the record information whether the number of times the first subroutine has been repeatedly executed is greater than the third preset value, the determination may be made according to the number of instructions and the execution length in the record information. The number of instructions in the record information indicates the cumulative number of instructions the program had executed when it last executed the conditional jump instruction; the execution length in the record information is the difference between the cumulative numbers of instructions at the two most recent executions of the conditional jump instruction.
In this technical solution, whether the first subroutine is a loop program is determined from the number of instructions and the execution length in the record information of the conditional jump instruction, which improves the accuracy of identifying the loop program.
In one possible implementation, the third preset value is equal to 2. When determining, according to the number of instructions and the execution length in the record information, that the number of times the first subroutine has been repeatedly executed is greater than the third preset value, the cumulative number of instructions executed by the program at the current execution of the conditional jump instruction may be taken as the first instruction number, and the difference between the first instruction number and the number of instructions in the record information may be taken as the first execution length. If the execution length in the record information is not 0 and the difference between the first execution length and the execution length in the record information is smaller than a difference threshold, it is determined that the number of times the first subroutine has been repeatedly executed is greater than 2. This technical solution helps improve the accuracy of identifying the loop program.
In one possible implementation manner, after determining that the number of times the first subroutine is repeatedly executed is greater than the third preset value, the method further includes: the record information is updated according to the first instruction number and the first execution length.
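The check for the case where the third preset value equals 2 can be illustrated with a short sketch. The `JumpRecord` fields and the `observe_jump` function are hypothetical names. The sketch also assumes that the record is updated on every execution of the jump (not only after a loop is recognized), so that the execution length becomes available on the next occurrence.

```python
from dataclasses import dataclass


@dataclass
class JumpRecord:
    instr_count: int      # cumulative instructions at the previous execution of the jump
    exec_length: int = 0  # instructions between the previous two executions; 0 means unknown


def observe_jump(record: JumpRecord, current_instr_count: int,
                 diff_threshold: int) -> bool:
    """Return True when the subroutine ending in this jump looks like a loop.

    current_instr_count plays the role of the first instruction number, and
    first_exec_length the role of the first execution length.
    """
    first_exec_length = current_instr_count - record.instr_count
    is_loop = (record.exec_length != 0
               and abs(first_exec_length - record.exec_length) < diff_threshold)
    # Update the record from the first instruction number and first execution length.
    record.instr_count = current_instr_count
    record.exec_length = first_exec_length
    return is_loop
```

With a record created at 1000 cumulative instructions, a second execution at 2000 only fills in the execution length (1000), and a third execution at 3010 yields a first execution length of 1010, which is within a difference threshold of 50, so the subroutine is recognized as a loop program.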
In one possible implementation, after the first subroutine is determined to be a loop program, an instruction number threshold is further determined according to the first instruction number in the updated record information and a preset execution length. For example, the sum of the first instruction number and the preset execution length is taken as the instruction number threshold, where the preset execution length is a predefined execution length, the first execution length, the execution length in the record information before the update, or the like. When the number of instructions executed by the program reaches the instruction number threshold and the conditional jump instruction has not been executed again, it is determined that the loop program has been exited, and the configuration parameters of the computing resource are set to the default configuration parameters. In this way, whether the program has exited the loop can be accurately identified, and the configuration parameters of the computing resource can be restored to the defaults in time.
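A minimal sketch of the loop-exit check follows. It uses the sum from the example in the text as the threshold; the function names and the dictionary-based configuration values are hypothetical.

```python
from typing import Dict


def instruction_threshold(first_instr_number: int, preset_exec_length: int) -> int:
    """Instruction number threshold: first instruction number plus preset execution length."""
    return first_instr_number + preset_exec_length


def loop_exited(executed_instructions: int, threshold: int,
                jump_executed_again: bool) -> bool:
    """True when the threshold is reached without the jump having been executed again."""
    return executed_instructions >= threshold and not jump_executed_again


def next_config(executed_instructions: int, threshold: int,
                jump_executed_again: bool,
                loop_config: Dict[str, int],
                default_config: Dict[str, int]) -> Dict[str, int]:
    """Fall back to the default configuration once the loop is judged to have exited."""
    if loop_exited(executed_instructions, threshold, jump_executed_again):
        return default_config
    return loop_config
```

For instance, with a first instruction number of 3000 and a preset execution length of 1000, the threshold is 4000; if the program reaches 4000 executed instructions without executing the jump again, the default parameters are selected.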
In one possible implementation, when determining according to the record information whether the number of times the first subroutine has been repeatedly executed is greater than the third preset value, the determination may instead be made according to the number of executions of the conditional jump instruction in the record information, by checking whether that number is greater than the third preset value. In one possible implementation, the number of executions of the conditional jump instruction in the record information is first updated, and the determination is then made according to the updated number of executions. Specifically, when the updated number of executions of the conditional jump instruction is greater than the third preset value, it is determined that the number of times the first subroutine has been repeatedly executed is greater than the third preset value.
In one possible implementation, when determining the running characteristics of the computing resource over one loop of the first subroutine, the running characteristics of the computing resource running the program may be obtained when the conditional jump instruction is executed, together with the running characteristics obtained when the conditional jump instruction was executed the previous time; the running characteristics over one loop of the first subroutine are then determined from the two sets of running characteristics. The running characteristics are determined from feature count values obtained from feature counters of the performance monitoring unit.
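The count-based alternative is simpler than the execution-length check and can be sketched as follows. The per-jump counter dictionary and the function name `record_execution` are hypothetical.

```python
from typing import Dict


def record_execution(counts: Dict[int, int], jump_id: int, third_preset: int) -> bool:
    """Update the execution count of a jump, then compare it with the third preset value.

    Returns True once the updated count is greater than third_preset, i.e. the
    subroutine ending in this jump is treated as a loop program.
    """
    counts[jump_id] = counts.get(jump_id, 0) + 1
    return counts[jump_id] > third_preset
```

With a third preset value of 2, the first two executions of a given jump return False and the third returns True.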
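One way to read the two-sample scheme is as a difference between counter snapshots taken at consecutive executions of the conditional jump. The sketch below assumes four hypothetical counters and derives two of the characteristics named later in the text (instructions per clock cycle and cache miss rate); the `CounterSnapshot` fields are illustrative and not a real performance monitoring unit interface.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class CounterSnapshot:
    cycles: int          # elapsed clock cycles
    instructions: int    # retired instructions
    cache_accesses: int  # cache lookups
    cache_misses: int    # cache lookups that missed


def loop_features(prev: CounterSnapshot, curr: CounterSnapshot) -> Dict[str, float]:
    """Derive per-loop running characteristics from two counter snapshots."""
    cycles = curr.cycles - prev.cycles
    instructions = curr.instructions - prev.instructions
    accesses = curr.cache_accesses - prev.cache_accesses
    misses = curr.cache_misses - prev.cache_misses
    return {
        "ipc": instructions / cycles if cycles else 0.0,
        "cache_miss_rate": misses / accesses if accesses else 0.0,
    }
```

Because the counters are cumulative, subtracting the earlier snapshot isolates the behavior of exactly one pass through the subroutine.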
In one possible implementation, when determining the configuration parameters of the computing resource according to the running characteristics, a target preset feature matching the running characteristics is determined from a plurality of preset features, and the preset configuration parameters corresponding to the target preset feature are taken as the configuration parameters of the computing resource. Illustratively, the configuration parameters of the computing resource include a prefetch policy, including a prefetch policy for missed cache lines, a prefetch policy for integer data access, the aggressiveness of the prefetch algorithm, and the like. In this technical solution, once the running characteristics of a subroutine that has run or is running are determined, the configuration parameters corresponding to those running characteristics are obtained. The computing resource is thereby given configuration parameters suited to the subroutines still to be run, which improves the execution efficiency of the program.
In one possible implementation, when determining the target preset feature matching the running characteristics from the plurality of preset features, dimension reduction is performed on the A-dimensional running characteristics to obtain B-dimensional running characteristics; then, according to the matching degree between the B-dimensional running characteristics and each of a plurality of B-dimensional preset features, the preset feature with the highest matching degree is selected as the target preset feature, where A and B are positive integers and B is smaller than A.
In one possible implementation, when determining the matching degree between the B-dimensional running characteristics and any B-dimensional preset feature, a matching degree between the running characteristic and the preset feature is determined for each of the B dimensions, and the overall matching degree is then determined from the per-dimension matching degrees. In one possible implementation, a dimension of the B-dimensional preset feature includes a plurality of bits, and the value of a bit among the plurality of bits is masked by a mask. For such a dimension, the matching degree between the running characteristic and the preset feature may be determined by fuzzy matching.
In one possible implementation, if a target preset feature matching the running feature is not determined from the plurality of preset features, the default configuration parameter is taken as the configuration parameter of the computing resource.
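The matching and fallback steps above can be sketched under several assumptions: the running characteristics have already been reduced to B dimensions and quantized to small integers, a mask bit of 1 means the bit is compared while 0 means it is ignored, the overall matching degree is the fraction of dimensions that match, and a preset counts as matching only when its degree reaches `min_degree`. None of these choices is stated in the text, and `PresetFeature`, `matching_degree`, and `select_config` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass
class PresetFeature:
    values: Tuple[int, ...]  # one quantized value per dimension
    masks: Tuple[int, ...]   # 1 bits are compared, 0 bits are masked out
    config: Dict[str, int]   # hypothetical preset configuration parameters


def dimension_matches(running: int, preset: int, mask: int) -> bool:
    """Fuzzy match on one dimension: only unmasked bits must agree."""
    return ((running ^ preset) & mask) == 0


def matching_degree(running: Sequence[int], preset: PresetFeature) -> float:
    """Fraction of dimensions on which the running characteristics match the preset."""
    hits = sum(dimension_matches(r, v, m)
               for r, v, m in zip(running, preset.values, preset.masks))
    return hits / len(preset.values)


def select_config(running: Sequence[int], presets: List[PresetFeature],
                  default: Dict[str, int], min_degree: float = 1.0) -> Dict[str, int]:
    """Pick the best-matching preset's parameters, or the defaults if none matches."""
    best = max(presets, key=lambda p: matching_degree(running, p), default=None)
    if best is None or matching_degree(running, best) < min_degree:
        return default
    return best.config
```

Masking lets one preset cover a range of nearby quantized values, so fewer presets are needed than with exact comparison.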
In one possible implementation, the configuration parameters of the computing resource include an address of a configuration register in the computing resource and a configuration register value; when the configuration parameters of the computing resource are configured to the computing resource, the configuration register value may be written into the configuration register corresponding to the address of the configuration register. In this way, the flexibility of parameter configuration is facilitated to be improved.
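As a minimal sketch of address-and-value configuration, the register file below is modeled as a plain dictionary. Real hardware would use a privileged register write, which is outside the scope of this illustration, and the function name `apply_config` and the addresses used are hypothetical.

```python
from typing import Dict, Iterable, Tuple


def apply_config(registers: Dict[int, int],
                 params: Iterable[Tuple[int, int]]) -> None:
    """Write each configuration register value to the register at its address."""
    for address, value in params:
        registers[address] = value
```

Expressing a configuration as a list of (address, value) pairs means new tunable registers can be supported by changing the preset data rather than the configuration logic.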
In one possible implementation, the operating characteristics include at least any one or more of the following: the number of instructions executed by the processor core per clock cycle, the miss rate of the instruction translation look-aside buffer, the cache miss rate, the prefetch hit rate.
In a second aspect, the present application provides an apparatus for executing a program, where the apparatus for executing a program may be a processing device, or may be a computing resource in the processing device, where the computing resource may specifically be a processor in the processing device or a core of a processor in the processing device.
The device for running the program comprises a parameter determining module and a configuration module.
The parameter determining module is configured to: when it is determined that a program running on a computing resource in the processing device includes multiple similar subroutines, obtain the running characteristics of the subroutines that have already run or are currently running among the multiple similar subroutines, where the multiple similar subroutines are subroutines whose running characteristics have a similarity greater than or equal to a first preset value; and determine configuration parameters of the computing resource according to the running characteristics.
The configuration module is configured to configure the computing resource with the determined configuration parameters.
In one possible implementation, the multiple similar sub-programs are loop programs that are executed multiple times in a program, each of the multiple sub-programs corresponding to one or more loops of the loop program.
In one possible implementation, the number of instructions of the loop program is greater than a second preset value.
In one possible implementation, the apparatus further includes a detection module, where the detection module is configured to determine that the running program in the computing resource includes multiple similar sub-programs. Specifically, when the detection module determines that the number of times that the first subprogram is repeatedly executed in the program is greater than a third preset value, the detection module determines that the first subprogram is a loop program.
In one possible implementation manner, the detection module is specifically configured to, when determining whether the number of times the first subroutine in the program is repeatedly executed is greater than a third preset value: after the execution of the first subroutine and the execution of the conditional jump instruction, determining whether there is record information of the conditional jump instruction; if the record information of the conditional jump instruction exists, determining whether the number of times the first subprogram is repeatedly executed is larger than a third preset value according to the record information of the conditional jump instruction; if it is determined that the record information of the conditional jump instruction does not exist, the record information of the conditional jump instruction is added.
In one possible implementation, after acquiring the conditional jump instruction, the detection module is further configured to determine whether the conditional jump instruction points to a small loop, where a small loop is a loop in which the number of instructions executed in a single iteration is less than the second preset value. If the conditional jump instruction points to a small loop, the conditional jump instruction is filtered out; if it does not point to a small loop, it is further determined whether record information of the conditional jump instruction exists.
In one possible implementation, the record information is recorded in a predetermined section of a cache of the computing resource.
In one possible implementation, when determining whether record information of the conditional jump instruction exists, the detection module is specifically configured to: determine a jump identifier according to the identification information of the conditional jump instruction, where the jump identifier is the identification information itself or a hash of it; and traverse the pieces of record information in the cache. If a piece of record information includes the jump identifier, it is determined that the cache contains record information of the conditional jump instruction; if none of the pieces of record information includes the jump identifier, it is determined that the cache does not contain record information of the conditional jump instruction. In one possible implementation, the identification information of the conditional jump instruction is obtained from a branch recording module. The identification information includes a start position and/or a target position, or includes a hash of the start position and/or the target position.
In one possible implementation manner, the detection module is specifically configured to, when determining, according to the record information, whether the number of times the first subroutine is repeatedly executed is greater than a third preset value: determining whether the number of times that the first subprogram is repeatedly executed is larger than a third preset value according to the instruction number and the execution length in the recorded information, wherein the instruction number in the recorded information is used for indicating the instruction number which is executed by the program in a cumulative way when the program executes the conditional jump instruction last time; the execution length in the record information is the difference of the number of instructions which are executed by the program in an accumulated way when the program executes the conditional jump instruction in the previous two times respectively.
In one possible implementation, the third preset value is equal to 2, and when determining, according to the number of instructions and the execution length in the record information, that the number of times the first subroutine has been repeatedly executed is greater than the third preset value, the detection module is specifically configured to: take the cumulative number of instructions executed by the program at the current execution of the conditional jump instruction as the first instruction number, and take the difference between the first instruction number and the number of instructions in the record information as the first execution length. If the execution length in the record information is not 0 and the difference between the first execution length and the execution length in the record information is smaller than a difference threshold, it is determined that the number of times the first subroutine has been repeatedly executed is greater than 2. In one possible implementation, after determining that the number of times the first subroutine has been repeatedly executed is greater than the third preset value, the detection module is further configured to update the record information according to the first instruction number and the first execution length.
In one possible implementation, the detection module, after determining that the first subroutine is a loop program, is further configured to: and determining an instruction number threshold according to the first instruction number and the preset execution length in the updated record information. When the instruction number of the program execution instruction reaches the instruction number threshold, if the conditional jump instruction is not executed again, determining that the loop program has been exited; the parameter determination module is further configured to determine a configuration parameter of the computing resource as a default configuration parameter.
In one possible implementation manner, the detection module is specifically configured to, when determining, according to the record information, whether the number of times the first subroutine is repeatedly executed is greater than a third preset value: and determining whether the execution times of the conditional jump instruction are larger than a third preset value according to the execution times of the conditional jump instruction in the record information. And when the execution times of the conditional jump instruction are determined to be larger than a third preset value, namely, the times that the first subprogram is repeatedly executed are determined to be larger than the third preset value.
In one possible implementation manner, the detection module is specifically configured to, when determining, according to the record information, whether the number of times the first subroutine is repeatedly executed is greater than a third preset value: updating the execution times of the conditional jump instruction in the record information; and determining whether the execution times of the conditional jump instruction are larger than a third preset value according to the execution times of the conditional jump instruction after updating.
In one possible implementation, when determining the running characteristics of the computing resource over one loop of the first subroutine, the parameter determining module is specifically configured to: obtain the running characteristics of the computing resource running the program when the conditional jump instruction is executed; obtain the running characteristics of the computing resource running the program when the conditional jump instruction was executed the previous time; and determine the running characteristics over one loop of the first subroutine from the two sets of running characteristics. The running characteristics are determined by the parameter determining module from feature count values obtained from feature counters of the performance monitoring unit.
In one possible implementation, when determining the configuration parameters of the computing resource according to the running characteristics, the parameter determining module is specifically configured to: determine, from a plurality of preset features, a target preset feature matching the running characteristics; and take the preset configuration parameters corresponding to the target preset feature as the configuration parameters of the computing resource. Illustratively, the configuration parameters of the computing resource include a prefetch policy, including a prefetch policy for missed cache lines, a prefetch policy for integer data access, the aggressiveness of the prefetch algorithm, and the like.
In one possible implementation, when determining the target preset feature matching the running characteristics from the plurality of preset features, the parameter determining module is specifically configured to: perform dimension reduction on the A-dimensional running characteristics to obtain B-dimensional running characteristics; and, according to the matching degree between the B-dimensional running characteristics and each of a plurality of B-dimensional preset features, select the preset feature with the highest matching degree as the target preset feature, where A and B are positive integers and B is smaller than A.
In one possible implementation manner, when the parameter determining module determines the matching degree between the operation feature of the B dimension and the preset feature of any B dimension, the parameter determining module is specifically configured to determine, for any one dimension of the B dimensions, the matching degree between the operation feature corresponding to the dimension and the preset feature; and determining the matching degree between the operation characteristics of the B dimension and the preset characteristics of the B dimension according to the corresponding matching degree of each dimension in the B dimension. In one possible implementation, in one dimension of the preset features of the B dimension, the dimension includes a plurality of bits, and a value of a bit in the plurality of bits is masked by a mask. When determining the matching degree between the operation feature corresponding to the dimension and the preset feature for the dimension, the parameter determination module may determine based on a fuzzy matching manner.
In one possible implementation, if the parameter determination module does not determine a target preset feature that matches the running feature from the plurality of preset features, the default configuration parameter is taken as the configuration parameter of the computing resource.
In one possible implementation, the configuration parameters of the computing resource include an address of a configuration register in the computing resource and a configuration register value; the configuration module is specifically configured to write the configuration register value into the configuration register corresponding to the address of the configuration register when the configuration parameter of the computing resource is configured to the computing resource.
In one possible implementation, the operating characteristics include at least any one or more of the following: the number of instructions executed by the processor core per clock cycle, the miss rate of the instruction translation look-aside buffer, the cache miss rate, the prefetch hit rate.
In a third aspect, the present application provides a processing device comprising a computing resource and a memory connected to the computing resource, the memory for storing a computer program, the computing resource for executing the computer program stored in the memory, such that the computing resource implements the method of the first aspect or any one of the possible implementations of the first aspect.
In a fourth aspect, the present application provides a computer readable storage medium having stored therein a computer program or instructions which, when executed by a computing resource in a processing device, implement the method of the first aspect or any one of the possible implementations of the first aspect.
In a fifth aspect, the present application provides a processing chip comprising at least one processor core and an interface; the interface is configured to provide program instructions or data to the at least one processor core; the at least one processor core is configured to execute the program instructions to implement the method performed by the computing resource in the first aspect or any one of the possible implementations of the first aspect.
The technical effects achieved by any one of the second to fifth aspects may be referred to the description of the beneficial effects in the first aspect, and the detailed description is not repeated here.
Drawings
FIG. 1 is a schematic diagram of a processing apparatus;
FIG. 2 is a schematic diagram of the internal structure of a CPU core;
FIG. 3 is a schematic flow chart of a program running method provided in the present application;
FIG. 4 is a schematic flow chart of a determining loop procedure provided in the present application;
FIG. 5 is a flow chart of a specific implementation of a determining loop procedure provided herein;
FIG. 6 is a flow chart of another embodiment of a loop determination procedure provided herein;
FIG. 7 is a schematic diagram of a correspondence between preset features and preset configuration parameters provided in the present application;
FIG. 8 is a schematic structural diagram of a device for program running provided in the present application.
Detailed Description
Embodiments of the present application will be described in detail below with reference to the accompanying drawings.
Fig. 1 provides a schematic structural diagram of one possible processing apparatus 10.
The processing device 10 comprises a processor 101, a memory 102 and a communication interface 103. Any two of the processor 101, the memory 102, and the communication interface 103 may be connected via a bus 104.
The processor 101 may be a CPU that may be used to execute instructions in the memory 102 to perform one or more functions, such as determining whether a program is in a loop (also referred to as a loop state). Besides a CPU, the processor 101 may be an application specific integrated circuit (application specific integrated circuit, ASIC), a field programmable gate array (field programmable gate array, FPGA), a system on chip (SoC), a complex programmable logic device (complex programmable logic device, CPLD), a graphics processor (graphics processing unit, GPU), a neural-network processor (neural-network processing unit, NPU), or the like.
In practical applications, the number of the processors 101 may be plural, and the plural processors 101 may include plural processors of the same type, or may include plural processors of different types, for example, the plural processors 101 are plural CPUs. For another example, the plurality of processors 101 includes one or more CPUs and one or more GPUs. For another example, the plurality of processors 101 may include one or more CPUs and one or more NPUs. Alternatively, the plurality of processors 101 may include one or more CPUs, one or more GPUs, one or more NPUs, and the like.
The processor 101 (e.g., CPU, NPU, etc.) may include one or more physical cores. A physical core is a real processor core visible inside the processor. For convenience of description, the physical core of the processor may be simply referred to as a processor core.
Take one physical core in the CPU (simply referred to as a CPU core) as an example. Fig. 2 is a schematic diagram of an internal structure of a CPU core 20 provided by way of example in the present application, where the CPU core 20 includes a micro-instruction (micro-ops, or uops) module 201, a branch recording module 202, a performance monitoring module 203, and a register 204.
Of course, other modules not shown in fig. 2 may also be included in the CPU core 20.
The micro instruction module 201 is configured to detect and store a micro instruction cycle sequence, and when the micro instruction cycle sequence is less than or equal to the capacity of the micro instruction module 201, the micro instruction cycle sequence can be stored in the micro instruction module 201, so that a corresponding micro instruction sequence can be obtained without decoding at the front end, and only the corresponding micro instruction sequence needs to be continuously fetched from the micro instruction module 201. The micro instruction module 201 is, for example, a loop instruction stream detector (loop stream detector, LSD).
The branch recording module 202 is configured to record the most recent one or more branch jumps executed by the CPU core 20. For example, when the CPU core 20 executes jump instruction 2 and jumps from instruction 2 to instruction 11, the branch recording module 202 may record the start position and the target position of the branch jump, that is, the start position is instruction 2 and the target position is instruction 11. The branch recording module 202 is, for example, a last branch record (last branch record, LBR) module.
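The recording behavior described above can be modeled as a small bounded buffer of (start, target) pairs. The sketch below is a software illustration only, not the hardware LBR; the class name `BranchRecorder` and the `depth` parameter are assumptions made for the example.

```python
from collections import deque


class BranchRecorder:
    """Keeps the most recent branch jumps (hypothetical model of module 202)."""

    def __init__(self, depth=4):
        # A bounded deque silently drops the oldest record when it is full.
        self.records = deque(maxlen=depth)

    def record_jump(self, start, target):
        """Store the start position and target position of one branch jump."""
        self.records.append((start, target))

    def last(self):
        """Return the most recent (start, target) pair, or None if empty."""
        return self.records[-1] if self.records else None


recorder = BranchRecorder(depth=2)
recorder.record_jump(start=2, target=11)   # jump from instruction 2 to instruction 11
recorder.record_jump(start=15, target=2)
recorder.record_jump(start=2, target=11)   # the oldest record is dropped here
```

A repeated (start, target) pair in such a record is one signal that the same jump is being taken again, which is the kind of information the loop determination relies on.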
The performance monitoring module 203, including one or more counters, is capable of tracking and counting some underlying hardware events, such as events related to the CPU core 20 (number of executed instructions, capture exception, number of clock cycles, etc.), events related to cache (cache) (number of L1/L2 cache accesses, number of misses (miss), etc.), and events related to translation lookaside buffers (translation lookaside buffer, TLB), etc. These events reflect the behavior of the program execution period and can be used to analyze and tune the program. The performance monitoring module 203 is, for example, a performance monitoring unit (performance monitoring unit, PMU).
Register 204 is a high-speed memory device of limited memory capacity that can be used to temporarily store instructions, data, and addresses. In this application, the register may be specifically a register for defining CPU behavior, and for convenience of understanding, the register for defining CPU behavior may be referred to as a configuration register.
The memory 102 is a device for storing data, and may be a memory or a hard disk.
Memory refers to an internal memory that exchanges data directly with the processor 101. It can read and write data at any time and at high speed, and serves as temporary data storage for an operating system or other programs running on the processor 101. The memory includes volatile memory (volatile memory), such as random access memory (random access memory, RAM) and dynamic random access memory (dynamic random access memory, DRAM), and may also include nonvolatile memory (non-volatile memory), such as storage class memory (storage class memory, SCM), or a combination of volatile memory and nonvolatile memory. In practical applications, a plurality of memories may be configured in the processing device 10, and optionally, the plurality of memories may be of different types. The number and type of the memories are not limited in this embodiment. In addition, the memory can be configured to have a power-failure protection function, which means that the data stored in the memory is not lost when the system is powered down and powered up again. A memory having the power-failure protection function is called a nonvolatile memory.
The hard disk is used to provide storage resources, for example, to store programs and data such as pictures, video, audio, and text. Hard disks include, but are not limited to, nonvolatile memory (non-volatile memory), such as a read-only memory (ROM), a hard disk drive (HDD), or a solid state drive (SSD). Unlike the memory, the hard disk has a slower read-write speed and is generally used to store data persistently. In one embodiment, data, program instructions, and the like in the hard disk are first loaded into the memory, and the processor then retrieves the data and/or program instructions from the memory.
A communication interface 103 for communicating with other devices.
Typically, the processing device configures the computing resources in the processing device, such as the processor or the cores in the processor, with default configuration parameters, where configuration parameters are the parameters that a computing resource uses when running a program. For example, the configuration parameters include a prefetch policy, including a prefetch policy for missing cache lines (missing cache line), a prefetch policy for integer data access (integer data access), the aggressiveness of the prefetch algorithm (such as a passive policy or an active policy), etc.
Each type of program has its own operating characteristics (or behavior characteristics). Illustratively, the operating characteristics of a program include the computational characteristics of the program, the memory access characteristics of the program, and the like. By way of example, the operating characteristics of a program may be represented by a variety of microarchitectural characteristics including, for example, one or more of the following: the number of instructions executed by the processor core per clock cycle, the miss rate of the instruction translation lookaside buffer (instruction translation lookaside buffer, iTLB), the cache miss rate, the prefetch hit rate, the miss rate of the data translation lookaside buffer (data translation lookaside buffer, dTLB), etc. It is noted that the iTLB may also be referred to as an instruction list cache, an instruction-to-address bypass cache, an address translation cache, and the like. The dTLB may also be referred to as a data list cache, a data-to-address bypass cache, a data translation cache, and the like.
It should also be noted that the above operating characteristics are merely exemplary. In other cases, the operating characteristics may include other characteristics in addition to one or more of the number of instructions executed by the processor core per clock cycle, the miss rate of the iTLB, the cache miss rate, the prefetch hit rate, and the miss rate of the dTLB; alternatively, the operating characteristics may include none of these and consist only of other characteristics. The present application is not limited to specific operating characteristics.
Because each type of program has respective running characteristics, when the computing resource adopts default configuration parameters to run each type of program, the smooth running of each type of program can be ensured, but the program can not be efficiently run through reasonable configuration parameters based on the running characteristics of each type of program.
To this end, the present application provides a program running method that is executed by a processing device or a computing resource in the processing device. Taking the execution of the computing resource as an example, a program is operated in the computing resource, the computing resource acquires the operation characteristics of the subprogram in the program, and the configuration parameters of the computing resource are adjusted according to the acquired operation characteristics, so that the computing resource can efficiently operate the subprogram with similar operation characteristics as the subprogram in the program.
Fig. 3 is a schematic flow chart of a program running method provided by way of example in the present application, and is explained below:
In step 301, the computing resource determines that the program being run in the computing resource includes multiple similar sub-programs. The multiple similar sub-programs are sub-programs whose operation features have a similarity greater than or equal to a first preset value.
The following provides example expressions of a program that includes multiple similar sub-programs:
In expression 1, the multiple similar sub-program is multiple similar program segments, that is, the multiple similar program segments have similar operation characteristics, and specifically, the similarity of the operation characteristics of the multiple similar program segments is greater than or equal to a first preset value.
Illustratively, the processing device or the computing resource performs static analysis on the multiple program segments in advance to obtain a static analysis result for each program segment, where each static analysis result may include the operation feature of the corresponding program segment. The processing device or computing resource then determines, based on the operation feature of each program segment, that the multiple program segments have similar operation features. Correspondingly, while running the program, the computing resource can directly determine, according to the pre-analysis result, that the program includes multiple similar program segments.
As a further example, the processing device or the computing resource runs the multiple program segments in advance and obtains the operation feature of each program segment, so as to determine from these operation features that the multiple program segments have similar operation features. Correspondingly, when the program is run again, the computing resource can directly determine, according to the pre-analysis result, that the program includes multiple similar program segments.
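The threshold test on similarity can be sketched as below. The text does not say how similarity between operation features is computed, so cosine similarity is used here purely as an assumed stand-in; the function names `similarity` and `are_similar` are likewise hypothetical.

```python
import math


def similarity(feature_a, feature_b):
    """Cosine similarity between two feature vectors (assumed metric)."""
    dot = sum(a * b for a, b in zip(feature_a, feature_b))
    norm_a = math.sqrt(sum(a * a for a in feature_a))
    norm_b = math.sqrt(sum(b * b for b in feature_b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def are_similar(features, first_preset_value):
    """True if every pair of program segments meets the similarity threshold."""
    return all(
        similarity(features[i], features[j]) >= first_preset_value
        for i in range(len(features))
        for j in range(i + 1, len(features))
    )


# Features gathered in advance, by static analysis or by a prior run.
segments = [[100, 200, 20], [101, 203, 20]]
similar = are_similar(segments, first_preset_value=0.99)
```

Any other similarity measure could be substituted; only the comparison against the first preset value comes from the description.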
In expression 2, the multiple similar sub-programs are loop programs executed multiple times in the program, where each of the multiple sub-programs corresponds to one or more loops of the loop program, that is, the similarity of the running characteristics of the loop program in the multiple loops is greater than or equal to the first preset value. The running program corresponds to an instruction stream, and the instruction stream includes instruction stream fragments (i.e., subroutines) corresponding to the loop program in a plurality of loops, wherein the similarity of the running characteristics of the plurality of instruction stream fragments is greater than or equal to a first preset value.
Accordingly, while running the program, if the computing resource determines that the program includes a loop program that is executed cyclically, it determines that the program being run includes multiple similar sub-programs. For how the computing resource determines whether the program contains a loop program (denoted as the first subprogram) that needs to be executed cyclically, see the embodiments related to fig. 4 to fig. 6 below.
In step 302, the computing resource obtains the operating characteristics of the sub-program that has been or is currently running in the multiple similar sub-programs.
Since the similarity of the operation features of the multiple similar subroutines is greater than or equal to the first preset value, the computing resource may obtain the operation feature of a subroutine that has already run or is currently running among the multiple similar subroutines, and use the obtained operation feature as the operation feature of the multiple similar subroutines.
For example, in the case where the multiple similar subroutines are multiple similar program segments, the computing resource obtains the running characteristics of a program segment that has already run or is currently running, and uses them as the running characteristics of the program segments that have not yet run. For example, the similar program segments are program segment 1 to program segment 10; the computing resource has finished running program segment 1 and program segment 2 and has not yet run program segment 3 to program segment 10. The computing resource determines the running characteristics of program segment 1 and program segment 2, and takes them as the running characteristics of program segment 3 to program segment 10.
For example, in the case where the multiple similar sub-programs are a loop program that the computing resource executes multiple times, the computing resource obtains the running characteristics of a loop iteration that has already run or is currently running, and uses them as the running characteristics of the iterations that have not yet run. For example, the computing resource runs the subroutine for 10 loop iterations; while running it cyclically, the computing resource determines the running characteristics of the subroutine in the 1st iteration, which has already run, or in the 2nd iteration, which is currently running, and takes them as the running characteristics of the subroutine in the 3rd to 10th iterations.
The implementation of computing resource acquisition run features is explained as follows. For each feature counter, the computing resource determines feature count values corresponding to the feature counter before and after execution of the subroutine, and further determines the feature count value corresponding to the subroutine. The computing resource composes a plurality of feature count values corresponding to the subroutine determined by the feature counters into the running feature of the subroutine.
Illustratively, the computing resource includes 3 feature counters (denoted as feature counter 1 to feature counter 3), where feature counter 1 records the number of instructions executed by the processor core per clock cycle, feature counter 2 records the cache miss count, and feature counter 3 records the miss count of the instruction translation lookaside buffer.
Example 1: when the computing resource starts to execute the subroutine, the feature count values of feature counter 1 to feature counter 3 are all 0; when the computing resource finishes executing the subroutine, the feature count values of feature counter 1 to feature counter 3 are 100, 200, and 20, respectively. The running feature of the subroutine then includes 100, 200, and 20.
Example 2: when the computing resource starts to execute the subroutine, the feature count values of feature counter 1 to feature counter 3 are 100, 200, and 20, respectively; when the computing resource finishes executing the subroutine, the feature count values of feature counter 1 to feature counter 3 are 201, 403, and 40, respectively. The running feature of the subroutine then includes 101, 203, and 20.
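The two examples above amount to subtracting each counter's value at the start of the subroutine from its value at the end. A minimal sketch follows; the function name `run_feature` is an assumption, and the counters are modeled as plain lists rather than read from a performance monitoring unit.

```python
def run_feature(counts_before, counts_after):
    """Per-counter difference between the end and the start of the subroutine."""
    return [after - before for before, after in zip(counts_before, counts_after)]


# Example 1: all counters start at 0.
example_1 = run_feature([0, 0, 0], [100, 200, 20])
# Example 2: counters start at 100, 200, 20 and end at 201, 403, 40.
example_2 = run_feature([100, 200, 20], [201, 403, 40])
```

Here `example_1` evaluates to `[100, 200, 20]` and `example_2` to `[101, 203, 20]`, matching the values given in the two examples.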
In step 303, the computing resource determines configuration parameters of the computing resource according to the determined operational characteristics.
The configuration parameters of the computing resource, i.e., the parameters that need to be configured in the computing resource, may be referred to herein as target configuration parameters of the computing resource. Or it may be further understood that the configuration parameter of the computing resource is the configuration parameter that is most suitable for the running feature of the currently running program, and when the configuration parameter is adopted by the computing resource, the running effect of the program is optimal, so that the configuration parameter of the computing resource may be also referred to as the optimal configuration parameter of the computing resource.
In example 1, a plurality of preset features are preset in the computing resource, and preset configuration parameters corresponding to the plurality of preset features respectively. For example, preset features 1 to 1000 are preset in the computing resource, and preset configuration parameters 1 to 1000 corresponding to the preset features 1 to 1000 respectively. The computing resource determines target preset features matched with the operation features from the plurality of preset features, and takes preset configuration parameters corresponding to the target preset features as target configuration parameters of the computing resource. In combination with the above example, when the computing resource determines that the target preset feature matching the running feature is the preset feature 10 from the preset features 1 to 1000, the computing resource takes the preset configuration parameter 10 as the target configuration parameter of the computing resource.
In example 2, a plurality of preset features are preset in the computing resource, and addresses of preset configuration parameters corresponding to the plurality of preset features respectively. Illustratively, the computing resource stores a plurality of preset features and addresses of preset configuration parameters corresponding to the plurality of preset features, respectively, via a ternary content addressable memory (ternary content addressable memory, TCAM). For example, preset features 1 to 1000 are preset in the computing resource, and addresses 1 to 1000 of preset configuration parameters corresponding to the preset features 1 to 1000 respectively. The computing resource determines a target preset feature matched with the operation feature from a plurality of preset features, reads preset configuration parameters from addresses of preset configuration parameters corresponding to the target preset feature, and takes the read preset configuration parameters as target configuration parameters of the computing resource. In combination with the above example, when the computing resource determines from the preset features 1 to 1000 that the target preset feature matching the running feature is the preset feature 10, the computing resource reads the preset configuration parameter from the address 10, and uses the read preset configuration parameter as the target configuration parameter of the computing resource.
It will be appreciated that the plurality of preset features in example 2 also correspond to respective preset configuration parameters.
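The indirection in example 2 (preset feature, then address, then preset configuration parameter) can be sketched with two lookup tables. The table contents, the address values, and the name `target_configuration` are illustrative assumptions; the fallback to a default parameter follows the implementation described earlier in which no matching preset feature is found.

```python
# Hypothetical tables: preset feature number -> address, address -> parameter.
PRESET_TO_ADDRESS = {number: 0x1000 + number for number in range(1, 1001)}
CONFIG_MEMORY = {
    address: f"preset configuration parameter {number}"
    for number, address in PRESET_TO_ADDRESS.items()
}
DEFAULT_PARAMETER = "default configuration parameter"


def target_configuration(target_preset_feature):
    """Follow the address stored for the matched preset feature."""
    if target_preset_feature is None:
        # No preset feature matched: keep the default configuration parameter.
        return DEFAULT_PARAMETER
    address = PRESET_TO_ADDRESS[target_preset_feature]
    return CONFIG_MEMORY[address]


selected = target_configuration(10)
```

Storing only addresses next to the preset features keeps the matching table small, while the parameters themselves can live in a larger, cheaper store.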
Further, the corresponding relation between the preset features and the preset configuration parameters can be obtained through machine learning. By way of example, the computing resource samples the running characteristics of the program by running a large number of programs, performs automatic optimization of performance for the program under each running characteristic to obtain an optimal configuration parameter of the computing resource under the running characteristic, and presets the running characteristic and the optimal configuration parameter as preset characteristics and preset configuration parameters in the processing device. In addition, the processing device can cluster the operation features and the optimal configuration parameters corresponding to the operation features, so that the cost of the storage space is reduced when the processing device stores the operation features and the optimal configuration parameters corresponding to the operation features through the storage space.
Or, the corresponding relation between the preset features and the preset configuration parameters in the processing device can be set based on expert experience.
The following example illustrates the machine learning approach: the operating characteristics of a large number of different programs and the corresponding optimal configuration parameters are clustered, where the optimal configuration parameters can be obtained through an artificial intelligence (artificial intelligence, AI) optimization algorithm, such as Bayesian optimization. For one class of programs, the operating characteristic is that, per 2^10 instructions, the L3 prefetch count is in the range 10100000 ~ 10110000 and the L3 miss count is in the range 1100000 ~ 1101111. The address of the corresponding preset configuration parameter is 0x1000101, and this address points to the address and the value of a configuration register, where the address of the configuration register is HHA prefetch and the value of the configuration register is 0 (indicating that the function is off). The specific storage form of the correspondence is shown in fig. 7. HHA is a component that maintains cache coherency between L3 and memory, and the HHA prefetch function prefetches data into L3.
Of course, the example in fig. 7 may also be set based on expert experience. Specifically, when both the L3 prefetch count and the L3 miss rate are high, expert experience suggests that many prefetches are invalid, so turning off HHA prefetch can relieve memory access bandwidth pressure and help improve performance.
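The HHA prefetch example can be traced end to end in a short sketch. It assumes that the count ranges in the text are binary values, models the configuration registers as a dictionary, and uses the hypothetical names `lookup` and `apply_parameter`; none of this is the storage layout of fig. 7 itself.

```python
# Values from the example in the text (counts per 2**10 instructions),
# read as binary numbers, which is an assumption.
PRESET_ENTRY = {
    "l3_prefetch_range": (0b10100000, 0b10110000),
    "l3_miss_range": (0b1100000, 0b1101111),
    "parameter_address": 0x1000101,
}
# Hypothetical store: address -> (configuration register, register value).
PARAMETER_STORE = {0x1000101: ("HHA prefetch", 0)}  # value 0 = function off


def lookup(l3_prefetch_count, l3_miss_count):
    """Return the configuration parameter if both counts fall in range."""
    low, high = PRESET_ENTRY["l3_prefetch_range"]
    if not low <= l3_prefetch_count <= high:
        return None
    low, high = PRESET_ENTRY["l3_miss_range"]
    if not low <= l3_miss_count <= high:
        return None
    return PARAMETER_STORE[PRESET_ENTRY["parameter_address"]]


def apply_parameter(config_registers, parameter):
    """Write the configuration register value into the named register."""
    register, value = parameter
    config_registers[register] = value


registers = {"HHA prefetch": 1}  # prefetch enabled before tuning
match = lookup(l3_prefetch_count=165, l3_miss_count=100)
if match is not None:
    apply_parameter(registers, match)
```

With counts of 165 and 100, both ranges are satisfied, so the sketch turns HHA prefetch off; counts outside either range leave the register unchanged.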
In one possible practical application scenario, the processing device stores the preset features and the addresses of the preset configuration parameters in a ternary content addressable memory, and stores the preset configuration parameters pointed to by these addresses in a memory, a register, or another cache of the processing device.
Optionally, when the computing resource determines the target preset feature matched with the running feature from the plurality of preset features, the method specifically includes the following steps 1 to 3:
Step 1, normalization processing:
for example, the number of instructions in the preset feature is 1000 and the number of instructions in the running feature is 2000, then the computing resource may normalize the number of instructions in the running feature to 1000. Further, assuming that the running characteristic includes an instruction number of 2000 and the number of times of L2cache miss is 400, the running characteristic after normalization includes: the instruction number is 1000, and the number of times of L2cache miss is 200.
For another example, the number of instructions in the plurality of preset features is 2^8, while the number of instructions in the running feature is 2^10. The computing resource may then normalize the number of instructions in the running feature by right-shifting the count by 2 bits to obtain an instruction count of 2^8. Further, the computing resource right-shifts the other parameters in the running feature by 2 bits as well.
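Both normalization examples above scale every count in the running feature by the same factor so that its instruction count equals the one used by the preset features. A minimal sketch follows; the function names and the dictionary keys are assumptions made for the example.

```python
def normalize_by_ratio(run_feature, preset_instructions):
    """Scale all counts so the instruction count equals the preset one."""
    instructions = run_feature["instructions"]
    return {
        name: count * preset_instructions // instructions
        for name, count in run_feature.items()
    }


def normalize_by_shift(run_feature, shift):
    """Scale all counts down by 2**shift using a right shift."""
    return {name: count >> shift for name, count in run_feature.items()}


# First example: 2000 instructions and 400 L2 cache misses, preset uses 1000.
by_ratio = normalize_by_ratio({"instructions": 2000, "l2_cache_miss": 400}, 1000)
# Second example: 2**10 instructions shifted right by 2 bits gives 2**8.
by_shift = normalize_by_shift({"instructions": 2**10, "l2_cache_miss": 400}, 2)
```

The shift form only works when the two instruction counts differ by a power of two, which is presumably why the preset features in the second example use a power-of-two instruction count.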
Step 2, dimension reduction treatment:
the computing resource comprises A feature counters, and the A feature count values corresponding to the A feature counters form an A-dimensional preset feature. To reduce storage cost, the A-dimensional preset feature may be subjected to dimension reduction to obtain a B-dimensional preset feature, where A and B are positive integers and B is smaller than A.
Further, after the computing resource obtains the operation feature of the A dimension, the operation feature of the A dimension can be reduced to obtain the operation feature of the B dimension.
Step 3, feature matching:
the computing resource determines the matching degree between the operation characteristics of the B dimension and the preset characteristics of a plurality of B dimensions respectively, and then selects the preset characteristic with the highest matching degree from the preset characteristics of the plurality of B dimensions as the target preset characteristic.
In one possible manner, for any dimension of any B-dimensional preset feature, the value of the dimension may be represented in binary, that is, the dimension includes a plurality of bits, each with its own value; for example, the value of the dimension is 10111 01100 (i.e., 10 bits).
To further reduce the capacity (or cost) of the storage space used to store the preset features, and to improve feature matching efficiency, the computing resource may perform the following for each dimension: the computing resource masks out some of the bits in the dimension. Continuing the above example, based on the mask 11111 11000, the last 3 bits of 10111 01100 are masked out, and the masked value of the dimension is 10111 01.
In this way, when determining the matching degree between the B dimensions of the running feature and the B dimensions of a preset feature, the computing resource may first determine, for each of the B dimensions, the matching degree between the running feature and the preset feature in that dimension, and then determine the overall matching degree according to the per-dimension matching degrees.
Further, the computing resource may determine the matching degree between the running feature and the preset feature in each dimension by fuzzy matching. Continuing the above example, the masked value of the dimension in the preset feature is 10111 01, and the value of the dimension in the running feature is 10111 00111; the matching degree between "10111 01" and "10111 00" can then be determined by fuzzy matching.
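Illustratively, masking and per-dimension matching may be sketched as follows. The description does not specify the fuzzy matching rule or how per-dimension degrees are combined; this sketch assumes, purely for illustration, that the per-dimension degree is the fraction of unmasked bits that agree and that the overall degree is the average over dimensions. All function names are hypothetical.

```python
def masked_match_degree(preset_value, run_value, mask):
    """Fraction of unmasked bits on which the two values agree (assumed rule)."""
    kept_bits = bin(mask).count("1")
    differing = bin((preset_value ^ run_value) & mask).count("1")
    return (kept_bits - differing) / kept_bits


def overall_match_degree(preset, run, masks):
    """Average of the per-dimension degrees (assumed combination rule)."""
    degrees = [masked_match_degree(p, r, m) for p, r, m in zip(preset, run, masks)]
    return sum(degrees) / len(degrees)


def select_target_preset(presets, run, masks):
    """Index of the preset feature with the highest overall matching degree."""
    return max(
        range(len(presets)),
        key=lambda i: overall_match_degree(presets[i], run, masks),
    )


# Example from the text: preset 10111 01100, running value 10111 00111,
# mask 11111 11000. The 7 kept bits are 1011101 and 1011100 (one bit differs).
degree = masked_match_degree(0b1011101100, 0b1011100111, 0b1111111000)
print(degree == 6 / 7)  # True
```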
It should be noted that the dimension reduction is optional. When the computing resource does not reduce the dimensions of the preset features and the running feature, during feature matching it determines the matching degree between the A-dimensional running feature and each of a plurality of A-dimensional preset features, and then selects the preset feature with the highest matching degree as the target preset feature. In this case, the computing resource may mask out some of the bits in each of the A dimensions of a preset feature. For details, refer to the above description of masking out some of the bits in each of the B dimensions of a preset feature, which is not repeated here.
In addition, if the computing resource determines that no target preset feature matching the running feature exists among the plurality of preset features, the computing resource may use default configuration parameters as the configuration parameters of the computing resource. In one possible implementation, the computing resource further sets a general preset feature in which the value of every dimension is "×" (a wildcard), and the general preset feature corresponds to the default configuration parameters. For example, the general preset feature and its corresponding default configuration parameters may be stored in the TCAM. When the computing resource determines that no target preset feature matching the running feature exists among the plurality of preset features, it may take the general preset feature as the target preset feature, and use the default configuration parameters corresponding to the general preset feature as the configuration parameters of the computing resource.
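Illustratively, the fallback to the general preset feature can be sketched as a priority-ordered ternary lookup. This is a software stand-in for a TCAM and uses exact ternary matching rather than fuzzy matching; the table contents, the configuration names, and the use of a single 10-bit dimension per entry are assumptions made for illustration.

```python
WILDCARD = "×"  # the symbol used in the text for a position that matches any bit


def ternary_match(pattern, key):
    """True if every non-wildcard position of the pattern equals the key bit."""
    return len(pattern) == len(key) and all(
        p == WILDCARD or p == k for p, k in zip(pattern, key)
    )


def lookup(table, key):
    """Return the configuration of the first matching entry (priority order)."""
    for pattern, config in table:
        if ternary_match(pattern, key):
            return config
    return None


# Hypothetical table: two preset features with the last 3 bits masked out,
# followed by the general preset feature, which matches everything.
table = [
    ("1011101×××", "config_A"),
    ("0100010×××", "config_B"),
    ("××××××××××", "default_config"),
]

print(lookup(table, "1011101100"))  # config_A
print(lookup(table, "1111111111"))  # default_config
```

Placing the all-wildcard entry last means it is selected only when no other preset feature matches.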
In step 304, the computing resource is configured using configuration parameters (i.e., target configuration parameters) of the computing resource.
Optionally, the computing resource determines whether to adjust the configuration parameters of the computing resource according to the target configuration parameters and the current configuration parameters of the computing resource. Specifically, if the computing resource determines that the current configuration parameters are consistent with the target configuration parameters, the computing resource does not process the current configuration parameters; and if the computing resource determines that the current configuration parameters are inconsistent with the target configuration parameters, the computing resource adjusts the current configuration parameters of the computing resource to the target configuration parameters.
In one example, the target configuration parameter includes a configuration register value corresponding to the configuration register, and when the computing resource adjusts the current configuration parameter of the computing resource to the target configuration parameter, the computing resource may specifically write the configuration register value into the configuration register. Optionally, the target configuration parameter includes configuration register values corresponding to the plurality of configuration registers respectively. For example, the target configuration parameters include configuration register values (denoted as values a to c) corresponding to the configuration registers a to c, respectively, the computing resource writes the value a into the configuration register a, writes the value b into the configuration register b, and writes the value c into the configuration register c.
In still another example, the target configuration parameter includes a configuration register address and a configuration register value corresponding to the configuration register address, and when the computing resource adjusts the current configuration parameter of the computing resource to the target configuration parameter, the computing resource may specifically write the configuration register value into the configuration register corresponding to the configuration register address. Optionally, the target configuration parameter includes configuration register values corresponding to the plurality of configuration register addresses respectively. For example, the target configuration parameter includes an address a and a value a, an address b and a value b, an address c and a value c, the computing resource writes the value a into a configuration register corresponding to the address a, writes the value b into a configuration register corresponding to the address b, and writes the value c into a configuration register corresponding to the address c. This approach helps to increase flexibility in parameter configuration compared to the previous example.
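Illustratively, configuring the computing resource with address and value pairs can be sketched as follows. The dictionary standing in for the configuration registers, the addresses, and the function name are assumptions made for illustration. The sketch also skips registers that already hold the target value, which is a per-register refinement of the consistency check described above.

```python
def apply_configuration(registers, target):
    """Write each (address, value) pair whose register differs from the target.

    registers: dict mapping a register address to its current value
               (a stand-in for hardware configuration registers).
    target:    list of (address, value) pairs, i.e. the target configuration.
    Returns the list of addresses that were actually written.
    """
    written = []
    for address, value in target:
        if registers.get(address) != value:
            registers[address] = value
            written.append(address)
    return written


registers = {0x10: 1, 0x14: 2, 0x18: 3}
target = [(0x10, 1), (0x14, 7), (0x18, 3)]

print(apply_configuration(registers, target))  # [20]
print(apply_configuration(registers, target))  # []
```

The second call writes nothing because the current configuration is already consistent with the target configuration.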
In the above technical solution, while running the program, the computing resource determines the running feature of a subprogram that has been run or is running among the plurality of similar subprograms included in the program, determines the target configuration parameters of the computing resource according to the running feature, and applies the target configuration parameters to the computing resource, so that the program is then run on the configured computing resource, which helps the program run efficiently. For example, when running a program, the computing resource can optimize program performance by controlling the aggressiveness of a prefetch algorithm. Specifically, the computing resource obtains the running feature (such as the memory bandwidth occupancy and the prefetch hit rate) of the subprogram that has been run or is running. When it identifies that the memory bandwidth occupancy of the subprogram is high and the prefetch hit rate is low, the computing resource can improve program performance by adjusting its configuration parameters, for example by reducing the aggressiveness of the prefetch algorithm, that is, changing from an aggressive policy to a conservative policy. Otherwise, the computing resource can improve program performance by adjusting its configuration parameters in the opposite direction, for example by increasing the aggressiveness of the prefetch algorithm, that is, changing from a conservative policy to an aggressive policy.
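Illustratively, the prefetch example can be sketched as a simple decision rule. The thresholds (0.8 and 0.3), the policy names, and the function name are hypothetical; the description only states that high bandwidth occupancy combined with a low prefetch hit rate calls for a less aggressive prefetch policy.

```python
def choose_prefetch_policy(bandwidth_occupancy, prefetch_hit_rate,
                           high_bandwidth=0.8, low_hit_rate=0.3):
    """Pick a prefetch policy from two running-feature values in [0, 1]."""
    if bandwidth_occupancy >= high_bandwidth and prefetch_hit_rate <= low_hit_rate:
        # Prefetches consume bandwidth but rarely hit: back off.
        return "conservative"
    # Otherwise prefetching may be made more aggressive.
    return "aggressive"


print(choose_prefetch_policy(0.9, 0.2))  # conservative
print(choose_prefetch_policy(0.4, 0.7))  # aggressive
```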
Fig. 4 provides a schematic flowchart of how the computing resource determines a loop program.
In step 401, the computing resource executes a conditional jump instruction after executing the first subroutine.
A conditional jump instruction may be understood as an instruction, understandable to the computing resource, that is translated from a conditional statement in a programming language, such as if, while, or for.
The jump is specifically that the computing resource executes the second instruction after executing the first instruction. The first instruction and the second instruction are two discrete instructions in a piece of program code, the first instruction precedes the second instruction or the first instruction follows the second instruction, e.g., in a piece of program code, the first instruction is located in line 1 and the second instruction is located in line 10.
Specifically, the computing resource may execute the conditional jump instruction by reading it from a memory while running the program. The memory is, for example, main memory, high bandwidth memory (HBM), or non-volatile memory. Optionally, the memory is included in the processing device.
In addition, after executing the first subroutine, if the conditional jump instruction is not executed, the computing resource determines that the first subroutine is not a loop program.
It should be added that, when the computing resource executes an instruction (such as a conditional jump instruction) while running the program, this may equivalently be described as the computing resource executing the instruction or as the program executing the instruction.
In step 402, the computing resource determines identification information for a conditional jump instruction.
Illustratively, the identification information of the conditional jump instruction is address information of the conditional jump instruction, or a hash of the address information of the conditional jump instruction, or a map of the address information of the conditional jump instruction, or the like.
Wherein the address information of the conditional jump instruction includes a start position and/or a target position of the conditional jump instruction, and in combination with the example in step 401, the start position of the conditional jump instruction is a first instruction and the target position is a second instruction.
Optionally, the computing resource obtains the identification information of the conditional jump instruction from the branch recording module.
Optionally, the computing resource further determines a first instruction number. Wherein the first instruction number is the total number of instructions (i.e., instruction number) that the program accumulates when the computing resource executes the conditional jump instruction.
Further, the performance monitoring module includes an instruction counter, and the computing resource specifically reads the first instruction number from the instruction counter of the performance monitoring module. Depending on whether the instruction counter has been reset, there are two examples. Example a: if the instruction counter has not been reset after the program starts running, the first instruction number is the total number of instructions the program has cumulatively executed. Example b: if the instruction counter has been reset after the program starts running, the first instruction number is the total number of instructions the program has cumulatively executed since the last reset of the instruction counter.
Optionally, the computing resource further determines a first execution number, where the first execution number is the total number of times the computing resource has executed the conditional jump instruction. Similarly, the performance monitoring module further includes an execution count counter, and the computing resource can read the first execution number from the execution count counter of the performance monitoring module. For details, refer to the implementation in which the computing resource reads the first instruction number from the instruction counter of the performance monitoring module.
Illustratively, the branch recording module and the performance monitoring module are both located within the computing resource. As a further example, the branch recording module is located within the computing resource, some of the counters in the performance monitoring module (e.g., the instruction counter and the execution count counter) are located within the computing resource, and other counters in the performance monitoring module (e.g., a counter for monitoring the number of bus communications) are located outside the computing resource.
In step 403, the computing resource determines, according to the identification information of the conditional jump instruction, that the number of times the first subroutine is repeatedly executed is greater than a third preset value, and further determines that the first subroutine is a loop program.
In this application, the computing resource determining that the subroutine it is executing is a loop program may also be regarded as the computing resource determining that the subroutine it is executing is in a loop state. The computing resource determining that the loop program it is executing is about to exit or has exited the loop may also be regarded as the computing resource determining that the program it is executing is about to exit or has exited the loop state.
The computing resource includes a cache (buffer) that stores record information of a plurality of conditional jump instructions.
The record information of the conditional jump instruction comprises a jump identifier, and optionally, the record information also comprises one or more of the instruction number, the execution length and the execution times.
In the following, description will be given of each field in the record information and the update method of the record information, taking the record information of any conditional jump instruction in the cache as an example.
(1) Jump identifier: determined based on the identification information of the conditional jump instruction. Illustratively, the jump identifier is the identification information of the conditional jump instruction, or a hash of that identification information. For the identification information of the conditional jump instruction, refer to the description in step 402.
(2) Instruction number: the total number of instructions the program has cumulatively executed when the computing resource executes the conditional jump instruction. For example, if the instruction counter has been reset after the program starts running, this is the total number of instructions the program has cumulatively executed since the last reset of the instruction counter.
(3) Execution length: determined by the computing resource according to the total number of instructions the program cumulatively executes between two executions of the same conditional jump instruction. The execution length is the length of the loop program.
The execution length in the record information may specifically be determined according to the total number of instructions the program cumulatively executes between the (K-k)-th execution and the K-th execution of the conditional jump instruction by the computing resource, where K and k are positive integers. In particular, when K is equal to 1, the execution length in the record information may be 0.
In the case of k = 1:
The execution length in the record information is specifically determined by the computing resource according to the total number of instructions the program cumulatively executes between the (K-1)-th execution and the K-th execution of the conditional jump instruction. For example, the execution length is the total number of instructions the program cumulatively executes between two adjacent executions of the same conditional jump instruction.
In the case of k > 1:
In one example, the execution length in the record information is specifically the total number of instructions the program cumulatively executes between the (K-k)-th execution and the K-th execution of the conditional jump instruction by the computing resource. For example, if the total number of instructions the program cumulatively executes between the 5th execution and the 10th execution of the conditional jump instruction is 500, the execution length in the record information is 500.
In yet another example, the execution length in the record information is specifically the average, over every two adjacent executions between the (K-k)-th execution and the K-th execution of the conditional jump instruction, of the number of instructions the program cumulatively executes between them. For example, if the total number of instructions the program cumulatively executes between the 5th execution and the 10th execution of the conditional jump instruction is 500, the execution length in the record information is 100 (i.e., 500/5).
(4) The number of executions: the number of times the computing resource executes the conditional jump instruction.
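Illustratively, the two ways of obtaining the execution length in item (3) can be sketched as follows, where K denotes the current execution and k the number of executions looked back. The list of cumulative instruction counts, the function names, and the use of integer division are assumptions made for illustration.

```python
def execution_length_total(instruction_counts, K, k):
    """Instructions executed between the (K-k)-th and the K-th execution.

    instruction_counts[i] is the cumulative instruction count observed at the
    (i+1)-th execution of the same conditional jump instruction.
    Requires 1 <= k < K, except that K == 1 yields 0 as stated in the text.
    """
    if K == 1:
        return 0
    return instruction_counts[K - 1] - instruction_counts[K - k - 1]


def execution_length_average(instruction_counts, K, k):
    """Average instructions per iteration over the last k iterations."""
    if K == 1:
        return 0
    return execution_length_total(instruction_counts, K, k) // k


# Hypothetical counts at the 1st to 10th execution of the same jump.
counts = [400, 500, 600, 700, 800, 900, 1000, 1100, 1200, 1300]

print(execution_length_total(counts, 10, 1))    # 100
print(execution_length_total(counts, 10, 5))    # 500
print(execution_length_average(counts, 10, 5))  # 100
```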
The following takes the case in which the record information of the conditional jump instruction includes a jump identifier, an instruction number, and an execution length as an example to explain how the computing resource writes the record information of the conditional jump instruction into the cache:
When preparing to write the record information of a conditional jump instruction into the cache, the computing resource first determines whether record information of that conditional jump instruction already exists in the cache. If so, the computing resource overwrites the existing record information with the record information to be written; otherwise, it writes the record information to be written (i.e., a new piece of record information) into the cache.
For example, the conditional jump instruction instructs the program to jump from instruction 10 to instruction 7, and accordingly, the identification information of the conditional jump instruction includes a start position "instruction 10" and a target position "instruction 7", and the computing resource determines that the jump identification is hash (10-7) according to the start position "instruction 10" and the target position "instruction 7".
When the computing resource executes the conditional jump instruction for the 1st time, it determines, according to the jump identifier "hash (10-7)", that the cache does not include record information of the conditional jump instruction, and that the total number of instructions the program has cumulatively executed is 400. Further, the computing resource writes "hash (10-7), 400, and 0" into the cache as the record information of the conditional jump instruction.
When the computing resource executes the conditional jump instruction for the 2nd time, it determines, according to the jump identifier "hash (10-7)", that the cache includes record information of the conditional jump instruction. It also determines that the total number of instructions the program has cumulatively executed at the 2nd execution is 500, and that the total number of instructions the program cumulatively executed between the 1st execution and the 2nd execution of the conditional jump instruction is 100 (i.e., 500-400). It will be appreciated that function calls and the like may occur while executing "instruction 7" through "instruction 10", so this total is 100 (i.e., greater than 4). Further, the computing resource overwrites the record information "hash (10-7), 400, and 0" existing in the cache with "hash (10-7), 500, and 100".
Similarly, when the computing resource executes the conditional jump instruction for the n-th time (where n is greater than 2), it determines, according to the jump identifier "hash (10-7)", that the cache includes record information of the conditional jump instruction. Further, the computing resource determines that the total number of instructions the program has cumulatively executed at the n-th execution is X, and that the total number of instructions the program cumulatively executed between the (n-1)-th execution and the n-th execution of the conditional jump instruction is Y. The computing resource overwrites the existing record information of the conditional jump instruction in the cache with "hash (10-7), X, and Y".
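Illustratively, the write procedure in the above example can be sketched as follows. The dictionary used as the cache, the field names, and the use of Python's built-in hash over the (start position, target position) pair as the jump identifier are assumptions made for illustration.

```python
def jump_id(start, target):
    """Hypothetical jump identifier, written as hash (10-7) in the text."""
    return hash((start, target))


def record_jump(cache, start, target, instruction_count):
    """Insert or update the record of one conditional jump instruction."""
    key = jump_id(start, target)
    previous = cache.get(key)
    if previous is None:
        # First execution: no earlier record, so the execution length is 0.
        cache[key] = {"instructions": instruction_count, "length": 0}
    else:
        # Later execution: length is the instruction count since the last one.
        cache[key] = {
            "instructions": instruction_count,
            "length": instruction_count - previous["instructions"],
        }
    return cache[key]


cache = {}
print(record_jump(cache, 10, 7, 400))  # {'instructions': 400, 'length': 0}
print(record_jump(cache, 10, 7, 500))  # {'instructions': 500, 'length': 100}
```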
It should be noted that, when the record information includes the jump identifier, or the record information includes the jump identifier and the instruction number, or the record information includes the jump identifier and the execution number, the manner of writing the record information into the cache by the computing resource is similar to the above.
Furthermore, when writing the record information of conditional jump instructions into the cache, the computing resource can filter out the record information of non-loop jump instructions (such as the conditional jump instruction corresponding to if else) and the record information of other conditional jump instructions located between two executions of the same conditional jump instruction.
In one embodiment, the computing resource sequentially writes the record information of the conditional jump instruction into the cache, and when writing the record information of a certain jump instruction, if it is determined that the record information of the jump instruction already exists in the cache and the record information of other jump instructions is recorded after the record information, the record information of the other jump instructions recorded after the record information can be emptied, and the record information of the jump instruction in the cache is updated according to one or more of a new instruction number, an execution length and an execution number of the conditional jump instruction.
For example, the program code of a for loop is as follows:
Accordingly, when executing the program code of the for loop, the computing resource specifically executes the instruction flow shown in Table 1:
TABLE 1
The corresponding jump sequence of table 1 is shown in table 2:
TABLE 2
| Start position | Target position | Description |
| --- | --- | --- |
| 2 | 11 | Unconditional jump, filtered out |
| 12 | 3 | - |
| 6 | 9 | Unconditional jump, filtered out |
| 10 | 7 | Repeated 10 times |
| 12 | 3 | - |
| 10 | 7 | Repeated 10 times |
| 12 | 3 | - |
Then, when the computing resource writes the record information of the conditional jump instruction in the cache, the following actions are specifically executed:
step 1, inserting a hash (12-3), X1 and Y1;
step 2, inserting a hash (10-7), X2 and Y2;
step 3, updating the hash (10-7), X3 and Y3;
step 4, updating the hash (10-7), X4 and Y4;
……
step 11, updating the hash (10-7), X11 and Y11;
step 12, updating the hash (12-3), X12 and Y12, and deleting the hash (10-7), X11 and Y11;
step 13, inserting a hash (10-7), X13 and Y13;
step 14, updating the hash (10-7), X14 and Y14;
step 15, updating the hash (10-7), X15 and Y15;
……
It should be noted that, for ease of understanding, the instruction number and the execution length above are denoted by X and Y, respectively. Further, in step 12, when the computing resource writes "hash (12-3), X12, Y12", it determines that "hash (12-3), X1, Y1" already exists in the cache and that other record information, namely "hash (10-7), X11, Y11", has been recorded after "hash (12-3), X1, Y1". The computing resource therefore updates the record to "hash (12-3), X12, Y12" and at the same time deletes "hash (10-7), X11, Y11". This helps to reduce the amount of record information of conditional jump instructions in the cache and avoids configuring parameters in the computing resource too frequently.
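Illustratively, the insert, update, and delete actions listed above can be sketched with an ordered cache. The list layout, the function name, and the synthetic instruction counts (100 per step) are assumptions made for illustration; unconditional jumps are assumed to have been filtered out before this point.

```python
def write_record(cache, jump, instruction_count):
    """Write one record into an ordered cache of [jump, instructions, length].

    If the jump already has a record, update it and drop every record that
    was written after it; otherwise append a new record with length 0.
    """
    for index, record in enumerate(cache):
        if record[0] == jump:
            length = instruction_count - record[1]
            cache[index] = [jump, instruction_count, length]
            del cache[index + 1:]  # empty the records recorded after this one
            return "update"
    cache.append([jump, instruction_count, 0])
    return "insert"


# Conditional jumps of Table 2: 12->3, then 10->7 ten times, then 12->3 again,
# then the inner jump 10->7 starts over.
jumps = [(12, 3)] + [(10, 7)] * 10 + [(12, 3)] + [(10, 7)] * 2
cache = []
actions = [
    write_record(cache, jump, 100 * step)
    for step, jump in enumerate(jumps, start=1)
]

print(actions[0], actions[1], actions[2])  # insert insert update
print(actions[11], actions[12])            # update insert
print(cache)  # [[(12, 3), 1200, 1100], [(10, 7), 1400, 100]]
```

Step 12 updates the record of the jump from 12 to 3 and removes the record of the jump from 10 to 7, so step 13 inserts that record again.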
Similarly, when the record information includes the execution number, the computing resource may also record, in each piece of record information, the number of times the currently executed conditional jump instruction has been executed. For example, in step 3 above, the computing resource records that the execution number of the conditional jump instruction is 2; for another example, in step 12 above, the computing resource records that the execution number of the conditional jump instruction is 2.
With reference to the above description of the cache, the following explains an implementation in which the computing resource, after executing the first subroutine and executing the conditional jump instruction, determines according to the pieces of record information in the cache that the first subroutine is a loop program.
After executing the conditional jump instruction, the computing resource determines, according to the identification information of the conditional jump instruction, whether record information of the conditional jump instruction exists in the cache. Specifically, the computing resource determines a jump identifier according to the identification information of the conditional jump instruction, where the jump identifier is, for example, the identification information itself or a hash of it. The computing resource traverses the pieces of record information in the cache. If a piece of record information includes the jump identifier, the computing resource determines that the cache includes record information of the conditional jump instruction; if none of the pieces of record information in the cache includes the jump identifier, the computing resource determines that the cache does not include record information of the conditional jump instruction.
Further, if the computing resource determines that record information of the conditional jump instruction exists in the cache, it determines whether the first subroutine is a loop program according to that record information, and updates the record information of the conditional jump instruction in the cache. If the computing resource determines that record information of the conditional jump instruction does not exist in the cache, it adds record information of the conditional jump instruction to the cache. The two cases are explained below:
In case 1, the computing resource determines that the record information of the conditional jump instruction is included in the cache.
The computing resource determines, according to the record information of the conditional jump instruction existing in the cache, that the number of times the first subroutine has been repeatedly executed by the computing resource is greater than a third preset value, and further determines that the first subroutine is a loop program.
In one possible manner, the fact that record information of the conditional jump instruction already exists in the cache indicates that the first subroutine has been repeatedly executed by the computing resource. The computing resource therefore determines that the number of times the first subroutine has been repeatedly executed is greater than the third preset value, and further determines that the first subroutine is a loop program, where the third preset value is 1.
In another possible manner, the record information of the conditional jump instruction existing in the cache includes a jump identifier and the execution number of the conditional jump instruction. If the computing resource determines that the execution number of the conditional jump instruction is greater than the third preset value, it determines that the first subroutine is a loop program, where the third preset value is, for example, 2.
In yet another possible manner, the record information of the conditional jump instruction existing in the cache includes a jump identifier, the number of instructions (denoted as the second instruction number), and the execution length (denoted as the second execution length).
The second instruction number is the total number of instructions the program had cumulatively executed when the computing resource executed the same conditional jump instruction the previous time, before the current execution of the conditional jump instruction (i.e., step 401). That is, the computing resource executes the conditional jump instruction a plurality of times; if the current execution (i.e., step 401) is the n-th execution, the second instruction number may be the total number of instructions the program had cumulatively executed at the (n-1)-th execution of the conditional jump instruction.
The second execution length is the total number of instructions the program cumulatively executed between the previous two executions of the same conditional jump instruction by the computing resource, before the current execution of the conditional jump instruction (i.e., step 401). That is, if the current execution (i.e., step 401) is the n-th execution, the second execution length is the total number of instructions the program cumulatively executed between the (n-2)-th execution and the (n-1)-th execution of the conditional jump instruction.
The computing resource determines that the first subroutine is a loop program according to the second instruction number and the second execution length. For details, see possible mode 1 or possible mode 2 below.
Possible mode 1:
If the computing resource determines that the second execution length is 0 (i.e., the record information was recorded by the computing resource when the conditional jump instruction was executed for the first time), the computing resource determines that the first subroutine is a loop program.
If the computing resource determines that the second execution length is not 0 (i.e., the record information was recorded by the computing resource at the m-th execution of the conditional jump instruction, where m is greater than 1), the computing resource takes the first instruction number minus the second instruction number as the first execution length. If the difference between the first execution length and the second execution length is less than a difference threshold, the computing resource determines that the first subroutine is a loop program. Conversely, if the difference between the first execution length and the second execution length is greater than or equal to the difference threshold, the computing resource determines that the first subroutine is not a loop program.
Possible mode 2:
If the computing resource determines that the second execution length is 0 (i.e., the record information was recorded by the computing resource when the conditional jump instruction was executed for the first time), the computing resource determines that the first subroutine is not a loop program.
If the computing resource determines that the second execution length is not 0 (i.e., the record information is recorded by the computing resource when the mth execution of the conditional jump instruction is performed, where m is greater than 1), the computing resource determines the first execution length according to the second instruction number and the first instruction number. In the case where it is determined that the difference between the first execution length and the second execution length is smaller than the difference threshold, it is determined that the first subroutine is a loop program. Conversely, the computing resource determines that the first subroutine is not a loop program in the event that it is determined that the difference between the first execution length and the second execution length is greater than or equal to a difference threshold.
It will be appreciated that in possible mode 1, when the computing resource executes the first subroutine twice in succession, the computing resource considers the first subroutine to be a loop program; in possible mode 2, when the computing resource executes the first subroutine three times in succession, the computing resource considers the first subroutine to be a loop program. In this way, possible mode 2 has higher accuracy than possible mode 1.
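As a minimal illustrative sketch only, and not as a limitation of this application, the determinations of possible mode 1 and possible mode 2 may be expressed in Python as follows. The function name `is_loop` and its parameter names are hypothetical and introduced here for illustration; treating the difference between the two execution lengths as an absolute difference is also an assumption.

```python
def is_loop(first_instr, second_instr, second_len, diff_threshold, mode):
    """Decide whether the first subroutine is a loop program.

    first_instr:  instructions accumulated by the program at this execution
                  of the conditional jump (the "first instruction number")
    second_instr: instruction number stored in the record information
    second_len:   execution length stored in the record information
    mode:         1 or 2, for possible mode 1 or possible mode 2
    Returns (is_loop, first_len) so the caller can update the record.
    """
    first_len = first_instr - second_instr
    if second_len == 0:
        # The record was written at the first execution of the jump:
        # mode 1 already treats this as a loop, mode 2 does not.
        return mode == 1, first_len
    # Assumption: "difference" is taken as an absolute difference.
    return abs(first_len - second_len) < diff_threshold, first_len
```

For example, with a stored instruction number of 100 and a stored execution length of 0, a second execution of the jump at instruction 200 is already judged to be a loop in mode 1 but not in mode 2.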
In addition, the computing resource updates the record information of the conditional jump instruction in the cache according to one or more of the jump identifier, the first instruction number, the first execution length, and the first execution count of the conditional jump instruction. For example, the computing resource locates the record information corresponding to the conditional jump instruction in the cache by the jump identifier and writes the first instruction number and the first execution length into it; that is, the second instruction number in the record information is updated to the first instruction number, and the second execution length in the record information is updated to the first execution length.
In one possible manner, if the record information of the conditional jump instruction includes an execution count, the computing resource may first update the execution count in the record information. When the updated execution count is greater than a third preset value, the computing resource determines that the number of times the first subroutine has been repeatedly executed is greater than the third preset value, and therefore determines that the first subroutine is a loop program.
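The execution-count variant can be sketched in Python as follows. This is an illustrative sketch under stated assumptions: the record is modelled as a dictionary with a hypothetical key `exec_count`, and the value 2 chosen for the third preset value is only an example.

```python
THIRD_PRESET_VALUE = 2  # example value only; the actual value is a design choice


def update_and_check_count(record):
    """Increment the stored execution count, then compare it with the preset value."""
    record["exec_count"] += 1
    return record["exec_count"] > THIRD_PRESET_VALUE
```

With a record created at the first execution of the jump (count 1), the second execution yields a count of 2 (not yet a loop) and the third execution yields a count of 3 (a loop).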
In case 2, the computing resource determines that the record information of the conditional jump instruction is not included in the cache.
The computing resource determines that the first subroutine is not a loop program. In addition, the computing resource adds record information of the conditional jump instruction to the cache according to one or more of the jump identifier, the first instruction number, and the first execution length of the conditional jump instruction. Illustratively, the computing resource adds one or more of the jump identifier, the first instruction number, the first execution length, and the first execution count of the conditional jump instruction to the cache as the record information of the conditional jump instruction. Illustratively, the first execution length is 0 and the first execution count is 1.
It should be noted that determining whether the cache includes the record information of the conditional jump instruction may be understood as follows: the computing resource attempts to acquire the record information of the conditional jump instruction from the cache, and then determines whether the first subroutine is a loop program based on whether the record information is acquired. Specifically, if the computing resource obtains the record information of the conditional jump instruction from the cache, it determines, according to the obtained record information, whether the number of times the first subroutine has been repeatedly executed is greater than a third preset value and, if so, determines that the first subroutine is a loop program. If the computing resource fails to acquire the record information of the conditional jump instruction from the cache, the computing resource determines that the first subroutine is not a loop program.
Based on possible mode 1 in case 1 and on case 2 above, fig. 5 exemplarily shows a flowchart of a specific implementation in which the computing resource determines a loop program. It will be understood that fig. 5 is a specific implementation of fig. 4, and the related terms in fig. 5 can be understood with reference to the description of the embodiment of fig. 4.
In step 501, the computing resource executes a conditional jump instruction after executing the first subroutine.
In step 502, the computing resource determines, according to the jump identifier of the conditional jump instruction, whether the cache includes the record information of the conditional jump instruction. Specifically, if the computing resource determines that the cache does not include the record information of the conditional jump instruction, step 503 is executed, and if the computing resource determines that the cache does include the record information of the conditional jump instruction, step 504 is executed.
In step 503, the computing resource adds the record information of the conditional jump instruction to the cache.
In step 504, the computing resource determines whether the second execution length is 0. Specifically, if the computing resource determines that the second execution length is 0, step 505 is executed, and if the computing resource determines that the second execution length is not 0, step 506 is executed.
In step 505, the computing resource updates the record information of the conditional jump instruction in the cache, and determines that the first subroutine is a loop program.
In step 506, the computing resource determines whether the difference between the first execution length and the second execution length is less than a difference threshold. Specifically, if the computing resource determines that the difference between the first execution length and the second execution length is less than the difference threshold, then step 505 is executed; if the computing resource determines that the difference between the first execution length and the second execution length is greater than or equal to the difference threshold, then step 507 is performed.
In step 507, the computing resource updates the record information of the conditional jump instruction in the cache, and determines that the first subroutine is not a loop program.
Based on possible mode 2 in case 1 and on case 2 above, fig. 6 exemplarily shows a flowchart of another specific implementation in which the computing resource determines a loop program. It will be understood that fig. 6 is yet another specific implementation of fig. 4, and the related terms in fig. 6 can be understood with reference to the description of the embodiment of fig. 4.
In step 601, the computing resource executes a conditional jump instruction after the first subroutine is executed.
In step 602, the computing resource determines, according to the jump identifier, whether the cache includes record information of the conditional jump instruction.
Specifically, if the computing resource determines that the cache does not include the record information of the conditional jump instruction, step 603 is executed, and if the computing resource determines that the cache does include the record information of the conditional jump instruction, step 604 is executed.
In step 603, the computing resource adds the record information of the conditional jump instruction to the cache.
In step 604, the computing resource determines whether the second execution length is 0.
Specifically, if the computing resource determines that the second execution length is 0, step 605 is executed, and if the computing resource determines that the second execution length is not 0, step 606 is executed.
In step 605, the computing resource updates the record information of the conditional jump instruction in the cache.
In step 606, the computing resource determines whether the difference between the first execution length and the second execution length is less than a difference threshold. Specifically, if the computing resource determines that the difference between the first execution length and the second execution length is less than the difference threshold, then step 608 is performed; if the computing resource determines that the difference between the first execution length and the second execution length is greater than or equal to the difference threshold, then step 607 is performed.
In step 607, the computing resource updates the record information of the conditional jump instruction in the cache, and determines that the first subroutine is not a loop program.
In step 608, the computing resource updates the record information of the conditional jump instruction in the cache, and determines that the first subroutine is a loop program.
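Steps 601 to 608 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation of this application: the cache is modelled as a dictionary keyed by the jump identifier, the names `Record` and `on_conditional_jump` are hypothetical, and the difference is taken as an absolute difference. The flow of fig. 5 would differ only in that the branch corresponding to step 505 returns True.

```python
from dataclasses import dataclass


@dataclass
class Record:
    instr_number: int  # the "second instruction number"
    exec_length: int   # the "second execution length"; 0 after the first execution


def on_conditional_jump(cache, jump_id, first_instr, diff_threshold):
    """Return True if the first subroutine is judged to be a loop program."""
    record = cache.get(jump_id)                  # step 602
    if record is None:
        cache[jump_id] = Record(first_instr, 0)  # step 603
        return False
    first_len = first_instr - record.instr_number
    second_len = record.exec_length
    # The record is updated in steps 605, 607 and 608 alike.
    record.instr_number = first_instr
    record.exec_length = first_len
    if second_len == 0:                          # step 604 -> step 605
        return False
    # step 606 -> step 608 (loop) or step 607 (not a loop)
    return abs(first_len - second_len) < diff_threshold
```

In this sketch, a jump executed when the program has accumulated 100, 200 and 301 instructions is first judged to be a loop at the third execution, because only then are two comparable execution lengths (100 and 101) available.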
Optionally, after executing (or acquiring, or receiving) the conditional jump instruction, the computing resource may also determine whether the conditional jump instruction points to a small loop, where a small loop is a loop in which the number of instructions executed in a single iteration is less than a second preset value. Optionally, the computing resource may determine whether the program is in a loop state based on an indication from a microinstruction module (e.g., LSD), and thereby determine whether the conditional jump instruction points to a small loop. If the computing resource determines that the conditional jump instruction points to a small loop, the conditional jump instruction is filtered out and steps 402 and 403 are not performed. It can also be understood that the loop program handled in steps 402 and 403 is one in which the program is in a loop state and the number of instructions in a single iteration is greater than the second preset value. This helps reduce the amount of record information of conditional jump instructions in the cache, thereby saving cache space of the computing resource. It also avoids frequent adjustment of the configuration parameters of the computing resource and reduces the power consumption of the computing resource.
It should be added that, when the multiple similar subroutines are a loop program executed multiple times in the program, the number of loop iterations is limited, so the computing resource needs to recognize as soon as possible that the program has exited or is about to exit the loop (e.g., that the looped execution of the subroutine has ended). This avoids the computing resource continuing to run the program with configuration parameters suited to the loop after the program has exited the loop. The following description again takes the first subroutine in the program as an example.
In a first possible manner, after determining that the first subroutine is a loop program, the computing resource may further determine an instruction number threshold according to the first instruction number and a preset execution length. Illustratively, the instruction number threshold is equal to the sum of the first instruction number and the preset execution length. Optionally, the preset execution length is a preconfigured value that is greater than both the first execution length and the second execution length. Alternatively, the preset execution length is the first execution length or the second execution length.
Optionally, the computing resource writes the jump identifier of the conditional jump instruction together with the instruction number threshold into the cache, so that the computing resource can determine, according to the jump identifier and the instruction number threshold in the cache, whether the program has exited the loop, i.e., no longer executes the first subroutine. In one possible manner, the computing resource writes the instruction number threshold into the record information corresponding to the jump identifier (i.e., the record information of step 403). In this case, the record information corresponding to the jump identifier includes not only one or more of the instruction number, the execution length, and the execution count corresponding to the jump identifier, but also the instruction number threshold corresponding to the jump identifier. In yet another possible manner, the computing resource writes the jump identifier and the instruction number threshold of the conditional jump instruction into the cache as a separate piece of record information.
If the computing resource has not executed the conditional jump instruction again by the time the total number of instructions cumulatively executed by the program exceeds the instruction number threshold, the computing resource determines that the program has exited the loop, i.e., the first subroutine is no longer executed. If the computing resource executes the conditional jump instruction again before the total number of instructions cumulatively executed by the program exceeds the instruction number threshold, the computing resource determines that the program is still in the loop, i.e., the first subroutine is still being executed in a loop.
In a second possible manner, after determining that the first subroutine is a loop program, the computing resource may further write a preset execution length (also referred to as a preset instruction number) into a preset register, and each time the computing resource executes an instruction, the instruction number in the preset register is decremented by 1. Thus, if the instruction number in the preset register reaches 0 and the computing resource has still not executed the conditional jump instruction again, the computing resource determines that the program has exited the loop, i.e., the first subroutine is no longer executed. If the computing resource executes the conditional jump instruction again before the instruction number in the preset register reaches 0, the computing resource determines that the program is still in the loop, i.e., the first subroutine is still being executed in a loop. Furthermore, each time the computing resource executes the conditional jump instruction, the preset instruction number may be written into the preset register again.
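The first possible manner can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the class name `ExitWatcher` and its methods are hypothetical, and re-deriving the threshold each time the conditional jump instruction is executed again is an assumption made for illustration.

```python
class ExitWatcher:
    """Tracks the instruction number threshold for one conditional jump."""

    def __init__(self, first_instr, preset_len):
        self.preset_len = preset_len
        # Threshold = first instruction number + preset execution length.
        self.threshold = first_instr + preset_len

    def on_jump(self, total_instrs):
        # Assumption: the jump was executed again, so the loop is still
        # running and the threshold is derived again from the new count.
        self.threshold = total_instrs + self.preset_len

    def exited(self, total_instrs):
        # True once the accumulated count exceeds the threshold without
        # the jump having been executed again in the meantime.
        return total_instrs > self.threshold
```

For example, with a first instruction number of 1000 and a preset execution length of 120, the loop is considered exited once the program has accumulated more than 1120 instructions without executing the jump again.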
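The second possible manner can be sketched in Python as follows. The register is modelled as a plain integer, and the class name `CountdownRegister` and the flag `is_tracked_jump` are hypothetical names used only for illustration.

```python
class CountdownRegister:
    """Models the preset register that counts down once per executed instruction."""

    def __init__(self, preset_instrs):
        self.preset = preset_instrs
        self.value = preset_instrs

    def on_instruction(self, is_tracked_jump=False):
        """Call once per executed instruction; returns True when the loop has exited."""
        if is_tracked_jump:
            # The conditional jump was executed again: rewrite the preset number.
            self.value = self.preset
        elif self.value > 0:
            self.value -= 1
        return self.value == 0
```

In this sketch the register only reaches 0 when a full preset number of instructions has been executed without the tracked conditional jump instruction being executed again.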
Optionally, after determining that the program exits the loop, the computing resource further determines the configuration parameters of the computing resource as default configuration parameters. In one possible implementation, the computing resource does not adjust the current configuration parameters when determining that the current configuration parameters are default configuration parameters; when the computing resource determines that the current configuration parameter is not the default configuration parameter, the computing resource adjusts the current configuration parameter to the default configuration parameter.
Optionally, after determining that the program is still executing the first subroutine in a loop, in one example, the computing resource further obtains an operation feature of the first subroutine and determines a target configuration parameter of the computing resource according to the operation feature. In another example, the computing resource no longer detects the operation feature of the first subroutine in each iteration while the first subroutine is being executed in a loop, and instead adjusts the configuration parameters of the computing resource to the default configuration parameters only after determining that the first subroutine is no longer executed in a loop. The latter example helps reduce the power consumption or complexity of the computing resource while it runs the program.
It is to be appreciated that where the computing resource is a processor, the processor may include a plurality of processor cores therein.
In one possible manner, each processor core, while running a program, performs the method of the above method embodiments to determine the target configuration parameters of that processor core; that is, the target configuration parameters are determined at processor-core granularity. The processor core then configures itself with the target configuration parameters.
For example, the computing resource includes processor core 1 through processor core 5. While running a program, processor core 1 also executes the method in the above method embodiments to determine target configuration parameter 1 of processor core 1. Processor core 1 then configures itself with target configuration parameter 1. Similarly, each of the other processor cores may determine its own target configuration parameters according to the program it runs.
In another possible manner, a plurality of processor cores run the same program, and one processor core performs the method in the above method embodiments to determine the target configuration parameters corresponding to that program; that is, the target configuration parameters are determined at program granularity. The processor core that determines the target configuration parameters may be one of the plurality of processor cores running the program, or may be another processor core in the computing resource that is independent of the plurality of processor cores. That processor core may further configure the target configuration parameters in each of the plurality of processor cores.
For example, processor core 1 through processor core 5 are included in the computing resources. In one example, the processor cores 1 to 5 run the same program, and the processor core 1 further executes the method in the above embodiment of the method to determine the target configuration parameters corresponding to the same program, and configures the target configuration parameters corresponding to the same program in the processor cores 1 to 5. In yet another example, the processor cores 2 to 5 run the same program, and the processor core 1 executes the method in the above embodiment to determine the target configuration parameters corresponding to the same program, and configures the target configuration parameters corresponding to the same program into the processor cores 2 to 5.
Based on the foregoing and on the same concept, fig. 8 is a schematic structural diagram of one possible processing device provided in the present application. The processing device may be used to implement the functions of the method embodiments described above, and therefore has the beneficial effects of those method embodiments.
As shown in fig. 8, the processing device 800 (i.e., the device for running the program) includes a parameter determination module 801 and a configuration module 802.
The parameter determination module 801 is configured to: when it is determined that a program run by a computing resource in the processing device includes multiple similar subroutines, obtain the operation features of a subroutine, among the multiple similar subroutines, that has been run or is currently being run, where the multiple similar subroutines are multiple subroutines whose operation features have a similarity greater than or equal to a first preset value; and determine configuration parameters of the computing resource according to the operation features.
A configuration module 802 for configuring the computing resource using the determined configuration parameters.
In one possible implementation, the multiple similar sub-programs are loop programs that are executed multiple times in a program, each of the multiple sub-programs corresponding to one or more loops of the loop program.
In one possible implementation, the number of instructions of the loop program is greater than a second preset value.
In a possible implementation manner, the apparatus further includes a detection module 803, where the detection module 803 is configured to determine that the running program in the computing resource includes multiple similar sub-programs. Specifically, the detection module 803 determines that the first subroutine is a loop program when it is determined that the number of times the first subroutine is repeatedly executed in the program is greater than a third preset value.
In one possible implementation, when determining whether the number of times the first subroutine in the program is repeatedly executed is greater than a third preset value, the detection module 803 is specifically configured to: after the first subroutine has been executed and the conditional jump instruction is executed, determine whether record information of the conditional jump instruction exists; if the record information of the conditional jump instruction exists, determine, according to the record information, whether the number of times the first subroutine is repeatedly executed is greater than the third preset value; and if the record information of the conditional jump instruction does not exist, add the record information of the conditional jump instruction.
In one possible implementation, after the conditional jump instruction is executed, the detection module 803 is further configured to: determine whether the conditional jump instruction points to a small loop, where a small loop is a loop in which the number of instructions executed in a single iteration is less than a second preset value; if the conditional jump instruction points to a small loop, filter out the conditional jump instruction; and if the conditional jump instruction does not point to a small loop, further determine whether record information of the conditional jump instruction exists.
In one possible implementation, the record information is recorded in a predetermined section of a cache of the computing resource.
In one possible implementation, when determining whether record information of the conditional jump instruction exists, the detection module 803 is specifically configured to: determine a jump identifier according to the identification information of the conditional jump instruction, where the jump identifier is the identification information of the conditional jump instruction or a hash of that identification information; and traverse the pieces of record information in the cache. If a piece of record information includes the jump identifier, the detection module 803 determines that the cache includes the record information of the conditional jump instruction; if none of the pieces of record information in the cache includes the jump identifier, the detection module 803 determines that the cache does not include the record information of the conditional jump instruction. Optionally, the identification information of the conditional jump instruction is obtained from a branch recording module. The identification information includes a start position and/or a target position, or includes a hash of the start position and/or the target position.
In one possible implementation, when determining, according to the record information, whether the number of times the first subroutine is repeatedly executed is greater than a third preset value, the detection module 803 is specifically configured to: determine, according to the instruction number and the execution length in the record information, whether the number of times the first subroutine is repeatedly executed is greater than the third preset value. The instruction number in the record information indicates the number of instructions cumulatively executed by the program when the conditional jump instruction was last executed; the execution length in the record information is the difference between the numbers of instructions cumulatively executed by the program at the two most recent previous executions of the conditional jump instruction.
In one possible implementation, the third preset value is equal to 2, and when determining, according to the instruction number and the execution length in the record information, whether the number of times the first subroutine is repeatedly executed is greater than the third preset value, the detection module 803 is specifically configured to: take the number of instructions cumulatively executed by the program when the conditional jump instruction is executed as a first instruction number, and take the difference between the first instruction number and the instruction number in the record information as a first execution length; and if the execution length in the record information is not 0 and the difference between the first execution length and the execution length in the record information is less than the difference threshold, determine that the number of times the first subroutine is repeatedly executed is greater than 2. In one possible implementation, after determining that the number of times the first subroutine is repeatedly executed is greater than the third preset value, the detection module 803 is further configured to update the record information according to the first instruction number and the first execution length.
In one possible implementation, after determining that the first subroutine is a loop program, the detection module 803 is further configured to: update the instruction number in the record information according to the number of instructions cumulatively executed by the program when the conditional jump instruction is executed (i.e., the first instruction number); determine an instruction number threshold according to the first instruction number in the updated record information and a preset execution length; and, when the number of instructions cumulatively executed by the program reaches the instruction number threshold, determine that the loop program has been exited if the conditional jump instruction has not been executed again. The parameter determination module 801 is further configured to determine the default configuration parameters as the configuration parameters of the computing resource.
In one possible implementation, when determining, according to the record information, whether the number of times the first subroutine is repeatedly executed is greater than a third preset value, the detection module 803 is specifically configured to: determine, according to the execution count of the conditional jump instruction in the record information, whether the execution count of the conditional jump instruction is greater than the third preset value; and when the execution count of the conditional jump instruction is greater than the third preset value, determine that the number of times the first subroutine is repeatedly executed is greater than the third preset value.
In one possible implementation, when determining, according to the record information, whether the number of times the first subroutine is repeatedly executed is greater than a third preset value, the detection module 803 is specifically configured to: update the execution count of the conditional jump instruction in the record information; and determine, according to the updated execution count, whether the execution count of the conditional jump instruction is greater than the third preset value.
In one possible implementation, when determining the operation features within one iteration of the first subroutine run by the computing resource, the parameter determination module 801 is specifically configured to: obtain the operation features of the program run by the computing resource at the time the conditional jump instruction is executed; obtain the operation features of the program at the time of the immediately preceding execution of the conditional jump instruction; and determine, according to the operation features obtained at these two times, the operation features within one iteration of the first subroutine run by the computing resource.
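The lookup described above can be sketched in Python as follows. This is an illustrative sketch only: the identification information is assumed to be a pair of start position and target position, the records are modelled as dictionaries with a hypothetical key `jump_id`, and Python's built-in `hash` merely stands in for whatever hash is actually used.

```python
def jump_identifier(start_pos, target_pos, use_hash=False):
    """Build the jump identifier from the identification information."""
    ident = (start_pos, target_pos)
    # Either the identification information itself or a hash of it.
    return hash(ident) if use_hash else ident


def find_record(records, jump_id):
    """Traverse the records and return the one carrying jump_id, or None."""
    for record in records:
        if record["jump_id"] == jump_id:
            return record
    return None
```

A return value of None corresponds to the case in which the cache does not include the record information of the conditional jump instruction.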
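One way to derive the operation features of a single iteration from two acquisitions is sketched below in Python. This sketch rests on assumptions not stated in this application: the two acquisitions are taken to be snapshots of cumulative counters, and the counter names (`instructions`, `cycles`, `cache_misses`, `cache_accesses`) and the function name `loop_features` are hypothetical.

```python
def loop_features(prev, curr):
    """Derive per-iteration features from two cumulative counter snapshots.

    prev: counters at the previous execution of the conditional jump
    curr: counters at the current execution of the conditional jump
    """
    delta = {name: curr[name] - prev[name] for name in curr}
    return {
        # Instructions executed per clock cycle within the iteration.
        "ipc": delta["instructions"] / delta["cycles"],
        # Cache miss rate within the iteration.
        "cache_miss_rate": delta["cache_misses"] / delta["cache_accesses"],
    }
```

The sketch assumes that the cycle and access counters advance between the two snapshots, so that no division by zero occurs.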
In one possible implementation manner, the parameter determining module 801 is specifically configured to determine, when determining the configuration parameter of the computing resource according to the operation feature, a target preset feature that matches the operation feature from a plurality of preset features; and determining preset configuration parameters corresponding to the target preset features as configuration parameters of the computing resources.
Illustratively, the configuration parameters of the computing resource include a prefetch policy, for example a prefetch policy of a lost cache line, a prefetch policy of integer data access, the aggressiveness of a prefetch algorithm, and the like.
In one possible implementation, when determining, from the plurality of preset features, a target preset feature that matches the operation features, the parameter determination module 801 is specifically configured to: perform dimension reduction on the A-dimensional operation features to obtain B-dimensional operation features; and, according to the degree of matching between the B-dimensional operation features and each of a plurality of B-dimensional preset features, select the preset feature with the highest degree of matching from the plurality of B-dimensional preset features as the target preset feature, where A and B are positive integers and B is less than A.
In one possible implementation, when determining the degree of matching between the B-dimensional operation features and any one of the B-dimensional preset features, the parameter determination module 801 is specifically configured to: for each of the B dimensions, determine the degree of matching between the operation feature and the preset feature in that dimension; and determine the degree of matching between the B-dimensional operation features and the B-dimensional preset feature according to the degrees of matching of the individual dimensions. In one possible implementation, a dimension of the B-dimensional preset feature includes a plurality of bits, and the values of some of these bits are masked by a mask. In this case, the parameter determination module 801 may determine the degree of matching between the operation feature and the preset feature in that dimension by fuzzy matching.
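The per-dimension fuzzy matching can be sketched in Python as follows. The sketch assumes that each dimension is encoded as a non-negative integer, that a mask bit of 1 means the corresponding bit is ignored, and that the overall degree of matching is the fraction of matching dimensions; these conventions and the function names are assumptions made for illustration.

```python
def dimension_matches(value, preset_value, mask):
    """Compare one dimension, ignoring the bits that are set in the mask."""
    care = ~mask  # bits that still take part in the comparison
    return (value & care) == (preset_value & care)


def matching_degree(features, preset):
    """features: one integer per dimension.
    preset: one (value, mask) pair per dimension.
    Returns the fraction of dimensions that match.
    """
    hits = sum(
        dimension_matches(f, value, mask)
        for f, (value, mask) in zip(features, preset)
    )
    return hits / len(preset)
```

For example, the value 0b1011 matches the preset value 0b1000 when the two low bits are masked, because only the two high bits are compared.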
In one possible implementation, if the parameter determination module 801 does not determine a target preset feature that matches the running feature from the plurality of preset features, the default configuration parameter is taken as the configuration parameter of the computing resource.
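Selecting the target preset feature with a fallback to the default configuration parameters can be sketched in Python as follows. The names `DEFAULT_CONFIG` and `select_config`, the minimum degree `min_degree`, and the use of the fraction of equal dimensions as the degree of matching are assumptions made for illustration.

```python
DEFAULT_CONFIG = {"prefetch_aggressiveness": "default"}  # hypothetical default


def select_config(features, presets, min_degree=1.0):
    """presets: list of (preset_features, config) pairs.

    Returns the config of the best-matching preset whose degree of matching
    is at least min_degree, or DEFAULT_CONFIG if no preset qualifies.
    """
    best = None
    for preset_features, config in presets:
        equal = sum(a == b for a, b in zip(features, preset_features))
        degree = equal / len(preset_features)
        if degree >= min_degree and (best is None or degree > best[0]):
            best = (degree, config)
    return best[1] if best is not None else DEFAULT_CONFIG
```

If no preset feature reaches the required degree of matching, the function returns the default configuration, which corresponds to the fallback described above.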
In one possible implementation, the configuration parameters of the computing resource include an address of a configuration register in the computing resource and a configuration register value; the configuration module 802 is specifically configured to write a configuration register value into a configuration register corresponding to an address of the configuration register when configuring a configuration parameter of the computing resource to the computing resource.
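Applying configuration parameters expressed as register address and value pairs can be sketched in Python as follows. The register file is modelled as a dictionary, which is an assumption for illustration only; on real hardware the write would be a privileged register access, and the function name `apply_config` is hypothetical.

```python
def apply_config(registers, config_params):
    """Write each configuration register value to the register at its address.

    registers: dict modelling the configuration registers (address -> value)
    config_params: iterable of (register_address, register_value) pairs
    """
    for address, value in config_params:
        registers[address] = value
```

For example, `apply_config(regs, [(0x10, 0x3), (0x14, 0x0)])` leaves `regs` holding the value 0x3 at address 0x10 and 0x0 at address 0x14.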
In one possible implementation, the operating characteristics include at least any one or more of the following: the number of instructions executed by the processor core per clock cycle, the miss rate of the instruction translation look-aside buffer, the cache miss rate, the prefetch hit rate.
The division into modules in the embodiments of the present application is schematic and is merely a division by logical function; other divisions are possible in an actual implementation. In addition, the functional modules in the embodiments of the present application may be integrated in one processor, may each exist physically on their own, or two or more modules may be integrated into one module. The integrated modules may be implemented in the form of hardware or in the form of software functional modules.
The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application in essence, or the part of it that contributes over the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing a terminal device (which may be a personal computer, a mobile phone, a network device, or the like) or a processor to perform all or part of the steps of the methods according to the embodiments of the present application. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, or any other medium capable of storing program code.
The descriptions of the processes corresponding to the drawings each have their own emphasis; for parts of a process that are not described in detail, reference may be made to the descriptions of the other processes.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, two programs may be launched on the computing resource: one is the program that implements the methods of the present application (denoted the first program), and the other is the program run by the computing resource (that is, the program in the preceding method embodiments, denoted the second program). The computing resource may run the first program and the second program alternately, for example running the first program during a first period and the second program during a second period, where the first period and the second period do not overlap; in other words, the computing resource runs only one of the programs at any given time. Because the first program occupies very little of the computing resource, that is, the first period is much shorter than the second period, alternating between the two programs has a negligible effect on the running of the second program. By running the first program, the computing resource can therefore determine the running characteristics of a sub-program that has already run or is currently running among the multiple similar sub-programs included in the second program, and then determine the configuration parameters of the computing resource from those running characteristics, so that the computing resource runs the second program more efficiently.
Further, when implemented in software, the embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product comprises computer program instructions which, when loaded and executed on a computer, produce in whole or in part the processes or functions of the method embodiments associated with figs. 3 to 6 of the present application.
The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (e.g., coaxial cable, optical fiber, or digital subscriber line) or wireless means (e.g., infrared, radio, or microwave). The computer-readable storage medium may be any available medium that can be accessed by the computer, or a data storage device, such as a server or data center, that integrates one or more available media.
It will also be appreciated that the detection module 803 is configured to perform step 301 in fig. 3 and the steps in the method embodiments related to figs. 4 to 6; the parameter determining module 801 is configured to perform steps 302 and 303 in fig. 3; and the configuration module 802 is configured to perform step 304 in fig. 3. Further, when the parameter determining module 801 and the detection module 803 are implemented in hardware, that is, when the parameter determining module 801 or the detection module 803 includes a hardware circuit, the detection module 803 may, after each execution of the conditional jump instruction, send a detection signal to the parameter determination module 801 if it determines that the program is in a loop, and send a start signal to the parameter determination module 801 if it determines that the program is not in a loop.
A specific implementation may be any one of the following examples A to C.
In example A, the detection module 803 sends the start signal once and the detection signal once to the parameter determination module 801. Accordingly, the parameter determination module 801 resets the feature counter in the performance monitoring module (i.e., clears its count) in response to the start signal, and determines the value of the feature counter in the performance monitoring module in response to the detection signal.
In example B, the detection module 803 sends the start signal once and the detection signal multiple times to the parameter determination module 801. Accordingly, the parameter determination module 801 resets the feature counter in the performance monitoring module (i.e., clears its count) in response to the start signal, and determines the value of the feature counter in the performance monitoring module in response to the first detection signal. In response to each detection signal after the first, the parameter determination module 801 records the current value of the feature counter in the performance monitoring module and determines the difference between that value and the most recently recorded value of the feature counter.
In addition, in examples A and B above, the parameter determining module 801 may instead determine the value of the feature counter in the performance monitoring module in response to the start signal (i.e., without resetting the feature counter); on receiving the first detection signal, the parameter determining module 801 then determines the difference between the current value of the feature counter and the value determined in response to the start signal.
In example C, the detection module 803 sends the detection signal multiple times (i.e., does not send a start signal) to the parameter determination module 801. Accordingly, the parameter determining module 801 determines the value of the feature counter in the performance monitoring module in response to the first detection signal. In response to each detection signal after the first, the parameter determining module 801 records the current value of the feature counter in the performance monitoring module and determines the difference between that value and the most recently recorded value of the feature counter.
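The signal handling in examples A to C can be sketched in Python as follows. The class and method names are hypothetical, and two behaviors are assumptions made so that one class covers all three examples: the start signal resets the counter (the reset variant of examples A and B), and in example C the first detection signal only records a baseline and yields no per-interval count.

```python
class FeatureCounter:
    """Hypothetical stand-in for a feature counter in the performance monitoring module."""

    def __init__(self):
        self.value = 0

    def reset(self):
        self.value = 0

    def add(self, events: int):
        self.value += events


class ParameterDeterminer:
    """Sketch of how the start and detection signals could be handled."""

    def __init__(self, counter: FeatureCounter):
        self.counter = counter
        self.last_recorded = None  # no baseline until a signal arrives
        self.samples = []          # per-interval counts observed so far

    def on_start(self):
        # Examples A and B: clear the count when the start signal arrives.
        self.counter.reset()
        self.last_recorded = 0

    def on_detect(self):
        current = self.counter.value
        if self.last_recorded is None:
            # Example C: no start signal was sent, so the first detection
            # signal only records a baseline (assumption of this sketch).
            self.last_recorded = current
            return None
        # Difference between the current value and the last recorded value.
        delta = current - self.last_recorded
        self.last_recorded = current
        self.samples.append(delta)
        return delta
```

In example A only one detection signal follows the start signal, so `samples` would hold a single entry; in example B it would grow by one entry per detection signal.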
Two examples of practical application are provided below in connection with example A. In example 1, a feature counter in the performance monitoring module is used to determine the cache miss rate: the parameter determining module 801 resets the feature counter after receiving the start signal and determines the value of the feature counter after receiving the detection signal, so that the cache miss rate is determined from the count accumulated by the feature counter between the start signal and the detection signal. In example 2, a feature counter in the performance monitoring module is used to determine the prefetch hit rate: the parameter determining module 801 resets the feature counter after receiving the start signal and determines the value of the feature counter after receiving the detection signal, so that the prefetch hit rate is determined from the count accumulated by the feature counter between the start signal and the detection signal.
It will be appreciated that the start signal and the detection signal merely represent an action-triggering mechanism. For example, the start signal is one pulse on the signal line between the detection module 803 and the parameter determination module 801, and the detection signal is another pulse on that signal line. As a further example, the start signal and the detection signal are level changes: the start signal is a level change from 0 to 1, and the detection signal is a level change from 1 to 0.
Further, as an example in connection with fig. 5 and example B, the detection module 803 sends a start signal to the parameter determination module 801 when the conditional jump instruction is detected for the 1st time and sends a detection signal to the parameter determination module 801 when the conditional jump instruction is detected for the 2nd time, so that the parameter determination module 801 determines the count value between the 1st and 2nd conditional jump instructions; this count value can be used to indicate the running characteristics of the program in one loop. It will be further appreciated that when the detection module 803 detects the conditional jump instruction for the m-th time (where m is greater than 2), it also sends a detection signal to the parameter determination module 801; accordingly, the parameter determination module 801 determines the count value between the (m-1)-th and m-th conditional jump instructions, and thus the running characteristics of the program in the one loop between the (m-1)-th and m-th conditional jump instructions.
Referring to fig. 6 and example B, the detection module 803 sends a start signal to the parameter determination module 801 when the conditional jump instruction is detected for the 2nd time and sends a detection signal to the parameter determination module 801 when the conditional jump instruction is detected for the 3rd time, so that the count value between conditional jump instructions can be determined and the accuracy of loop detection is also improved. It will be further understood that, in fig. 6, when the detection module 803 detects the conditional jump instruction for the m-th time (where m is greater than 3), it also sends a detection signal to the parameter determination module 801; accordingly, the parameter determination module 801 determines the count value between the (m-1)-th and m-th conditional jump instructions, and thus the running characteristics of the program in the one loop between the (m-1)-th and m-th conditional jump instructions.
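The two signal schedules described for example B differ only in which detection of the conditional jump instruction triggers the start signal. A small Python sketch makes the difference explicit; the function name and the `start_at` parameter are hypothetical, with `start_at=1` standing for the fig. 5 schedule and `start_at=2` for the fig. 6 schedule.

```python
def signals_for_jumps(jump_count: int, start_at: int = 1) -> list:
    """Return the signal sent at each detection of the conditional jump instruction.

    Entry n-1 of the result describes the n-th detection: None (no signal yet),
    "start" (start signal), or "detect" (detection signal).
    """
    signals = []
    for n in range(1, jump_count + 1):
        if n < start_at:
            signals.append(None)       # detections before the start signal
        elif n == start_at:
            signals.append("start")    # the start signal is sent once
        else:
            signals.append("detect")   # every later detection sends a detection signal
    return signals
```

Under these assumptions, `signals_for_jumps(3, start_at=1)` gives `["start", "detect", "detect"]` and `signals_for_jumps(4, start_at=2)` gives `[None, "start", "detect", "detect"]`.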
Furthermore, in example B above, one or more of the following modes may apply. In mode 1, the detection module 803 sends the start signal to the parameter determination module 801 after detecting the conditional jump instruction multiple times. In mode 2, after sending the start signal to the parameter determination module 801, the detection module 803 sends a detection signal to the parameter determination module 801 once each time it has detected the conditional jump instruction multiple times, where the number of executions of the conditional jump instruction between two consecutive detection signals may be the same or different. Examples A and C are similar to example B described above.
As an example in connection with fig. 5 and example A described above, the detection module 803 sends a start signal to the parameter determination module 801 when the conditional jump instruction is detected for the 1st time and sends a detection signal to the parameter determination module 801 when the conditional jump instruction is detected for the 3rd time, so that the parameter determination module 801 determines the count value between the 1st and 3rd conditional jump instructions, which can be used to indicate the running characteristics of the program in the two loops.
As an example in connection with fig. 6 and example A described above, the detection module 803 sends a start signal to the parameter determination module 801 when the conditional jump instruction is detected for the 2nd time and sends a detection signal to the parameter determination module 801 when the conditional jump instruction is detected for the 5th time, so that the parameter determination module 801 determines the count value between the 2nd and 5th conditional jump instructions, which can be used to indicate the running characteristics of the program in the two loops.
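The loop detection that drives these signals can be sketched from the record information recited in the claims: the cumulative instruction number at the last execution of the conditional jump instruction, and the execution length between its previous two executions. The Python sketch below is only an illustration under stated assumptions: the record is keyed by a hypothetical jump address, the difference threshold of 4 instructions is arbitrary, and the loop is reported as soon as two consecutive execution lengths are close, corresponding to the case where the third preset value equals 2.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class JumpRecord:
    """Hypothetical record information for one conditional jump instruction."""
    instruction_number: int                 # cumulative instruction count at its last execution
    execution_length: Optional[int] = None  # instructions between its previous two executions


class LoopDetector:
    def __init__(self, diff_threshold: int = 4):
        self.records: Dict[int, JumpRecord] = {}
        self.diff_threshold = diff_threshold  # assumed tolerance, in instructions

    def on_conditional_jump(self, jump_address: int, instruction_number: int) -> bool:
        """Update the record for this jump; return True if a loop is detected."""
        record = self.records.get(jump_address)
        if record is None:
            # No record information yet: add it and wait for the next execution.
            self.records[jump_address] = JumpRecord(instruction_number)
            return False
        # First execution length: distance from the last recorded execution.
        first_length = instruction_number - record.instruction_number
        in_loop = (
            record.execution_length is not None
            and abs(first_length - record.execution_length) < self.diff_threshold
        )
        # Update the record information for the next comparison.
        record.instruction_number = instruction_number
        record.execution_length = first_length
        return in_loop
```

With this sketch, a jump seen at cumulative instruction counts 100, 150, and 201 would be reported as a loop on the third detection, because the execution lengths 50 and 51 differ by less than the threshold.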
Based on the foregoing and the same concept, the present application provides a processing device, which includes a computing resource and a memory connected to the computing resource, where the memory is configured to store a computer program and the computing resource is configured to execute the computer program stored in the memory, so that the computing resource implements the method in the above method embodiments. Specifically, the processing device includes one or more processors, and each processor includes one or more processor cores, where a processor core is capable of implementing the method in the above method embodiments when it reads the computer program stored in the memory.
Based on the foregoing and the same concept, the present application provides a computer-readable storage medium storing a computer program or instructions that, when executed by a computing resource in a processing device, perform the method of the above method embodiments.
Based on the foregoing and the same concept, the present application provides a processing chip comprising at least one processor core and an interface; the interface is configured to provide program instructions or data to the at least one processor core; and the at least one processor core is configured to execute the program instructions to implement the method of the above method embodiments.
It will be appreciated that the various numerals used in the embodiments of the present application are merely for ease of description and are not intended to limit the scope of the embodiments of the present application. The sequence numbers of the processes do not imply an order of execution; the order in which the processes are executed should be determined by their functions and internal logic.
It will be apparent to those skilled in the art that various modifications and variations can be made in the present application without departing from the scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims and the equivalents thereof, the present application is intended to cover such modifications and variations.

Claims (27)

1. A method of running a program, comprising:
when it is determined that a program run by a computing resource in a processing device comprises multiple similar sub-programs, acquiring running characteristics of a sub-program, among the multiple similar sub-programs, that has been run or is currently running, wherein the multiple similar sub-programs are multiple sub-programs whose running characteristics have a similarity greater than or equal to a first preset value;
determining configuration parameters of the computing resource according to the running characteristics; and
configuring the computing resource using the configuration parameters of the computing resource.
2. The method of claim 1, wherein the multiple similar sub-programs are loop programs of the program that are executed multiple times by the computing resource, each segment of sub-program corresponding to a loop of the loop program.
3. The method of claim 2, wherein the instruction number of the loop program is greater than a second predetermined value.
4. The method of claim 2, wherein the determining that the program run by the computing resource in the processing device comprises multiple similar sub-programs comprises:
and when the number of times that the first subprogram in the program is repeatedly executed by the computing resource is larger than a third preset value, determining that the first subprogram is the loop program.
5. The method of claim 4, wherein prior to determining that the first subroutine is the loop routine, further comprising:
after the first subprogram is executed, if a conditional jump instruction is executed, determining whether recording information of the conditional jump instruction exists;
if the recorded information exists, determining whether the number of times the first subprogram is repeatedly executed is larger than the third preset value according to the recorded information;
and if the recorded information does not exist, adding the recorded information.
6. The method of claim 5, wherein the determining whether the number of times the first subroutine is repeatedly executed is greater than the third preset value based on the recording information comprises:
updating the execution times of the conditional jump instruction in the record information;
and determining whether the number of times the first subprogram is repeatedly executed is larger than the third preset value according to the number of times the conditional jump instruction is executed in the updated record information.
7. The method of claim 5, wherein the determining whether the number of times the first subroutine is repeatedly executed is greater than the third preset value based on the recording information comprises:
determining a first instruction number, wherein the first instruction number is the instruction number which is accumulatively executed by the program when the conditional jump instruction is executed;
determining whether the number of times the first subprogram is repeatedly executed is greater than the third preset value according to the number of instructions and the execution length in the recorded information and the first instruction number;
the instruction number in the record information is used for indicating the instruction number which is accumulatively executed by the program when the conditional jump instruction is executed last time;
the execution length in the record information is a difference in the number of instructions that the program cumulatively executes when the conditional jump instruction is executed in the previous two times, respectively.
8. The method of claim 7, wherein the third preset value is equal to 2;
the determining, according to the number of instructions and the execution length in the record information and the first number of instructions, whether the number of times the first subroutine is repeatedly executed is greater than the third preset value includes:
taking the difference between the first instruction number and the instruction number in the recorded information as a first execution length;
and when the difference value between the first execution length and the execution length in the recorded information is smaller than a difference value threshold value, determining that the number of times that the first subprogram is repeatedly executed is larger than 2.
9. The method of claim 7, wherein after the determining that the first subroutine is the loop routine, further comprising:
updating the instruction number in the record information according to the first instruction number;
determining an instruction number threshold according to the first instruction number and the preset execution length in the updated record information;
when the instruction number of the program execution instruction reaches the instruction number threshold, if the conditional jump instruction is not executed again, determining that the loop program has been exited;
and determining the configuration parameters of the computing resources as default configuration parameters.
10. The method of claim 5, wherein the record information is recorded in a predetermined segment of a cache of the computing resource.
11. The method of claim 1, wherein the determining configuration parameters of the computing resource based on the operating characteristics comprises:
determining target preset features matched with the operation features from a plurality of preset features according to the operation features;
and determining preset configuration parameters corresponding to the target preset features as the configuration parameters of the computing resources.
12. The method of any one of claims 1-11, wherein the operational characteristics include at least any one or more of: the number of instructions executed by the processor core per clock cycle, the miss rate of the instruction translation look-aside buffer, the cache miss rate, the prefetch hit rate.
13. An apparatus for running a program, comprising:
a parameter determining module, configured to: when it is determined that a program run by a computing resource in a processing device comprises multiple similar sub-programs, acquire running characteristics of a sub-program, among the multiple similar sub-programs, that has been run or is currently running, wherein the multiple similar sub-programs are multiple sub-programs whose running characteristics have a similarity greater than or equal to a first preset value; and determine configuration parameters of the computing resource according to the running characteristics; and
a configuration module, configured to configure the computing resource using the configuration parameters of the computing resource.
14. The apparatus of claim 13, wherein the multiple similar sub-programs are loop programs of the program that are executed multiple times by the computing resource, each segment of sub-program corresponding to a loop of the loop program.
15. The apparatus of claim 14, wherein the number of instructions of the loop program is greater than a second predetermined value.
16. The apparatus as recited in claim 14, further comprising: a detection module;
the detection module is used for: and when the number of times that the first subprogram in the program is repeatedly executed by the computing resource is larger than a third preset value, determining that the first subprogram is the loop program.
17. The apparatus of claim 16, wherein the detection module, prior to determining that the first subroutine is the loop program, is further configured to:
after the first subprogram is executed, if a conditional jump instruction is executed, determining whether recording information of the conditional jump instruction exists;
if the recorded information exists, determining whether the number of times the first subprogram is repeatedly executed is larger than the third preset value according to the recorded information;
and if the recorded information does not exist, adding the recorded information.
18. The apparatus of claim 17, wherein the detection module is configured to, when determining whether the number of times the first subroutine is repeatedly executed is greater than the third preset value based on the record information:
updating the execution times of the conditional jump instruction in the record information;
and determining whether the number of times the first subprogram is repeatedly executed is larger than the third preset value according to the number of times the conditional jump instruction is executed in the updated record information.
19. The apparatus of claim 18, wherein the detection module is configured to, when determining whether the number of times the first subroutine is repeatedly executed is greater than the third preset value based on the record information:
determining a first instruction number, wherein the first instruction number is the instruction number which is accumulatively executed by the program when the conditional jump instruction is executed;
determining whether the number of times the first subprogram is repeatedly executed is greater than the third preset value according to the number of instructions and the execution length in the recorded information and the first instruction number;
the instruction number in the record information is used for indicating the instruction number which is accumulatively executed by the program when the conditional jump instruction is executed last time;
the execution length in the record information is a difference in the number of instructions that the program cumulatively executes when the conditional jump instruction is executed in the previous two times, respectively.
20. The apparatus of claim 19, wherein the third preset value is equal to 2;
the detection module is specifically configured to, when determining, according to the number of instructions and the execution length in the record information and the first number of instructions, whether the number of times that the first subroutine is repeatedly executed is greater than the third preset value:
taking the difference between the first instruction number and the instruction number in the recorded information as a first execution length;
and when the difference value between the first execution length and the execution length in the recorded information is smaller than a difference value threshold value, determining that the number of times that the first subprogram is repeatedly executed is larger than 2.
21. The apparatus of claim 19, wherein
the detection module is further configured to, after determining that the first subroutine is the loop program:
updating the instruction number in the record information according to the first instruction number;
determining an instruction number threshold according to the first instruction number and the preset execution length in the updated record information;
when the instruction number of the program execution instruction reaches the instruction number threshold, if the conditional jump instruction is not executed again, determining that the loop program has been exited;
the parameter determination module is further configured to:
and determining the configuration parameters of the computing resources as default configuration parameters.
22. The apparatus of claim 17, wherein the record information is recorded in a predetermined segment of a cache of the computing resource.
23. The apparatus of claim 13, wherein the parameter determination module, when determining the configuration parameters of the computing resource based on the operating characteristics, is specifically configured to:
determining target preset features matched with the operation features from a plurality of preset features according to the operation features;
and determining preset configuration parameters corresponding to the target preset features as the configuration parameters of the computing resources.
24. The apparatus of any one of claims 13-23, wherein the operational characteristics include at least any one or more of: the number of instructions executed by the processor core per clock cycle, the miss rate of the instruction translation look-aside buffer, the cache miss rate, the prefetch hit rate.
25. A processing device comprising a computing resource and a memory coupled to the computing resource, the memory for storing a computer program, the computing resource for executing the computer program stored in the memory, such that the computing resource performs the method of any of claims 1 to 12.
26. A computer readable storage medium having stored therein a computer program or instructions that are run by a computing resource in a processing device to perform the method of any of claims 1 to 12.
27. A processing chip comprising at least one processor core and an interface;
the interface is used for providing program instructions or data for the at least one processor core;
the at least one processor core is configured to execute the program instructions to implement the method of any one of claims 1 to 12.
CN202211118557.6A 2022-06-25 2022-09-13 Program running method and device Pending CN117331611A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2023/100498 WO2023246625A1 (en) 2022-06-25 2023-06-15 Method and apparatus for running program

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN2022107316654 2022-06-25
CN202210731665 2022-06-25

Publications (1)

Publication Number Publication Date
CN117331611A true CN117331611A (en) 2024-01-02

Family

ID=89292150

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211118557.6A Pending CN117331611A (en) 2022-06-25 2022-09-13 Program running method and device

Country Status (2)

Country Link
CN (1) CN117331611A (en)
WO (1) WO2023246625A1 (en)

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2852890A1 (en) * 2012-06-27 2015-04-01 Qatar Foundation An arrangement and method for use in managing resources of a plurality of computing devices
CN105718364B (en) * 2016-01-15 2018-07-17 西安交通大学 Resource capability dynamic assessment method is calculated in a kind of cloud computing platform
CN110858160B (en) * 2018-08-24 2023-04-11 阿里巴巴集团控股有限公司 Resource scheduling method and device, storage medium and processor
CN111625362A (en) * 2020-05-29 2020-09-04 浪潮电子信息产业股份有限公司 Computing resource scheduling method and device and related components

Also Published As

Publication number Publication date
WO2023246625A1 (en) 2023-12-28

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination