WO2023246625A1 - Method and device for running a program - Google Patents

Method and device for running a program

Info

Publication number
WO2023246625A1
Authority
WO
WIPO (PCT)
Prior art keywords
program
computing resource
record information
instruction
instructions
Prior art date
Application number
PCT/CN2023/100498
Other languages
English (en)
French (fr)
Inventor
许中虎
王淑倩
徐建荣
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Publication of WO2023246625A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/445 Program loading or initiating
    • G06F9/44505 Configuring for program initiating, e.g. using registry, configuration files
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals

Definitions

  • The present application relates to the field of computer technology, and in particular to a method and device for running a program.
  • A processing device includes computing resources, which can be used to run many different types of programs.
  • Examples of computing resources are a central processing unit (CPU) or a core in a CPU.
  • The processing device may configure the computing resources with default configuration parameters. Take a CPU as the computing resource. During the design and manufacturing stage, the CPU manufacturer can test the CPU under different configuration parameters and select the optimal ones as the CPU's default configuration parameters. This ensures that, with its default configuration parameters, the CPU can run various types of programs smoothly.
  • However, the computing resources in the processing device use the same default configuration parameters to run many different types of programs. As a result, the configuration cannot adapt to the operating characteristics of each type of program, and the computing power of the computing resources cannot be fully exploited.
  • This application provides a method and device for running a program. They identify the operating characteristics of subprograms in a program running on a computing resource and adjust the configuration parameters of the computing resource based on the identified operating characteristics. This helps improve the efficiency with which the computing resource executes the program and maximizes the utilization of its computing power.
  • To determine whether the program running in the computing resource includes multiple similar subprograms, the computing resource may check whether a first subprogram in the program has been repeatedly executed more times than a third preset value.
  • If so, the first subroutine is determined to be a loop program. This provides a way to determine that a program includes a loop program and improves the accuracy of loop identification.
  • The record information is stored in a preset cache (buffer) of the computing resource.
  • In this way, the record information can be read quickly from the cache, which speeds up the identification of loop programs.
  • In a possible implementation, the method further includes updating the record information according to the first instruction number and the first execution length.
  • The first instruction number and the preset execution length in the record information are used to determine an instruction number threshold.
  • For example, the sum of the first instruction number and the preset execution length is used as the instruction number threshold, where the preset execution length is set in advance.
  • The preset execution length is, for example, the first execution length or the execution length in the record information before the update.
  • The configuration parameters corresponding to the operating characteristics of the subprogram are then obtained. The computing resource therefore has configuration parameters suited to the operating characteristics of the subprograms that still need to be run, which helps improve the execution efficiency of the program.
  • To determine, from multiple preset features, a target preset feature that matches the operating characteristics, dimensionality reduction may first be performed on the A-dimensional operating characteristics to obtain B-dimensional operating characteristics. The matching degree between the B-dimensional operating characteristics and each of the multiple B-dimensional preset features is then evaluated, and the preset feature with the highest matching degree is selected as the target preset feature. Here A and B are both positive integers, and B is smaller than A.
  • The matching degree between the B-dimensional operating characteristics and a given B-dimensional preset feature is computed in two stages. First, for each of the B dimensions, the matching degree between the operating characteristic and the preset feature in that dimension is determined. Second, the overall matching degree is derived from the per-dimension matching degrees.
  • Within any one dimension of a B-dimensional preset feature, the dimension includes multiple bits, and the values of some of those bits are masked.
  • In this way, the matching degree can be determined by fuzzy matching.
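  • If no target preset feature matching the operating characteristics is found among the multiple preset features, the default configuration parameters are used as the configuration parameters of the computing resource.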
  • The configuration parameters of the computing resource include the address of a configuration register in the computing resource and a configuration register value. Applying these parameters to the computing resource means writing the configuration register value into the configuration register at that address. This helps improve the flexibility of parameter configuration.
  • The operating characteristics include one or more of the following: the number of instructions executed by the processor core per clock cycle, the instruction translation lookaside buffer (iTLB) miss rate, the cache miss rate, and the prefetch hit rate.
  • The present application further provides a device for running a program.
  • The device for running a program may be a processing device or a computing resource in the processing device.
  • The computing resource may be a processor in the processing device or a core of that processor.
  • The device for running the program includes a parameter determination module and a configuration module.
  • The parameter determination module performs two tasks once it is determined that the program being run by the computing resource in the processing device includes multiple similar subprograms. First, it obtains the operating characteristics of a subprogram among them that has been run or is currently running. Here, the multiple similar subprograms are subprograms whose similarity in operating characteristics is greater than or equal to a first preset value. Second, it determines the configuration parameters of the computing resource based on those operating characteristics.
  • The configuration module is configured to configure the computing resource with the determined configuration parameters.
  • Optionally, the multiple similar subprograms are a loop program in the program that is executed multiple times, and each subprogram corresponds to one or more iterations of the loop program.
  • Optionally, the number of instructions of the loop program is greater than a second preset value.
  • The device further includes a detection module, which determines whether a program running in the computing resource includes multiple similar subprograms. Specifically, if the detection module finds that a first subroutine in the program has been repeatedly executed more times than the third preset value, it determines that the first subroutine is a loop program.
  • To check the first subroutine's repeat count against the third preset value, the detection module acts after the first subroutine finishes executing and a conditional jump instruction is executed. It first checks whether record information for the conditional jump instruction exists. If the record information exists, the detection module uses it to determine whether the first subroutine has been repeatedly executed more times than the third preset value. If it does not exist, the detection module adds record information for the conditional jump instruction.
  • After obtaining the conditional jump instruction, the detection module also determines whether the conditional jump instruction points to a small loop. A small loop is one in which fewer instructions are executed per iteration than the second preset value. If the conditional jump instruction points to a small loop, it is filtered out. Otherwise, the detection module goes on to check whether record information for the conditional jump instruction exists.
  • The record information is stored in a preset cache of the computing resource.
  • To check whether record information for a conditional jump instruction exists, the detection module first determines a jump identifier from the identification information of the conditional jump instruction. The jump identifier is either that identification information or a hash of it. The detection module then traverses the multiple pieces of record information in the cache. If some piece of record information includes the jump identifier, the cache is determined to include record information for the conditional jump instruction. If none of them includes the jump identifier, the cache is determined not to include it.
  • The identification information of the conditional jump instruction is obtained from the branch recording module.
  • The identification information includes a starting position and/or a target position, or a hash of the starting position and/or the target position.
  • When using the record information to check the first subroutine's repeat count against the third preset value, the detection module relies on the number of instructions and the execution length in the record information. The number of instructions in the record information is the cumulative number of instructions the program had executed when it last executed the conditional jump instruction. The execution length in the record information is the difference between the cumulative instruction counts at the two most recent executions of the conditional jump instruction.
  • Optionally, the third preset value is equal to 2.
  • The detection module determines from the number of instructions and the execution length in the record information whether the first subroutine has been repeatedly executed more times than the third preset value, as follows. It takes the cumulative number of instructions executed by the program when it executes the conditional jump instruction as the first instruction number. It takes the difference between the first instruction number and the number of instructions in the record information as the first execution length. If the execution length in the record information is not 0, and the difference between the first execution length and that execution length is less than a difference threshold, the first subroutine is determined to have been repeatedly executed more than 2 times. In a possible implementation, after this determination, the detection module also updates the record information according to the first instruction number and the first execution length.
  • The detection module also determines an instruction number threshold from the first instruction number and the preset execution length in the updated record information. If the number of instructions executed by the program reaches this threshold without the conditional jump instruction being executed again, the loop program is determined to have exited. The parameter determination module then sets the configuration parameters of the computing resource back to the default configuration parameters.
  • The parameter determination module determines the operating characteristics of the first subroutine run by the computing resource in three steps. First, it obtains the operating characteristics of the computing resource at the moment the program executes the conditional jump instruction. Second, it obtains the operating characteristics at the previous execution of the conditional jump instruction. Third, it derives from these two readings the operating characteristics of one iteration of the first subroutine. The parameter determination module computes these operating characteristics from the characteristic count values read from the characteristic counters of the performance monitoring unit.
  • To determine the configuration parameters of the computing resource from the operating characteristics, the parameter determination module first selects, from multiple preset features, a target preset feature that matches the operating characteristics. It then uses the preset configuration parameters corresponding to the target preset feature as the configuration parameters of the computing resource.
  • The configuration parameters of the computing resource include a prefetching strategy. Examples are a prefetching strategy for missing cache lines, a prefetching strategy for integer data access, and the aggressiveness of the prefetching algorithm.
  • To select this target preset feature, the parameter determination module first performs dimensionality reduction on the A-dimensional operating characteristics to obtain B-dimensional operating characteristics. It then compares these against the multiple B-dimensional preset features and selects the one with the highest matching degree as the target preset feature. Here A and B are both positive integers, and B is smaller than A.
  • The parameter determination module computes the matching degree between the B-dimensional operating characteristics and a given B-dimensional preset feature in two stages. First, for each of the B dimensions, it determines the matching degree between the operating characteristic and the preset feature in that dimension. Second, it derives the overall matching degree from the per-dimension matching degrees.
  • The parameter determination module may determine the matching degree by fuzzy matching.
  • If the parameter determination module does not find a target preset feature matching the operating characteristics among the multiple preset features, it uses the default configuration parameters as the configuration parameters of the computing resource.
  • The configuration parameters of the computing resource include the address of a configuration register in the computing resource and a configuration register value. To apply these parameters to the computing resource, the configuration module writes the configuration register value into the configuration register at that address.
  • The present application further provides a processing device, including a computing resource and a memory connected to the computing resource.
  • The memory is configured to store a computer program.
  • The computing resource is configured to execute the computer program stored in the memory, so that the computing resource implements the method in the above first aspect or any possible implementation of the first aspect.
  • The present application further provides a computer-readable storage medium.
  • Computer programs or instructions are stored in the computer-readable storage medium.
  • When the computer programs or instructions are executed by a computing resource in the processing device, the method in the above first aspect or any possible implementation of the first aspect is implemented.
  • The present application further provides a processing chip, including at least one processor core and an interface. The interface is configured to provide program instructions or data to the at least one processor core. The at least one processor core is configured to execute the program instructions to implement the method performed by the computing resource in the above first aspect or any possible implementation of the first aspect.
  • Figure 1 is a schematic structural diagram of a processing device.
  • Figure 3 is a schematic flow chart of a method of running a program provided by this application.
  • Figure 7 is a schematic diagram of the correspondence between a preset feature and a preset configuration parameter provided by this application.
  • There may be multiple processors 101, and they may be of the same type or of different types.
  • For example, the multiple processors 101 are multiple CPUs.
  • Alternatively, the plurality of processors 101 include one or more CPUs and one or more GPUs.
  • Alternatively, the plurality of processors 101 include one or more CPUs and one or more NPUs.
  • Alternatively, the plurality of processors 101 include one or more CPUs, one or more GPUs, one or more NPUs, and the like.
  • The processor 101 may include one physical core or multiple physical cores.
  • A physical core is an actual processor core that exists inside the processor.
  • A physical core of the processor may be referred to simply as a processor core.
  • FIG. 2 is a schematic diagram of the internal structure of a CPU core 20 provided by this application.
  • The CPU core 20 includes a microinstruction (micro-ops/uops) module 201, a branch recording module 202, a performance monitoring module 203, and a register 204.
  • The microinstruction module 201 detects and stores microinstruction loop sequences.
  • If a microinstruction loop sequence fits within the capacity of the microinstruction module 201, it can be stored in the microinstruction module 201 and does not need to go through the front end again.
  • The microinstruction module 201 is, for example, a loop stream detector (LSD).
  • The performance monitoring module 203 includes one or more counters that track and count low-level hardware events. These include events related to the CPU core 20 (number of executed instructions, number of captured exceptions, number of clock cycles, etc.), cache-related events (number of L1/L2 cache accesses, number of misses, etc.), and events related to the translation lookaside buffer (TLB). These events reflect the behavior of a program during execution and can be used to analyze and tune the program.
  • The performance monitoring module 203 is, for example, a performance monitoring unit (PMU).
  • Memory 102 refers to a device for storing data, which may be a memory or a hard disk.
  • The hard disk provides storage resources, for example for program data such as pictures, videos, audio, and text.
  • Hard disks include, but are not limited to, non-volatile memory such as read-only memory (ROM), hard disk drives (HDD), or solid state drives (SSD).
  • Data, program instructions, and the like on the hard disk must first be loaded into memory, from which the processor then obtains the data and/or program instructions.
  • Communication interface 103 is used to communicate with other devices.
  • The computing resources in the processing device are configured with default configuration parameters. A computing resource is, for example, a processor or a core in the processor. The configuration parameters are the parameters the computing resource uses when running a program.
  • The configuration parameters include, for example, a prefetching strategy. Examples are a prefetching strategy for missing cache lines, a prefetching strategy for integer data access, and the aggressiveness of the prefetching algorithm (for example, a passive strategy or an aggressive strategy).
  • The operating characteristics of a program include its computing characteristics, its memory access characteristics, and so on.
  • The operating characteristics of a program can be represented by a variety of micro-architectural features.
  • These micro-architectural features include one or more of the following: the number of instructions executed by the processor core per clock cycle, the instruction translation lookaside buffer (iTLB) miss rate, the cache miss rate, the prefetch hit rate, the data translation lookaside buffer (dTLB) miss rate, and so on.
  • The iTLB may also be called an instruction list cache, an instruction redirection bypass cache, an address translation cache, etc.
  • The dTLB may also be called a data list cache, a data redirection bypass cache, a data translation cache, etc.
  • The above operating characteristics are merely illustrative. In other cases, the operating characteristics may include other characteristics besides one or more of the following: the number of instructions executed per clock cycle, the iTLB miss rate, the cache miss rate, the prefetch hit rate, and the dTLB miss rate. They may also include none of these and consist of other operating characteristics instead. This application does not limit the specific operating characteristics.
  • The present application provides a method of running a program, which is executed by a processing device or by a computing resource in the processing device.
  • A program is running in the computing resource.
  • The computing resource obtains the operating characteristics of the subroutines in the program and adjusts its configuration parameters accordingly, so that it can run the program efficiently.
  • FIG. 3 is a schematic flowchart of a method of running a program provided by this application. The method is explained below with reference to FIG. 3.
  • Step 301: The computing resource determines that the program running in the computing resource includes multiple similar subprograms.
  • The multiple similar subroutines are multiple subroutines whose similarity in operating characteristics is greater than or equal to the first preset value.
  • The following examples show how a program can be determined to include multiple similar subroutines:
  • In one example, the processing device or computing resource performs static analysis on multiple program fragments in advance to obtain a static analysis result for each program fragment.
  • Each static analysis result may include the operating characteristics of the corresponding program fragment.
  • Based on the operating characteristics of each program fragment, the processing device or computing resource determines that the multiple program fragments have similar operating characteristics.
  • When running the program, the computing resource can then directly determine, based on these pre-analysis results, that the program includes the multiple similar program fragments.
  • In another example, the processing device or computing resource runs the multiple program fragments in advance and obtains the operating characteristics of each program fragment. From these, it determines that the multiple program fragments have similar operating characteristics.
  • When the computing resource runs the program again, it can directly determine, based on these pre-analysis results, that the program includes the multiple similar program fragments.
  • In yet another example, the multiple similar subroutines are a loop program executed multiple times in the program, and each subroutine corresponds to one or more iterations of the loop program. In other words, the subroutines are the loop program's successive iterations.
  • The similarity of the operating characteristics of these iterations is greater than or equal to the first preset value.
  • The running program corresponds to an instruction stream. This stream includes the instruction stream segments (i.e., subroutines) corresponding to the loop program's multiple iterations, and the similarity of the operating characteristics of these segments is greater than or equal to the first preset value.
  • While executing the program, if the computing resource determines that the program includes a loop program that is executed repeatedly, it determines that the program includes multiple similar subroutines. To do so, the computing resource checks whether the program contains a repeatedly executed loop program (denoted as the first subroutine).
  • Step 302: The computing resource obtains the operating characteristics of a subroutine among the multiple similar subroutines that has been run or is currently running.
  • The computing resource can obtain the operating characteristics of the subprograms that have been run or are currently running, and use them as the operating characteristics of all the multiple similar subprograms.
  • For example, the computing resource obtains the operating characteristics of the similar program fragments that have been run or are currently running, and uses them as the operating characteristics of the fragments that have not yet been run.
  • For example, the multiple similar program fragments are program fragment 1 to program fragment 10.
  • The computing resource has finished running program fragment 1 and program fragment 2 but has not yet run program fragments 3 to 10.
  • The computing resource determines the operating characteristics of program fragment 1 and program fragment 2, and uses them as the operating characteristics of program fragments 3 to 10.
  • As another example, the computing resource obtains the operating characteristics of the loop iterations that have been run or are currently running, and uses them as the operating characteristics of the iterations that have not yet been run.
  • For example, the computing resource executes the subroutine for 10 loop iterations.
  • The computing resource determines the operating characteristics of the subroutine during the 1st iteration (already run) or the 2nd iteration (currently running). It uses them as the operating characteristics of the subroutine for the 3rd to 10th iterations.
  • the computing resource obtains the operating characteristics as follows. For example, multiple feature counters are provided in the computing resource. For each feature counter, the computing resource reads the feature count value of the counter before and after the subroutine executes, and takes the difference as the feature count value corresponding to the subroutine. The multiple feature count values that the multiple feature counters yield for the subroutine constitute the operating characteristics of the subroutine.
  • for example, the computing resource includes three feature counters (denoted feature counter 1 to feature counter 3).
  • feature counter 1 is used to record the number of instructions executed by the processor core in each clock cycle.
  • feature counter 2 is used to record the number of instructions executed by the processor core in each clock cycle.
  • feature counter 3 is used to record the number of instruction translation lookaside buffer misses.
  • Example 1: when the computing resource starts executing the subroutine, the feature count values of feature counters 1 to 3 are all 0; when it finishes executing the subroutine, they are 100, 200, and 20 respectively, so the operating characteristics of the subroutine are 100, 200, and 20.
  • Example 2: when the computing resource starts executing the subroutine, the feature count values of feature counters 1 to 3 are 100, 200, and 20 respectively; when it finishes executing the subroutine, they are 201, 403, and 40 respectively, so the operating characteristics of the subroutine are 101, 203, and 20.
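The counter-snapshot arithmetic in the two examples can be sketched as below. It assumes the feature count values before and after the subroutine are available as plain integer sequences (reading real hardware counters is platform-specific and not shown) and ignores counter wraparound; `subroutine_characteristics` is a hypothetical helper.

```python
from typing import Sequence


def subroutine_characteristics(before: Sequence[int], after: Sequence[int]) -> tuple[int, ...]:
    """Per-counter difference between snapshots taken before and after the subroutine runs."""
    if len(before) != len(after):
        raise ValueError("both snapshots must cover the same feature counters")
    return tuple(end - start for start, end in zip(before, after))


print(subroutine_characteristics([0, 0, 0], [100, 200, 20]))        # (100, 200, 20)  Example 1
print(subroutine_characteristics([100, 200, 20], [201, 403, 40]))   # (101, 203, 20)  Example 2
```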
  • Step 303 The computing resource determines the configuration parameters of the computing resource based on the determined operating characteristics.
  • the configuration parameters of the computing resource are the parameters that need to be configured in the computing resource.
  • the configuration parameters of the computing resource can be called target configuration parameters of the computing resource.
  • the configuration parameters of the computing resource are the configuration parameters best suited to the operating characteristics of the currently running program; when the computing resource adopts them, the program runs best. Accordingly, these configuration parameters can also be called the optimal configuration parameters of the computing resource.
  • Example 1: multiple preset features, and the preset configuration parameters corresponding to them, are preset in the computing resource.
  • for example, preset features 1 to 1000 are preset in the computing resource, together with preset configuration parameters 1 to 1000 corresponding to preset features 1 to 1000 respectively.
  • the computing resource determines, from the multiple preset features, the target preset feature that matches the operating characteristics, and uses the preset configuration parameter corresponding to the target preset feature as the target configuration parameter of the computing resource. Continuing the example above, if the computing resource determines from preset features 1 to 1000 that the target preset feature matching the operating characteristics is preset feature 10, it uses preset configuration parameter 10 as its target configuration parameter.
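The text does not define how a preset feature "matches" the operating characteristics in this example. As one possible reading, the sketch below picks the preset feature with the smallest L1 distance; the distance metric, `match_preset`, and the sample configuration values are all assumptions for illustration.

```python
def match_preset(running, presets):
    """Return (preset number, preset configuration) of the preset feature closest to `running`.

    presets: dict mapping a preset number to a (preset_feature, preset_config) pair.
    Assumption: "matches" means the smallest L1 distance
    (sum of absolute per-dimension differences).
    """
    def distance(number):
        feature, _ = presets[number]
        return sum(abs(p - r) for p, r in zip(feature, running))

    best = min(presets, key=distance)
    return best, presets[best][1]


presets = {
    1: ((50, 100, 5), {"prefetch": "aggressive"}),   # hypothetical preset parameters
    10: ((100, 200, 20), {"prefetch": "passive"}),
}
print(match_preset((101, 203, 20), presets))  # (10, {'prefetch': 'passive'})
```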
  • multiple preset features are preset in the computing resource, as well as the addresses of the preset configuration parameters corresponding to the multiple preset features.
  • the computing resource uses a ternary content addressable memory (TCAM) to store multiple preset features and addresses of preset configuration parameters corresponding to the multiple preset features.
  • for example, preset features 1 to 1000 are preset in the computing resource, together with addresses 1 to 1000 of the preset configuration parameters corresponding to preset features 1 to 1000 respectively.
  • the computing resource determines, from the multiple preset features, the target preset feature that matches the running feature, reads the preset configuration parameter from the address of the preset configuration parameter corresponding to the target preset feature, and uses the read preset configuration parameter as the target configuration parameter of the computing resource.
  • continuing the example above, if the computing resource determines from preset features 1 to 1000 that the target preset feature matching the running feature is preset feature 10, it reads the preset configuration parameter from address 10 and uses it as the target configuration parameter of the computing resource.
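A TCAM compares a search key against every stored entry in parallel, with each entry able to mark bits as "don't care", and returns the highest-priority match. The sketch below simulates that behavior sequentially to show the two-step read described above: a TCAM search yields an address, then the configuration stored at that address is read. Encoding a running feature as a single integer, the entry layout, `TcamEntry`, `tcam_lookup`, `target_config`, and the register values are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TcamEntry:
    value: int    # bits of the preset feature
    care: int     # 1 = bit must match, 0 = "don't care"
    address: int  # address of the preset configuration parameters


def tcam_lookup(entries: list[TcamEntry], key: int) -> Optional[int]:
    """Address held by the first (highest-priority) entry matching key, or None."""
    for entry in entries:
        # match: every "care" bit of the key equals the stored bit
        if ((key ^ entry.value) & entry.care) == 0:
            return entry.address
    return None


def target_config(entries: list[TcamEntry], config_memory: dict[int, dict[int, int]],
                  running_feature: int) -> Optional[dict[int, int]]:
    """Step 1: search the TCAM for an address. Step 2: read the parameters stored there."""
    address = tcam_lookup(entries, running_feature)
    return None if address is None else config_memory[address]


entries = [
    TcamEntry(value=0b1011101000, care=0b1111111000, address=10),  # 10111 01***
    TcamEntry(value=0b0000000000, care=0b1000000000, address=3),   # 0**** *****
]
# hypothetical preset configuration parameters: register address -> register value
config_memory = {10: {0x1A0: 0x2}, 3: {0x1A0: 0x7}}
print(target_config(entries, config_memory, 0b1011101101))  # {416: 2}
```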
  • Example 2 also correspond to respective preset configuration parameters.
  • the corresponding relationship between the preset characteristics and the preset configuration parameters in the processing device can also be set based on expert experience.
  • Step 1 normalization processing:
  • Step 3 feature matching:
  • the value of each dimension can be expressed in binary; that is, the dimension includes multiple bits, each of which has its own value.
  • for example, the value of a dimension is 10111 01100 (that is, 10 bits).
  • the computing resource can determine the matching degree based on fuzzy matching. Continuing the example above, if the value of a dimension of a preset feature after some bits are masked is 10111 01***, and the value of the same dimension in the running feature is 10111 00111, the computing resource determines the matching degree between "10111 01" and "10111 00".
  • the dimensionality reduction process is optional.
  • if the computing resource does not reduce the dimensionality of the preset features and the running features, then during feature matching it determines the matching degree between the A-dimensional running feature and each of the multiple A-dimensional preset features, and selects the preset feature with the highest matching degree among them as the target preset feature.
  • the computing resource can also mask some of the bits in each of the A dimensions of the preset features. For details, refer to the method described above for masking some bits in each of the B dimensions of the preset features; this is not repeated here.
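The fuzzy matching above can be sketched as follows, assuming features are written as strings of '0', '1', and '*' (a masked bit), that the matching degree is the number of unmasked bit positions whose values agree, and that all dimensions are concatenated into one string. The scoring rule and the names `match_degree` and `best_preset` are hypothetical; the text only says that the unmasked bits, such as "10111 01" and "10111 00", are compared.

```python
def match_degree(pattern: str, value: str) -> int:
    """Number of unmasked positions where the preset bit equals the running-feature bit.

    pattern: '0', '1' or '*' (masked) per bit; value: '0' or '1' per bit; spaces ignored.
    """
    pattern, value = pattern.replace(" ", ""), value.replace(" ", "")
    if len(pattern) != len(value):
        raise ValueError("pattern and value must have the same number of bits")
    return sum(1 for p, v in zip(pattern, value) if p != "*" and p == v)


def best_preset(presets: dict[str, str], running: str) -> str:
    """Name of the preset feature with the highest matching degree (first one wins on ties)."""
    return max(presets, key=lambda name: match_degree(presets[name], running))


print(match_degree("10111 01***", "10111 00111"))  # 6
presets = {"preset 10": "10111 01***", "preset 11": "00000 *****"}
print(best_preset(presets, "10111 00111"))  # preset 10
```

With the example values, "10111 01" and "10111 00" agree in 6 of 7 unmasked bits. A raw count favors preset features with fewer masked bits; dividing by the number of unmasked bits would be an alternative design choice.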
  • the computing resource can use the default configuration parameters as the configuration parameters of the computing resource.
  • the computing resource also sets a general preset feature in which every value is "*", that is, every dimension of the general preset feature takes the value "*". The general preset feature corresponds to the default configuration parameters.
  • the general preset feature and the default configuration parameters corresponding to the general preset feature may be stored in the TCAM.
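A minimal sketch of the fallback, assuming entries are searched in priority order as in a TCAM and the general preset feature (every bit "*") is stored last, so that it only wins when no other entry matches. `ternary_match`, `lookup`, `DEFAULT_CONFIG`, and the configuration values are hypothetical.

```python
DEFAULT_CONFIG = {"prefetch": "default"}  # hypothetical default configuration parameters


def ternary_match(pattern: str, value: str) -> bool:
    """True if every bit of value equals the pattern bit or the pattern bit is '*'."""
    return len(pattern) == len(value) and all(p in ("*", v) for p, v in zip(pattern, value))


def lookup(table: list[tuple[str, dict]], running: str) -> dict:
    """Return the configuration of the first matching entry, searched in priority order."""
    for pattern, config in table:
        if ternary_match(pattern, running):
            return config
    raise LookupError("no entry matched")  # unreachable while the general entry is present


table = [
    ("1011101***", {"prefetch": "passive"}),
    ("0*********", {"prefetch": "aggressive"}),
    ("*" * 10, DEFAULT_CONFIG),  # general preset feature: every bit is "*"
]
print(lookup(table, "1100000000"))  # {'prefetch': 'default'}
```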
  • the computing resource determines whether to adjust the configuration parameters of the computing resource based on the target configuration parameters and the current configuration parameters of the computing resource. Specifically, if the computing resource determines that the current configuration parameters are consistent with the target configuration parameters, no processing will be performed; if the computing resource determines that the current configuration parameters are inconsistent with the target configuration parameters, the current configuration parameters of the computing resources will be adjusted to the target configuration parameters.
  • the target configuration parameters include a configuration register address and a configuration register value corresponding to the configuration register address.
  • the configuration register value may be written into the configuration register corresponding to the configuration register address.
  • the target configuration parameters include configuration register values corresponding to multiple configuration register addresses.
  • the target configuration parameters include address a and value a, address b and value b, address c and value c.
  • the computing resource writes value a into the configuration register corresponding to address a, value b into the configuration register corresponding to address b, and value c into the configuration register corresponding to address c.
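The adjust-only-if-different rule and the register writes can be sketched as below. The register file is simulated with a dictionary, and comparing register by register (rather than the whole parameter set at once) is an assumption; it leaves the registers in the same final state as described. `apply_target_config` and the addresses are hypothetical.

```python
def apply_target_config(registers: dict[int, int], target: dict[int, int]) -> list[int]:
    """Bring the simulated configuration registers to the target configuration.

    Only registers whose current value differs from the target are written;
    returns the addresses that were written (empty when already consistent).
    """
    written = []
    for address, value in target.items():
        if registers.get(address) != value:
            registers[address] = value  # write the value into the register at this address
            written.append(address)
    return written


registers = {0xA0: 1, 0xB0: 2, 0xC0: 3}   # current configuration (hypothetical)
target = {0xA0: 1, 0xB0: 5, 0xC0: 7}      # address a/b/c -> value a/b/c
print([hex(a) for a in apply_target_config(registers, target)])  # ['0xb0', '0xc0']
print(apply_target_config(registers, target))                    # []
```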
  • for example, the computing resource can improve program performance by adjusting its configuration parameters, such as reducing the aggressiveness of the prefetch algorithm (downgrading an originally aggressive strategy to a passive one).
  • as another example, the computing resource can improve program performance by adjusting its configuration parameters, such as increasing the aggressiveness of the prefetch algorithm (upgrading an originally passive strategy to an aggressive one).
  • Figure 4 is a schematic flowchart of how the computing resource determines a loop program.
  • the jump means that the computing resource executes the second instruction after executing the first instruction.
  • the first instruction and the second instruction are two discontinuous instructions in a section of program code.
  • the first instruction is before the second instruction, or the first instruction is after the second instruction.
  • for example, the first instruction is on line 1 and the second instruction is on line 10.
  • the computing resource executes the conditional jump instruction. Specifically, the computing resource reads the conditional jump instruction from the memory and executes the conditional jump instruction while running the program.
  • the memory here may be, for example, main memory, high bandwidth memory (HBM), or non-volatile memory. Optionally, the memory is included in the processing device.
  • if, after executing the first subroutine, the computing resource does not execute the conditional jump instruction, it determines that the first subroutine is not a loop program.
  • in this application, the computing resource executing an instruction (such as a conditional jump instruction) while running a program is equivalent to the program executing that instruction.
  • Step 402 The computing resource determines the identification information of the conditional jump instruction.
  • the identification information of the conditional jump instruction is the address information of the conditional jump instruction, or a hash of the address information of the conditional jump instruction, or a mapping of the address information of the conditional jump instruction, etc.
  • the address information of the conditional jump instruction includes the starting position and/or the target position of the conditional jump instruction. Based on the example in step 401, the starting position of the conditional jump instruction is the first instruction and the target position is the second instruction.
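As an illustration of deriving a compact jump identifier from the address information, the sketch below hashes the starting and target addresses with CRC32 (`zlib.crc32` from the Python standard library) and keeps the low 16 bits. The choice of CRC32, the 16-bit width, and `jump_identifier` are assumptions; the text equally allows using the raw address information or another mapping.

```python
import zlib


def jump_identifier(start: int, target: int, bits: int = 16) -> int:
    """Map the (start, target) addresses of a conditional jump to a `bits`-wide identifier.

    Assumptions: non-negative addresses below 2**64, CRC32 as the hash, truncation to `bits`.
    """
    key = start.to_bytes(8, "little") + target.to_bytes(8, "little")
    return zlib.crc32(key) & ((1 << bits) - 1)


# identifier for the jump from instruction 10 (starting position) to instruction 7 (target)
print(hex(jump_identifier(10, 7)))
```

Truncating the hash shrinks each cache entry at the cost of possible collisions between different jumps, which would then share one record.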
  • the computing resource obtains the identification information of the conditional jump instruction from the branch record module.
  • the computing resource also determines the first instruction number.
  • the first instruction number is the total number of instructions that the program has cumulatively executed at the moment the computing resource executes the conditional jump instruction.
  • the performance monitoring module includes an instruction counter, and the computing resource specifically reads the first number of instructions from the instruction counter of the performance monitoring module. Based on whether the instruction counter has been reset, two examples are described as follows: Example a, after the program starts running, if the instruction counter has not been reset, then the first instruction number is the total number of cumulative instructions executed by the program. Example b, after the program starts running, if the instruction counter is reset, then the first instruction number is the total number of instructions executed by the program after the last instruction counter reset.
  • the computing resource also determines the first number of executions.
  • the first number of executions is the total number of times the computing resource executes the conditional jump instruction.
  • the performance monitoring module also includes an execution count counter.
  • the computing resource can also read the first number of executions from the execution count counter of the performance monitoring module. For details, refer to how the computing resource reads the first instruction number from the instruction counter of the performance monitoring module.
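The following sketch models the performance monitoring module in software to show what the first instruction number and the first number of executions mean, including the effect of resetting the instruction counter (examples a and b). `PerfMonitor` and its methods are hypothetical, and keeping one execution count per jump identifier is an assumption.

```python
class PerfMonitor:
    """Hypothetical software model of the performance monitoring module."""

    def __init__(self) -> None:
        self.instructions = 0                 # instruction counter
        self.executions: dict[int, int] = {}  # execution count per jump identifier

    def retire(self, n: int) -> None:
        """Account for n instructions executed by the program."""
        self.instructions += n

    def execute_jump(self, jump_id: int) -> tuple[int, int]:
        """Count one execution of a conditional jump and return
        (first instruction number, first number of executions)."""
        self.executions[jump_id] = self.executions.get(jump_id, 0) + 1
        return self.instructions, self.executions[jump_id]

    def reset_instructions(self) -> None:
        self.instructions = 0


pmu = PerfMonitor()
pmu.retire(500)
print(pmu.execute_jump(1))  # (500, 1): example a, counter never reset
pmu.retire(100)
print(pmu.execute_jump(1))  # (600, 2)
pmu.reset_instructions()
pmu.retire(50)
print(pmu.execute_jump(1))  # (50, 3): example b, counted since the last reset
```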
  • in this application, the computing resource determining that the subroutine it is executing is a loop program can also be understood as determining that the program it is executing is in a loop state. Likewise, the computing resource determining that the loop program it is executing is about to exit or has exited the loop can be understood as determining that the program is about to exit or has exited the loop state.
  • the following takes the record information of any conditional jump instruction in the cache as an example to explain each field in the record information and the update method of the record information.
  • jump identifier: determined based on the identification information of the conditional jump instruction.
  • the jump identifier is the identification information of the conditional jump instruction, or the jump identifier is a hash of the identification information of the conditional jump instruction.
  • for the identification information of the conditional jump instruction, refer to the description in step 402 above.
  • the execution length in the record information is determined by the computing resource based on the total number of instructions cumulatively executed by the program between the (K-1)-th and the K-th executions of the conditional jump instruction.
  • the execution length is the total number of cumulative execution instructions of the program between the two consecutive executions of the same conditional jump instruction.
  • in one example, the execution length in the record information is specifically the total number of instructions cumulatively executed by the program between the (K-k)-th and the K-th executions of the conditional jump instruction. For example, if the program cumulatively executes 500 instructions between the 5th and the 10th executions of the conditional jump instruction, the execution length in the record information is 500.
  • in another example, the execution length in the record information is the average, over every pair of adjacent executions between the (K-k)-th and the K-th executions of the conditional jump instruction, of the number of instructions executed between them.
  • for example, if the program cumulatively executes 500 instructions between the 5th and the 10th executions of the conditional jump instruction (5 intervals), the execution length in the record information is 500 / 5 = 100.
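Both definitions of execution length can be computed from the cumulative instruction counts recorded at each execution of the jump. In this sketch (names hypothetical), `counts[i]` is the program's cumulative instruction count at the (i+1)-th execution; `k=1` gives the length between the (K-1)-th and K-th executions, and `average=True` gives the per-interval mean used in the second example.

```python
def execution_length(counts: list[int], k: int = 1, average: bool = False) -> float:
    """Execution length ending at the latest (K-th) execution of a conditional jump.

    counts[i] is the program's cumulative instruction count at the (i+1)-th execution,
    so counts[-1] belongs to the K-th execution and counts[-1 - k] to the (K-k)-th.
    average=False: total instructions between the two executions.
    average=True: that total divided by the k intervals between them.
    """
    if not 1 <= k < len(counts):
        raise ValueError("need at least k + 1 recorded executions")
    total = counts[-1] - counts[-1 - k]
    return total / k if average else total


counts = list(range(0, 1000, 100))  # executions 1..10, 100 instructions per iteration
print(execution_length(counts, k=5))                # 500
print(execution_length(counts, k=5, average=True))  # 100.0
```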
  • when the computing resource is about to write the record information of the conditional jump instruction into the cache, it first determines whether the cache already holds record information for that conditional jump instruction. If so, it updates the existing record information with the record information to be written; otherwise, it writes the record information to be written (that is, a new record) into the cache.
  • conditional jump instruction instructs the program to jump from instruction 10 to instruction 7.
  • identification information of the conditional jump instruction includes the starting position "instruction 10" and the target position "instruction 7".
  • based on the starting position "instruction 10" and the target position "instruction 7", the computing resource determines the jump identifier to be hash(10-7).
  • when the computing resource executes the conditional jump instruction for the n-th time (where n is greater than 2), it determines, based on the jump identifier "hash(10-7)", that the cache contains the record information of the conditional jump instruction. Further, the computing resource determines that the total number of instructions cumulatively executed by the program at the n-th execution of the conditional jump instruction is X, and that the total number of instructions cumulatively executed by the program between the (n-1)-th and the n-th executions is Y. The computing resource then updates "hash(10-7), X, Y" into the existing record information of the conditional jump instruction in the cache.
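The insert-or-update step can be sketched as follows, with the cache modeled as an unbounded dictionary keyed by jump identifier (a hardware cache would have limited capacity and an eviction policy, which are not shown). `JumpRecord` and `record_jump` are hypothetical; X and Y correspond to the cumulative instruction count and the execution length in the example above.

```python
from dataclasses import dataclass


@dataclass
class JumpRecord:
    instruction_count: int  # X: cumulative instructions at the latest execution
    execution_length: int   # Y: instructions since the previous execution (0 if none)


def record_jump(cache: dict[str, JumpRecord], jump_id: str, instruction_count: int) -> JumpRecord:
    """Update the existing record for jump_id, or insert a new one if the cache has none."""
    record = cache.get(jump_id)
    if record is None:
        record = JumpRecord(instruction_count, 0)
        cache[jump_id] = record
    else:
        record.execution_length = instruction_count - record.instruction_count
        record.instruction_count = instruction_count
    return record


cache: dict[str, JumpRecord] = {}
record_jump(cache, "hash(10-7)", 1000)         # inserted
print(record_jump(cache, "hash(10-7)", 1100))  # JumpRecord(instruction_count=1100, execution_length=100)
```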
  • when writing the record information of conditional jump instructions into the cache, the computing resource can filter out the record information of non-loop jump instructions (such as the conditional jump instructions corresponding to if-else) and the record information of other conditional jump instructions executed between two executions of the same conditional jump instruction.
  • Step 2 insert hash(10-7), X2, Y2;
  • Step 12 update hash(12-3), X12, Y12, and delete hash(10-7), X11, Y11;
  • Step 14 update hash(10-7), X14, Y14;
  • Step 15 update hash(10-7), X15, Y15;
  • the computing resource determines, based on the record information of the conditional jump instruction in the cache, whether the first subroutine is a loop program, and updates the record information of the conditional jump instruction in the cache. If the computing resource determines that the cache does not contain the record information of the conditional jump instruction, it adds that record information to the cache.
  • Case 1 The computing resource determines that the cache contains the record information of the conditional jump instruction.
  • the computing resource determines, based on the record information of the conditional jump instruction already in the cache, that the number of times the first subroutine has been repeatedly executed by the computing resource is greater than a third preset value, and thus determines that the first subroutine is a loop program.
  • for example, the presence of the record information of the conditional jump instruction in the cache indicates that the first subroutine has been repeatedly executed by the computing resource; the computing resource therefore determines that the number of repeated executions is greater than the third preset value, which in this case is 1, and thus that the first subroutine is a loop program.
  • as another example, the record information of the conditional jump instruction already in the cache includes the jump identifier and the number of executions of the conditional jump instruction. If the computing resource determines that the number of executions of the conditional jump instruction is greater than the third preset value (for example, 2), it determines that the first subroutine is a loop program.
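The execution-count variant reduces to a counter comparison. A minimal sketch, using the example third preset value of 2 and hypothetical names, that updates the stored count before comparing it:

```python
THIRD_PRESET_VALUE = 2  # example value from the text


def is_loop_by_count(exec_counts: dict[str, int], jump_id: str,
                     threshold: int = THIRD_PRESET_VALUE) -> bool:
    """Increment the stored execution count for jump_id, then compare it with the threshold."""
    exec_counts[jump_id] = exec_counts.get(jump_id, 0) + 1
    return exec_counts[jump_id] > threshold


counts: dict[str, int] = {}
print([is_loop_by_count(counts, "hash(10-7)") for _ in range(4)])  # [False, False, True, True]
```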
  • in yet another example, the record information of the conditional jump instruction already in the cache includes the jump identifier, an instruction number (denoted the second instruction number), and an execution length (denoted the second execution length).
  • the second instruction number is the total number of instructions the program had cumulatively executed when the computing resource last executed the same conditional jump instruction, before the current execution (that is, step 401).
  • for example, the computing resource executes the conditional jump instruction multiple times, and the current execution (that is, step 401) is the N-th. The second instruction number is then the total number of instructions the program had cumulatively executed when the computing resource executed the conditional jump instruction for the (N-1)-th time.
  • the second execution length is the total number of instructions cumulatively executed by the program between the two executions of the same conditional jump instruction that preceded the current execution (that is, step 401).
  • for example, if the current execution (that is, step 401) is the N-th, the second execution length is the total number of instructions cumulatively executed by the program between the (N-2)-th and the (N-1)-th executions of the conditional jump instruction.
  • the computing resource determines that the first program is a loop program based on the second instruction number and the second execution length. For details, please refer to possible method 1 or possible method 2 below.
  • possible method 1: if the computing resource determines that the second execution length is 0 (that is, the record information was recorded when the computing resource executed the conditional jump instruction for the first time), it determines that the first subroutine is a loop program.
  • if the computing resource determines that the second execution length is not 0 (that is, the record information was recorded when the computing resource executed the conditional jump instruction for the m-th time, where m is greater than 1), it takes the difference between the first instruction number and the second instruction number as the first execution length.
  • the computing resource determines that the first subroutine is a loop program when the difference between the first execution length and the second execution length is less than a difference threshold.
  • if the computing resource determines that the difference between the first execution length and the second execution length is greater than or equal to the difference threshold, it determines that the first subroutine is not a loop program.
  • possible method 2: if the computing resource determines that the second execution length is 0 (that is, the record information was recorded when the computing resource executed the conditional jump instruction for the first time), it determines that the first subroutine is not a loop program.
  • if the computing resource determines that the second execution length is not 0 (that is, the record information was recorded when the computing resource executed the conditional jump instruction for the m-th time, where m is greater than 1), it determines the first execution length based on the second instruction number and the first instruction number. If the difference between the first execution length and the second execution length is less than the difference threshold, it determines that the first subroutine is a loop program; otherwise, if the difference is greater than or equal to the difference threshold, it determines that the first subroutine is not a loop program.
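The decision rules of possible methods 1 and 2 differ only when the second execution length is 0. The sketch below captures both with a flag; treating "difference" as an absolute difference and the name `classify` are assumptions.

```python
def classify(first_count: int, second_count: int, second_length: int,
             diff_threshold: int, zero_length_is_loop: bool) -> tuple[bool, int]:
    """Decide whether the first subroutine is a loop program.

    Returns (is_loop, first_length). zero_length_is_loop=True follows possible
    method 1; False follows possible method 2.
    """
    first_length = first_count - second_count
    if second_length == 0:
        # the record was created at the first execution: no length to compare against yet
        return zero_length_is_loop, first_length
    return abs(first_length - second_length) < diff_threshold, first_length


print(classify(1100, 1000, 0, 10, zero_length_is_loop=True))     # (True, 100)
print(classify(1100, 1000, 0, 10, zero_length_is_loop=False))    # (False, 100)
print(classify(1205, 1100, 100, 10, zero_length_is_loop=False))  # (True, 105)
```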
  • the computing resource also updates the record information of the conditional jump instruction in the cache according to one or more of the jump identifier of the conditional jump instruction, the first instruction number, the first execution length, and the first number of executions.
  • for example, the computing resource updates the first instruction number and the first execution length into the record information corresponding to the conditional jump instruction in the cache according to the jump identifier of the conditional jump instruction; that is, it updates the second instruction number in that record information to the first instruction number, and the second execution length to the first execution length.
  • the computing resource can also first update the number of executions in the record information, and when the updated number of executions is greater than the third preset value, determine that the number of times the first subroutine has been repeatedly executed by the computing resource is greater than the third preset value, and thus that the first subroutine is a loop program.
  • Case 2 The computing resource determines that the record information of the conditional jump instruction is not included in the cache.
  • the computing resource determines that the first subroutine is not a loop program. Further, the computing resource adds record information of the conditional jump instruction to the cache according to one or more of the jump identifier of the conditional jump instruction, the first instruction number, and the first execution length. For example, the computing resource adds one or more of the jump identifier, the first instruction number, the first execution length, and the first number of executions of the conditional jump instruction to the cache as the record information of the conditional jump instruction, where the first execution length takes the value 0 and the first number of executions takes the value 1.
  • the computing resource determining whether the cache contains the record information of the conditional jump instruction can also be understood as the computing resource attempting to obtain that record information from the cache, and then determining, based on whether it is obtained, whether the first subroutine is a loop program. Specifically, if the computing resource obtains the record information of the conditional jump instruction from the cache, it determines from that record information that the number of times the first subroutine has been repeatedly executed by the computing resource is greater than the third preset value, and thus that the first subroutine is a loop program. If the computing resource fails to obtain the record information from the cache, it determines that the first subroutine is not a loop program.
  • Figure 5 is a schematic flowchart of a specific implementation of how the computing resource determines a loop program according to this application. It can be understood that Figure 5 is a specific implementation of Figure 4.
  • for Figure 5, refer to the description of the related embodiment of Figure 4 above.
  • Step 501 After executing the first subroutine, the computing resource executes the conditional jump instruction.
  • Step 502 The computing resource determines whether the cache contains the record information of the conditional jump instruction based on the jump identifier of the conditional jump instruction. Specifically, if the computing resource determines that the cache does not include the record information of the conditional jump instruction, step 503 will be executed. If the computing resource determines that the cache contains the record information of the conditional jump instruction, step 504 will be executed.
  • Step 503 The computing resource adds the record information of the conditional jump instruction in the cache.
  • Step 504 The computing resource determines whether the second execution length is 0. Specifically, if the computing resource determines that the second execution length is 0, step 505 is executed. If the computing resource determines that the second execution length is not 0, step 506 is executed.
  • Step 505 The computing resource updates the record information of the conditional jump instruction in the cache and determines that the first subroutine is a loop program.
  • Step 506 The computing resource determines whether the difference between the first execution length and the second execution length is less than a difference threshold. Specifically, if the computing resource determines that the difference between the first execution length and the second execution length is less than the difference threshold, step 505 is executed; if the computing resource determines that the difference between the first execution length and the second execution length is greater than or equal to the difference threshold, then step 507 is executed.
  • Step 507 The computing resource updates the record information of the conditional jump instruction in the cache and determines that the first subroutine is not a loop program.
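Steps 501 to 507 can be sketched end to end as follows, assuming one record per jump identifier holding the second instruction number and the second execution length, and an absolute-difference comparison. `Record` and `figure5_step` are hypothetical names, and the cache is modeled as a dictionary.

```python
from dataclasses import dataclass


@dataclass
class Record:
    second_count: int   # instruction count at the previous execution of the jump
    second_length: int  # execution length recorded at the previous execution (0 at first)


def figure5_step(cache: dict[str, Record], jump_id: str, first_count: int,
                 diff_threshold: int) -> bool:
    """One pass of steps 501-507 for an executed conditional jump; True means a loop program."""
    record = cache.get(jump_id)
    if record is None:                                   # step 502 -> step 503
        cache[jump_id] = Record(first_count, 0)
        return False
    first_length = first_count - record.second_count
    if record.second_length == 0:                        # step 504 -> step 505
        is_loop = True
    else:                                                # step 506 -> step 505 or 507
        is_loop = abs(first_length - record.second_length) < diff_threshold
    # steps 505 and 507 both update the record in the cache
    record.second_count, record.second_length = first_count, first_length
    return is_loop


cache: dict[str, Record] = {}
decisions = [figure5_step(cache, "hash(10-7)", count, diff_threshold=10)
             for count in (1000, 1100, 1200, 1500)]
print(decisions)  # [False, True, True, False]
```

With these counts, the second execution is classified as a loop because the second execution length is still 0 (step 505), the third because both lengths are 100, and the fourth is not a loop because the lengths 300 and 100 differ by more than the threshold.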
  • Figure 6 is a schematic flowchart of another specific implementation of how the computing resource determines a loop program according to this application. It can be understood that Figure 6 is another specific implementation of Figure 4.
  • for Figure 6, refer to the description of the related embodiment of Figure 4 above.
  • Step 601 After executing the first subroutine, the computing resource executes the conditional jump instruction.
  • Step 602 The computing resource determines whether the cache contains record information of the conditional jump instruction based on the jump identifier.
  • specifically, if the computing resource determines that the cache does not include the record information of the conditional jump instruction, step 603 is executed; if it determines that the cache contains the record information of the conditional jump instruction, step 604 is executed.
  • Step 603 The computing resource adds the record information of the conditional jump instruction in the cache.
  • Step 604 The computing resource determines whether the second execution length is 0.
  • specifically, if the computing resource determines that the second execution length is 0, step 605 is executed; if it determines that the second execution length is not 0, step 606 is executed.
  • Step 605 The computing resource updates the record information of the conditional jump instruction in the cache.
  • Step 606 The computing resource determines whether the difference between the first execution length and the second execution length is less than a difference threshold. Specifically, if the computing resource determines that the difference between the first execution length and the second execution length is less than the difference threshold, step 608 is executed; if the computing resource determines that the difference between the first execution length and the second execution length is greater than or equal to the difference threshold, then step 607 is executed.
  • Step 608 The computing resource updates the record information of the conditional jump instruction in the cache and determines that the first subroutine is a loop program.
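Figure 6 differs from Figure 5 at step 605, where a second execution length of 0 only updates the record. The sketch below traces one conditional jump through steps 601 to 608. The text above does not describe step 607; following possible method 2, the sketch assumes it updates the record and determines that the first subroutine is not a loop program. `figure6_trace` is a hypothetical name.

```python
def figure6_trace(counts: list[int], diff_threshold: int) -> list[bool]:
    """Loop decisions (steps 601-608) for successive executions of one conditional jump.

    counts[i] is the program's cumulative instruction count at the (i+1)-th execution.
    """
    record = None  # (second instruction number, second execution length)
    decisions = []
    for first_count in counts:
        if record is None:                     # step 602 -> step 603: add the record
            record = (first_count, 0)
            decisions.append(False)
            continue
        second_count, second_length = record
        first_length = first_count - second_count
        if second_length == 0:                 # step 604 -> step 605: update only
            is_loop = False
        else:                                  # step 606 -> step 608, or step 607 (assumed)
            is_loop = abs(first_length - second_length) < diff_threshold
        record = (first_count, first_length)   # every path updates the record
        decisions.append(is_loop)
    return decisions


print(figure6_trace([1000, 1100, 1200, 1500], diff_threshold=10))  # [False, False, True, False]
```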
  • the computing resource can also first determine whether the conditional jump instruction points to a small loop, where a small loop is a loop in which the number of instructions executed in a single iteration is less than a second preset value.
  • for example, the computing resource can determine, based on the microinstruction module (such as a loop stream detector, LSD), whether the program is in a small-loop state, and thereby whether the conditional jump instruction points to a small loop. If the computing resource determines that the conditional jump instruction points to a small loop, it filters out the conditional jump instruction and does not need to perform steps 402 and 403 above.
  • that is, the computing resource treats the program as being in a loop state only when the number of instructions in a single iteration of the loop is greater than the second preset value. This helps reduce the amount of conditional jump record information kept in the cache, saving cache space of the computing resource. It also helps avoid frequent adjustment of the configuration parameters of the computing resource, reducing the power consumption of the computing resource.
  • computing resources also need to identify as early as possible that the program has/is about to exit the loop (such as ending loop execution) subroutine) to avoid that when the program has exited the loop, the computing resource still uses the configuration parameters applicable to the loop to run the program. The following still takes the first subroutine in the program as an example.
  • The instruction number threshold can also be determined based on the first instruction number and a preset execution length.
  • For example, the instruction number threshold is equal to the sum of the first instruction number and the preset execution length.
  • In one example, the preset execution length is set in advance and is greater than both the first execution length and the second execution length.
  • In another example, the preset execution length is the first execution length or the second execution length.
  • The computing resource writes the jump identifier of the conditional jump instruction into the cache together with the instruction number threshold, so that it can determine, based on the jump identifier and the instruction number threshold in the cache, whether the program has exited the loop, that is, whether the first subroutine is no longer executed in a loop.
  • In one implementation, the computing resource writes the instruction number threshold into the record information corresponding to the jump identifier (i.e., the record information in step 403).
  • In this case, the record information corresponding to the jump identifier includes not only one or more of the instruction number, execution length, and execution count corresponding to the jump identifier, but also the instruction number threshold corresponding to the jump identifier.
  • In another implementation, the computing resource writes the jump identifier of the conditional jump instruction and the instruction number threshold into the cache as a separate piece of record information.
  • If the total number of instructions cumulatively executed by the program exceeds the instruction number threshold before the computing resource executes the conditional jump instruction again, the computing resource determines that the program has exited the loop, that is, the first subroutine is no longer executed in a loop. If the computing resource executes the conditional jump instruction again before the total number of cumulatively executed instructions exceeds the instruction number threshold, it determines that the program is still in the loop, that is, the first subroutine is still being executed in a loop.
  • After determining that the first subroutine is a loop program, the computing resource can also write the preset execution length (or a preset instruction number) into a preset register. Each time the computing resource executes an instruction, the instruction number in the preset register is decremented by 1. When the instruction number in the preset register reaches 0, if the computing resource has not executed the conditional jump instruction again, it determines that the program has exited the loop, that is, the first subroutine is no longer executed in a loop.
  • If the computing resource executes the conditional jump instruction again before the instruction number in the preset register reaches 0, it determines that the program is still in the loop, that is, the first subroutine is still being executed in a loop. Furthermore, each time the computing resource executes the conditional jump instruction, it can reload the preset instruction number into the preset register.
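Both exit-detection variants described above can be modelled in a few lines. The sketch below assumes cumulative instruction counts are available as plain integers; the class and method names are hypothetical, and a real implementation would use hardware counters and registers rather than Python objects.

```python
from typing import Optional


class ThresholdExitMonitor:
    """Instruction-number-threshold variant (hypothetical names)."""

    def __init__(self, preset_length: int) -> None:
        self.preset_length = preset_length  # preset execution length
        self.threshold: Optional[int] = None  # no threshold until a jump is seen

    def on_conditional_jump(self, first_instruction_number: int) -> None:
        # Threshold = first instruction number + preset execution length.
        self.threshold = first_instruction_number + self.preset_length

    def has_exited(self, total_instructions: int) -> bool:
        # Exited if the cumulative count passes the threshold with no new jump.
        return self.threshold is not None and total_instructions > self.threshold


class CountdownExitMonitor:
    """Preset-register variant: a down-counter reloaded on every jump."""

    def __init__(self, preset_count: int) -> None:
        self.preset_count = preset_count
        self.register = preset_count  # models the preset register

    def on_instruction(self) -> None:
        if self.register > 0:
            self.register -= 1  # decrement by 1 per executed instruction

    def on_conditional_jump(self) -> None:
        self.register = self.preset_count  # reload the preset instruction number

    def has_exited(self) -> bool:
        # Register reached 0 without the jump being executed again.
        return self.register == 0
```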
  • After determining that the program has exited the loop, the computing resource further determines that its configuration parameters are to be the default configuration parameters.
  • When the computing resource determines that its current configuration parameters are the default configuration parameters, it does not adjust them; when it determines that the current configuration parameters are not the default configuration parameters, it adjusts them to the default configuration parameters.
  • After the computing resource determines that the program is still executing the first subroutine in a loop, in one example, the computing resource also obtains the running characteristics of the first subroutine and determines its target configuration parameters based on those characteristics. In another example, while the first subroutine is being executed in a loop, the computing resource no longer detects the running characteristics of the first subroutine in each iteration; instead, after determining that the first subroutine is no longer executed in a loop, it adjusts its configuration parameters to the default configuration parameters. The latter example helps reduce the power consumption or complexity of running the program on the computing resource.
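A sketch of the default-restore behaviour, assuming the configuration parameters are represented as a mapping from a configuration-register address to a value. The addresses and default values in `DEFAULT_CONFIG`, and the names `ConfigRegisters` and `restore_defaults_on_exit`, are invented for illustration. Only registers whose value differs from the default are rewritten, so nothing is written when the current parameters are already the defaults.

```python
from typing import Dict

# Hypothetical configuration-register addresses and default values.
DEFAULT_CONFIG: Dict[int, int] = {0x10: 0x0, 0x14: 0x3}


class ConfigRegisters:
    """Toy model of the configuration registers of one computing resource."""

    def __init__(self, initial: Dict[int, int]) -> None:
        self.regs = dict(initial)
        self.writes = 0  # number of register writes performed

    def write(self, address: int, value: int) -> None:
        self.regs[address] = value
        self.writes += 1


def restore_defaults_on_exit(regs: ConfigRegisters,
                             default: Dict[int, int] = DEFAULT_CONFIG) -> bool:
    """Return True if any register had to be changed back to its default."""
    stale = {a: v for a, v in default.items() if regs.regs.get(a) != v}
    if not stale:
        return False  # already the default configuration: no adjustment
    for address, value in stale.items():
        regs.write(address, value)
    return True
```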
  • the processor may include multiple processor cores.
  • The processor core executes the method in the above method embodiments to determine its own target configuration parameters, that is, the target configuration parameters are determined at processor-core granularity.
  • The processor core then configures the target configuration parameters in itself.
  • For example, the computing resources include processor cores 1 to 5.
  • When processor core 1 is running a program, processor core 1 also executes the method in the above method embodiments to determine target configuration parameter 1 of processor core 1, and then configures target configuration parameter 1 in processor core 1.
  • other processor cores can also determine their own target configuration parameters based on the programs they are running.
  • In another implementation, multiple processor cores run the same program, and one processor core executes the method in the above method embodiments to determine the target configuration parameters corresponding to that program, that is, the target configuration parameters are determined at program granularity.
  • The processor core used to determine the target configuration parameters may be one of the multiple processor cores running the program, or another processor core in the computing resource that is independent of those processor cores.
  • The processor core can then configure the target configuration parameters in each of the multiple processor cores.
  • For example, the computing resources include processor cores 1 to 5.
  • In one case, processor cores 1 to 5 run the same program, and while running it, processor core 1 also executes the method in the above method embodiments to determine the target configuration parameters corresponding to that program and configures them in processor cores 1 to 5.
  • In another case, processor cores 2 to 5 run the same program, and processor core 1 executes the method in the above method embodiments to determine the target configuration parameters corresponding to that program and configures them in processor cores 2 to 5.
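The difference between core granularity and program granularity can be illustrated with a small Python model. `Core`, `configure_per_core`, `configure_per_program`, and the `determine` callback (standing in for the method embodiments) are hypothetical; each function returns how many times the parameters were determined.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Config = Dict[str, str]


@dataclass
class Core:
    core_id: int
    program: str
    config: Config = field(default_factory=dict)


def configure_per_core(cores: List[Core], determine: Callable[[str], Config]) -> int:
    """Core granularity: every core determines and applies its own parameters."""
    for core in cores:
        core.config = dict(determine(core.program))
    return len(cores)  # one determination per core


def configure_per_program(cores: List[Core], determine: Callable[[str], Config]) -> int:
    """Program granularity: one determination per program, shared by its cores."""
    groups: Dict[str, List[Core]] = {}
    for core in cores:
        groups.setdefault(core.program, []).append(core)
    for program, group in groups.items():
        params = determine(program)  # done once, possibly by a separate core
        for core in group:
            core.config = dict(params)
    return len(groups)  # one determination per distinct program
```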
  • FIG. 8 is a schematic structural diagram of a possible processing device provided by the present application.
  • The processing device can be used to implement the functions of the above method embodiments, and therefore also has the beneficial effects of the above method embodiments.
  • the processing device 800 includes a parameter determination module 801 and a configuration module 802.
  • the device for running a program includes a parameter determination module 801 and a configuration module 802.
  • The parameter determination module 801 is configured to: when it is determined that the program being run by the computing resource in the processing device includes multiple similar subprograms, obtain the running characteristics of a subprogram, among the multiple similar subprograms, that has been run or is currently being run, where the multiple similar subprograms are multiple subprograms whose running-characteristic similarity is greater than or equal to a first preset value; and determine the configuration parameters of the computing resource based on those running characteristics.
  • The configuration module 802 is configured to configure the computing resource with the determined configuration parameters.
  • multiple similar subprograms are loop programs in the program that are executed multiple times, and each subprogram corresponds to one or more cycles of the loop program.
  • the number of instructions of the loop program is greater than the second preset value.
  • the device also includes a detection module 803, which is used to determine that the program running in the computing resource includes multiple similar subprograms. Specifically, when the detection module 803 determines that the number of times the first subroutine in the program is repeatedly executed is greater than the third preset value, it determines that the first subroutine is a loop program.
  • When determining whether the number of times the first subroutine in the program has been repeatedly executed is greater than the third preset value, the detection module 803 is specifically configured to: after the first subroutine finishes executing and a conditional jump instruction is executed, determine whether record information of the conditional jump instruction exists; if it exists, determine, based on that record information, whether the number of times the first subroutine has been repeatedly executed is greater than the third preset value; if it does not exist, add record information of the conditional jump instruction.
  • When executing the conditional jump instruction, the detection module 803 is further configured to determine whether the conditional jump instruction points to a small loop, where a small loop is a loop in which the number of instructions executed in a single iteration is less than the second preset value. If the conditional jump instruction points to a small loop, the conditional jump instruction is filtered out; if it does not, the detection module 803 further determines whether record information of the conditional jump instruction exists.
  • The record information is recorded in a preset cache of the computing resource.
  • When determining whether record information of a conditional jump instruction exists, the detection module 803 is specifically configured to: determine a jump identifier based on the identification information of the conditional jump instruction, where the jump identifier is the identification information of the conditional jump instruction or a hash of that identification information; and traverse the multiple pieces of record information in the cache. If a piece of record information includes the jump identifier, the cache is determined to contain the record information of the conditional jump instruction; if none of the record information in the cache includes the jump identifier, the cache is determined not to contain the record information of the conditional jump instruction.
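A sketch of deriving a jump identifier from the starting and target positions and searching the cached record information, assuming 48-bit virtual addresses and a simple XOR-fold hash. `fold_hash`, `jump_identifier`, `find_record`, and `ADDR_BITS` are hypothetical; the hash is a placeholder, not the one used in the present application, and any hashed identifier can collide, so two different jumps may map to the same record.

```python
from typing import List, Optional

ADDR_BITS = 48  # assumption: 48-bit virtual addresses


def fold_hash(value: int, bits: int = 16) -> int:
    """Hypothetical XOR-fold hash that compresses value into `bits` bits."""
    mask = (1 << bits) - 1
    h = 0
    while value:
        h ^= value & mask
        value >>= bits
    return h


def jump_identifier(source: int, target: int, hashed: bool = True) -> int:
    """Identifier built from the jump's starting position and target position."""
    raw = (source << ADDR_BITS) | (target & ((1 << ADDR_BITS) - 1))
    return fold_hash(raw) if hashed else raw


def find_record(records: List[dict], jump_id: int) -> Optional[dict]:
    """Traverse the record information in the cache looking for jump_id."""
    for record in records:
        if record["jump_id"] == jump_id:
            return record
    return None  # the cache holds no record for this jump
```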
  • the identification information of the conditional jump instruction is obtained from the branch record module.
  • the identification information includes a starting position and/or a target position, or a hash of the starting position and/or a target position.
  • When determining, based on the record information, whether the number of times the first subroutine has been repeatedly executed is greater than the third preset value, the detection module 803 is specifically configured to make the determination based on the instruction number and the execution length in the record information. The instruction number in the record information indicates the cumulative number of instructions executed by the program when the conditional jump instruction was last executed; the execution length in the record information is the difference between the cumulative numbers of instructions executed by the program at the previous two executions of the conditional jump instruction.
  • the third preset value is equal to 2.
  • In this case, the detection module 803 is specifically configured to: take the cumulative number of instructions executed by the program when the conditional jump instruction is executed as the first instruction number, and take the difference between the first instruction number and the instruction number in the record information as the first execution length. If the execution length in the record information is not 0 and the difference between the first execution length and the execution length in the record information is less than the difference threshold, the detection module 803 determines that the first subroutine has been repeatedly executed more than 2 times. In a possible implementation, after determining that the number of times the first subroutine has been repeatedly executed is greater than the third preset value, the detection module 803 is further configured to update the record information according to the first instruction number and the first execution length.
  • The detection module 803 is further configured to update the instruction number in the record information.
  • the instruction number threshold is determined based on the first instruction number and the preset execution length in the updated record information.
  • the parameter determination module 801 is also used to determine that the configuration parameters of the computing resources are default configuration parameters.
  • In another possible implementation, when determining, according to the record information, whether the number of times the first subroutine has been repeatedly executed is greater than the third preset value, the detection module 803 is specifically configured to determine, according to the execution count of the conditional jump instruction in the record information, whether that execution count is greater than the third preset value. When the execution count of the conditional jump instruction is greater than the third preset value, the number of times the first subroutine has been repeatedly executed is determined to be greater than the third preset value.
  • Alternatively, the detection module 803 is specifically configured to update the execution count of the conditional jump instruction in the record information, and determine, based on the updated execution count, whether the execution count of the conditional jump instruction is greater than the third preset value.
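The execution-count variant reduces to one counter per jump identifier. The sketch below assumes the record information is a dictionary from jump identifier to execution count and uses the example third preset value of 2; the function name `jump_repeated_enough` is hypothetical.

```python
from typing import Dict

THIRD_PRESET_VALUE = 2  # example value of the third preset value


def jump_repeated_enough(exec_counts: Dict[int, int], jump_id: int,
                         third_preset: int = THIRD_PRESET_VALUE) -> bool:
    """Update the execution count in the record, then compare it."""
    count = exec_counts.get(jump_id, 0) + 1  # a missing record starts at 1
    exec_counts[jump_id] = count
    return count > third_preset
```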
  • When determining the running characteristics of one iteration in which the computing resource runs the first subroutine, the parameter determination module 801 is specifically configured to: obtain the running characteristics of the computing resource running the program when the conditional jump instruction is executed; obtain the running characteristics of the program at the previous execution of the conditional jump instruction; and determine, based on the two sets of characteristics obtained, the running characteristics of one iteration in which the computing resource runs the first subroutine.
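The per-iteration running characteristics can be obtained by subtracting two snapshots of cumulative counters, taken at the previous and the current execution of the conditional jump instruction. The counter fields below, the name `per_iteration_features`, and the two derived metrics (instructions per cycle and cache miss rate) are illustrative assumptions; real counters would be read from the performance monitoring hardware.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class CounterSnapshot:
    """Hypothetical cumulative counter values read when the jump executes."""
    instructions: int
    cycles: int
    cache_accesses: int
    cache_misses: int


def per_iteration_features(prev: CounterSnapshot, curr: CounterSnapshot) -> Dict[str, float]:
    """Running characteristics of one loop iteration from two consecutive snapshots."""
    d_instr = curr.instructions - prev.instructions
    d_cycles = curr.cycles - prev.cycles
    d_access = curr.cache_accesses - prev.cache_accesses
    d_miss = curr.cache_misses - prev.cache_misses
    return {
        "ipc": d_instr / d_cycles if d_cycles else 0.0,
        "cache_miss_rate": d_miss / d_access if d_access else 0.0,
    }
```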
  • When determining the configuration parameters of the computing resource according to the running characteristics, the parameter determination module 801 is specifically configured to determine, from multiple preset features, a target preset feature that matches the running characteristics, and to determine the preset configuration parameters corresponding to the target preset feature as the configuration parameters of the computing resource.
  • the configuration parameters of the computing resource include a prefetching strategy, which includes a prefetching strategy for missing cache lines, a prefetching strategy for integer data access, the aggressiveness of the prefetching algorithm, etc.
  • When determining, from multiple preset features, the target preset feature that matches the running characteristics, the parameter determination module 801 is specifically configured to perform dimensionality reduction on the A-dimensional running characteristics to obtain B-dimensional running characteristics, and, according to the matching degree between the B-dimensional running characteristics and each of multiple B-dimensional preset features, select the preset feature with the highest matching degree as the target preset feature, where A and B are both positive integers and B is less than A.
  • When determining the matching degree between the B-dimensional running characteristics and any B-dimensional preset feature, the parameter determination module 801 is specifically configured to: for each of the B dimensions, determine the matching degree between the running characteristic and the preset feature in that dimension; and determine the overall matching degree between the B-dimensional running characteristics and the B-dimensional preset feature according to the matching degrees of all B dimensions.
  • In a possible implementation, a dimension of a B-dimensional preset feature includes multiple bits, and the values of some of those bits are masked.
  • When determining the matching degree between the running characteristic and the preset feature in a dimension, the parameter determination module 801 may use fuzzy matching.
  • If the parameter determination module 801 does not find, among the multiple preset features, a target preset feature that matches the running characteristics, it uses the default configuration parameters as the configuration parameters of the computing resource.
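The matching procedure can be sketched under several stated assumptions: dimensionality reduction is replaced by simply keeping a chosen subset of dimensions, each remaining dimension is a 4-bit quantized code, per-dimension fuzzy matching is the fraction of unmasked bits that agree, the overall degree is the mean across dimensions, and a preset feature counts as matching only if its degree reaches a hypothetical `min_degree` of 0.75. None of these choices, nor the names below, is specified by the present application.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Tuple

BITS = 4  # assumption: each reduced dimension is quantized to a 4-bit code
FULL = (1 << BITS) - 1


@dataclass(frozen=True)
class PresetFeature:
    values: Tuple[int, ...]  # B quantized preset values
    masks: Tuple[int, ...]   # 1-bits are compared; 0-bits are masked out
    config: Dict[str, str]   # preset configuration parameters


def reduce_dims(features: Sequence[int], keep: Sequence[int]) -> List[int]:
    """Stand-in for dimensionality reduction: keep B of the A dimensions."""
    return [features[i] for i in keep]


def dim_match(value: int, preset: int, mask: int) -> float:
    """Fuzzy per-dimension match: fraction of unmasked bits that agree."""
    compared = bin(mask & FULL).count("1")
    if compared == 0:
        return 1.0  # every bit masked: this dimension always matches
    agree = bin(~(value ^ preset) & mask & FULL).count("1")
    return agree / compared


def match_degree(features: Sequence[int], preset: PresetFeature) -> float:
    scores = [dim_match(v, p, m) for v, p, m in zip(features, preset.values, preset.masks)]
    return sum(scores) / len(scores)  # combine the B per-dimension degrees


def select_config(features_a: Sequence[int], keep: Sequence[int],
                  presets: Sequence[PresetFeature], default: Dict[str, str],
                  min_degree: float = 0.75) -> Dict[str, str]:
    reduced = reduce_dims(features_a, keep)
    best: Optional[PresetFeature] = max(
        presets, key=lambda p: match_degree(reduced, p), default=None)
    if best is None or match_degree(reduced, best) < min_degree:
        return default  # no matching target preset feature: use defaults
    return best.config
```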
  • The configuration parameters of the computing resource include the address of a configuration register in the computing resource and a configuration register value. When configuring the configuration parameters in the computing resource, the configuration module 802 is specifically configured to write the configuration register value into the configuration register located at that address.
  • The running characteristics include at least one or more of the following: the number of instructions executed by the processor core per clock cycle (IPC), the translation lookaside buffer (TLB) miss rate, the cache miss rate, and the prefetch hit rate.
  • Each functional module in the embodiments of the present application may be integrated into one processing unit, may exist physically alone, or two or more modules may be integrated into one module.
  • The above integrated modules can be implemented in the form of hardware or in the form of software function modules.
  • If the integrated module is implemented in the form of a software function module and sold or used as an independent product, it can be stored in a computer-readable storage medium.
  • The technical solution of the present application essentially, or the part of it that contributes to the existing technology, or all or part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a terminal device (which may be a personal computer, a mobile phone, a network device, or the like) or a processor to execute all or part of the steps of the methods in the embodiments of the present application.
  • The aforementioned storage media include various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
  • the computing resource can alternately run the first program and the second program, for example, run the first program in the first time period and run the second program in the second time period.
  • The first time period and the second time period do not overlap. It can be understood that the computing resource runs only one of the programs at a time.
  • The first program occupies very few computing resources, that is, the first time period is much shorter than the second time period, so when the computing resource alternately runs the first program and the second program, the impact on the running of the second program is negligible.
  • By alternately running the first program and the second program, the computing resource can, through running the first program, determine the running characteristics of the subprograms that have been run or are currently running among the multiple similar subprograms included in the second program, and then determine its configuration parameters according to those characteristics, so that it runs the second program more efficiently.
  • the computer program product includes computer program instructions.
  • When the computer program instructions are loaded and executed on a computer, all or part of the processes or functions according to the method embodiments related to FIG. 3 to FIG. 6 of the present application are generated.
  • The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device.
  • Computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, computer instructions may be transmitted from a website, computer, server, or data center to another website, computer, server, or data center by wired means (such as coaxial cable, optical fiber, or digital subscriber line) or wireless means (such as infrared, radio, or microwave).
  • The computer-readable storage medium can be any available medium that can be accessed by the computer.
  • It may also be a data storage device, such as a server or data center, that integrates one or more available media.
  • The available media may be magnetic media (for example, floppy disks, hard disks, or magnetic tapes), optical media (for example, DVDs), or semiconductor media (for example, SSDs).
  • the detection module 803 is used to perform step 301 in Figure 3 and the steps in the related method embodiments of Figures 4 to 6; the parameter determination module 801 is used to perform step 302 and step 303 in Figure 3; the configuration module 802 is used to Execute step 304 in Figure 3.
  • After each execution of the conditional jump instruction, if the detection module 803 determines that the program is in a loop, it sends a detection signal to the parameter determination module 801; if it determines that the program is not in a loop, it sends a start signal to the parameter determination module 801.
  • the specific implementation method may be any one of the following examples A to C.
  • the detection module 803 sends a start signal and a detection signal to the parameter determination module 801.
  • the parameter determination module 801 responds to the start signal to reset the feature counter in the performance monitoring module (that is, clears the count); the parameter determination module 801 responds to the detection signal to determine the value of the feature counter in the performance monitoring module.
  • the detection module 803 sends a start signal and multiple detection signals to the parameter determination module 801.
  • The parameter determination module 801 resets the feature counter in the performance monitoring module (i.e., clears the count) in response to the start signal, and determines the value of the feature counter in response to the first detection signal. In response to each subsequent detection signal, it records the current value of the feature counter and determines the difference between that value and the value recorded the previous time.
  • Alternatively, in response to the start signal, the parameter determination module 801 may record the value of the feature counter in the performance monitoring module without resetting it. Then, when it receives the first detection signal, it determines, in response, the difference between the current value of the feature counter and the value recorded previously.
  • the detection module 803 sends multiple detection signals to the parameter determination module 801 (ie, no start signal is sent).
  • In response to the first detection signal, the parameter determination module 801 determines the value of the feature counter in the performance monitoring module. In response to each subsequent detection signal, it records the current value of the feature counter and determines the difference between that value and the value recorded the previous time.
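The start/detection-signal handling above can be modelled with a counter and a remembered baseline. This is a hypothetical sketch: `reset_on_start=True` clears the counter on the start signal, `reset_on_start=False` records the counter value without clearing it, and when no start signal is sent the first detection signal only establishes a baseline (returning `None`). `FeatureCounter` and `ParameterDeterminationSketch` are invented names.

```python
from typing import Optional


class FeatureCounter:
    """Stand-in for one feature counter in the performance monitoring module."""

    def __init__(self) -> None:
        self.value = 0

    def count(self, n: int = 1) -> None:
        self.value += n


class ParameterDeterminationSketch:
    """Handles start and detection signals (hypothetical model)."""

    def __init__(self, counter: FeatureCounter, reset_on_start: bool = True) -> None:
        self.counter = counter
        self.reset_on_start = reset_on_start
        self.last: Optional[int] = None  # counter value recorded last time

    def on_start_signal(self) -> None:
        if self.reset_on_start:
            self.counter.value = 0  # reset, i.e. clear the count
        self.last = self.counter.value  # baseline for the first window

    def on_detection_signal(self) -> Optional[int]:
        """Return the count since the previous signal, or None with no baseline."""
        current = self.counter.value
        delta = None if self.last is None else current - self.last
        self.last = current  # record the current value for the next window
        return delta
```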
  • Example 1: A feature counter in the performance monitoring module is used to determine the cache miss rate.
  • After the parameter determination module 801 receives the start signal, the feature counter is reset. After receiving the detection signal, the parameter determination module 801 reads the value of the feature counter and thus determines the cache miss rate from the count accumulated by the feature counter between the start signal and the detection signal.
  • Example 2: A feature counter in the performance monitoring module is used to determine the prefetch hit rate. After receiving the start signal, the parameter determination module 801 resets the feature counter; when it receives the detection signal, it reads the value of the feature counter and determines the prefetch hit rate from the count accumulated between the start signal and the detection signal.
  • The start signal and the detection signal merely represent a mechanism for triggering actions.
  • For example, the start signal is one pulse on the signal line between the detection module 803 and the parameter determination module 801.
  • The detection signal is another pulse on the same signal line between the detection module 803 and the parameter determination module 801.
  • Alternatively, the start signal and the detection signal are level changes.
  • For example, the start signal is a level change from 0 to 1, and the detection signal is a level change from 1 to 0.
  • For example, when the detection module 803 detects the conditional jump instruction for the first time, it sends a start signal to the parameter determination module 801, and when it detects the conditional jump instruction for the second time, it sends a detection signal to the parameter determination module 801.
  • The parameter determination module 801 determines the count value between the first and second executions of the conditional jump instruction; this count value can be used to indicate the running characteristics of the program in one loop iteration.
  • When the detection module 803 detects the conditional jump instruction for the m-th time (where m is greater than 2), it also sends a detection signal to the parameter determination module 801.
  • The parameter determination module 801 determines the count value between the (m-1)-th and m-th executions of the conditional jump instruction, and thereby determines the running characteristics of the program in the loop iteration between the (m-1)-th and m-th executions.
  • In another example, the detection module 803 sends a start signal to the parameter determination module 801 when it detects the conditional jump instruction for the second time, and sends a detection signal to the parameter determination module 801 when it detects the conditional jump instruction for the third time.
  • In this way, the detection module 803 can not only determine the count value between two executions of the conditional jump instruction, but also improve the accuracy of loop detection.
  • When the detection module 803 detects the conditional jump instruction for the m-th time (where m is greater than 3), it also sends a detection signal to the parameter determination module 801.
  • The parameter determination module 801 determines the count value between the (m-1)-th and m-th executions of the conditional jump instruction, and thereby determines the running characteristics of the program in the loop iteration between the (m-1)-th and m-th executions.
  • Method 1: the detection module 803 sends the start signal to the parameter determination module 801 after detecting multiple executions of the conditional jump instruction. Method 2: after sending the start signal to the parameter determination module 801, the detection module 803 sends a detection signal once every several detected executions of the conditional jump instruction, where the number of executions of the conditional jump instruction between each two detection signals can be the same or different.
  • Example A and Example C are similar to the above-mentioned Example B.
  • the detection module 803 sends a start signal to the parameter determination module 801 when the conditional jump instruction is detected for the first time, and when the conditional jump instruction is detected for the third time, it sends a start signal to the parameter determination module 801 801 sends a detection signal.
  • the parameter determination module 801 determines the count value between the first and third conditional jump instructions. This count value can be used to indicate the running characteristics of the program in the two cycles.
  • the detection module 803 sends a start signal to the parameter determination module 801 when the conditional jump instruction is detected for the second time, and when the conditional jump instruction is detected for the fifth time, it sends a start signal to the parameter determination module 801 801 sends a detection signal.
  • the parameter determination module 801 determines the count value between the second and fifth conditional jump instructions. This count value can be used to indicate the running characteristics of the program in the two cycles.
  • the processing device includes computing resources and a memory connected to the computing resources.
  • the memory is used to store computer programs, and the computing resources are used to execute the computer programs stored in the memory,
  • so that the computing resources implement the method in the above method embodiments.
  • the processing device includes one or more processors, each including one or more processor cores, where a processor core can implement the method in the above method embodiments when reading the computer program stored in the memory.
  • this application provides a computer-readable storage medium.
  • computer programs or instructions are stored in the computer-readable storage medium.
  • when the computing resources in the processing device run the computer programs or instructions, the method in the above method embodiments is executed.
  • this application provides a processing chip, including at least one processor core and an interface; the interface is used to provide program instructions or data to the at least one processor core, and the at least one processor core is used to execute the program instructions to implement the method in the above method embodiments.
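The start/detection-signal exchange described in the bullets above comes down to sampling a counter at selected occurrences of the same conditional jump and subtracting the samples. The following Python sketch shows only that arithmetic. The function name `window_counts` and the list of per-occurrence counter samples are hypothetical stand-ins for the hardware signals, not the application's actual interface.

```python
from typing import List, Tuple


def window_counts(samples: List[int], start: int, step: int) -> List[Tuple[int, int, int]]:
    """Difference a counter between occurrences of one conditional jump.

    samples[i] is a (hypothetical) counter value captured when the jump is
    detected for the (i + 1)-th time. `start` is the 1-based occurrence at
    which the start signal is sent; a detection signal then follows every
    `step` occurrences, so each window spans `step` loop iterations.
    Returns (first_occurrence, last_occurrence, count_value) per window.
    """
    if start < 1 or step < 1:
        raise ValueError("start and step must be positive")
    windows = []
    first = start
    while first + step <= len(samples):
        last = first + step
        windows.append((first, last, samples[last - 1] - samples[first - 1]))
        first = last  # the next window starts where this one ended
    return windows


samples = [400, 500, 600, 705, 800]  # counter at occurrences 1..5
print(window_counts(samples, start=1, step=1))
# [(1, 2, 100), (2, 3, 100), (3, 4, 105), (4, 5, 95)]
print(window_counts(samples, start=2, step=3))
# [(2, 5, 300)]
```

In this model, start=1, step=1 corresponds to Example B. Starting at the second detection (start=2) mirrors the variant that begins counting only after the jump has recurred. start=1, step=2 and start=2, step=3 match the first-to-third and second-to-fifth windows.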

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Devices For Executing Special Programs (AREA)

Abstract

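A method and apparatus for running a program. They address the following problem: an existing processing device configures its computing resource with default configuration parameters, and the computing resource runs many different types of programs with those defaults, which cannot suit the running characteristics of every type of program. In this application, the method can be performed by a processing device or by a computing resource in the processing device. Specifically, when it is determined that a program running on the computing resource includes multiple similar subprograms, the running characteristics of a subprogram among them that has already run or is currently running are obtained. Here, the multiple similar subprograms are subprograms whose running characteristics have a similarity greater than or equal to a first preset value. Configuration parameters of the computing resource are determined based on those running characteristics, and the computing resource is then configured with them.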

Description

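Method and apparatus for running a program
Cross-reference to related applications
This application claims priority to Chinese Patent Application No. 202210731665.4, filed with the Chinese Patent Office on June 25, 2022 and entitled "A program execution method", which is incorporated herein by reference in its entirety. This application also claims priority to Chinese Patent Application No. 202211118557.6, filed with the Chinese Patent Office on September 13, 2022 and entitled "A program running method and apparatus", which is incorporated herein by reference in its entirety.
Technical field
This application relates to the field of computer technology, and in particular to a method and apparatus for running a program.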
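Background
A processing device includes a computing resource that can run many different types of programs. The computing resource is, for example, a central processing unit (CPU) or a core in a CPU. To keep the computing resource general-purpose, the processing device can configure it with default configuration parameters. Take a CPU as the computing resource. During design and manufacturing, the CPU vendor can set the CPU to different configuration parameters, test it under each of them, and select the best configuration as the CPU's default configuration parameters. This ensures that the CPU can run all kinds of programs smoothly with its defaults.
However, when the computing resource in the processing device runs many different types of programs with the default configuration parameters, those parameters cannot suit the running characteristics of every type of program. As a result, the computing capability of the computing resource cannot be fully utilized.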
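Summary
This application provides a method and apparatus for running a program. They identify the running characteristics of subprograms in a program running on a computing resource and adjust the configuration parameters of the computing resource based on the identified running characteristics. This helps the computing resource execute the program more efficiently and makes maximal use of its computing capability.
According to a first aspect, this application provides a method for running a program. The method can be performed by a processing device or by a computing resource in the processing device, where the computing resource may specifically be a processor in the processing device or a core of such a processor. The method includes the following steps:
(1) When it is determined that a program running on the computing resource includes multiple similar subprograms, obtain the running characteristics of a subprogram among them that has already run or is currently running. The multiple similar subprograms are subprograms whose running characteristics have a similarity greater than or equal to a first preset value.
(2) Determine configuration parameters of the computing resource based on those running characteristics.
(3) Configure the computing resource with the determined configuration parameters.
In the above technical solution, when it is determined that a program running on the computing resource includes multiple similar subprograms, the configuration parameters can be determined from the running characteristics of a subprogram that has already run or is currently running, and then applied to the computing resource. This helps the computing resource execute the other similar subprograms in the program more efficiently, which improves the execution efficiency of the whole program.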
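In a possible implementation, the multiple similar subprograms are a loop program that is executed multiple times in the program, and each subprogram corresponds to one or more iterations of the loop program. In this technical solution, a loop program is identified in the program run by the computing resource. The configuration parameters used to run the loop program are then determined from the loop program's running characteristics in one or more iterations. This helps the computing resource execute the loop program more efficiently.
In a possible implementation, the number of instructions in the loop program is greater than a second preset value. This avoids reconfiguring the computing resource too frequently and reduces the computing overhead.
In a possible implementation, determining that a program running on the computing resource includes multiple similar subprograms may specifically be: when a first subprogram in the program has been repeatedly executed more times than a third preset value, determine that the first subprogram is a loop program. This provides a way to determine that a program includes a loop program and improves the accuracy of loop identification.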
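In a possible implementation, after the first subprogram finishes executing and a conditional jump instruction is executed, it is determined whether record information of that conditional jump instruction exists.
- If the record information exists, it is used to determine whether the repeat count of the first subprogram is greater than the third preset value.
- If the record information does not yet exist, record information of the conditional jump instruction is added.
In this technical solution, the conditional jump instruction is executed after the first subprogram completes, and its record information is used to determine whether the first subprogram is a loop program. This helps improve the accuracy of loop identification.
In a possible implementation, after the conditional jump instruction is executed, it is first determined whether it points to a small loop. A small loop is a loop in which a single iteration executes fewer instructions than the second preset value. If the conditional jump instruction points to a small loop, it is filtered out. Otherwise, it is further determined whether record information of the conditional jump instruction exists.
In a possible implementation, the record information is recorded in a preset buffer of the computing resource. The record information can then be read quickly from the buffer, which speeds up loop identification.
In a possible implementation, determining whether record information of the conditional jump instruction exists may specifically be as follows. A jump identifier is determined from the identification information of the conditional jump instruction; the jump identifier is either the identification information itself or a hash of it. The multiple pieces of record information in the buffer are then traversed.
- If a piece of record information includes the jump identifier, the buffer is determined to include the record information of the conditional jump instruction.
- If none of the pieces of record information includes the jump identifier, the buffer is determined not to include it.
In a possible implementation, the identification information of the conditional jump instruction is obtained from a branch recording module. The identification information includes a start position and/or a target position, or a hash of the start position and/or the target position.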
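In a possible implementation, whether the repeat count of the first subprogram is greater than the third preset value may be determined from the instruction count and the execution length in the record information.
- The instruction count indicates the cumulative number of instructions the program had executed the last time the conditional jump instruction was executed.
- The execution length is the difference between the program's cumulative instruction counts at the two previous executions of the conditional jump instruction.
In the above technical solution, whether the first subprogram is a loop program is determined from the instruction count and execution length in the record information of the conditional jump instruction. This helps improve the accuracy of loop identification.
In a possible implementation, the third preset value is 2, and the determination from the instruction count and execution length may specifically proceed as follows.
(1) The number of instructions the program has executed when this conditional jump instruction is executed is taken as a first instruction count.
(2) The difference between the first instruction count and the instruction count in the record information is taken as a first execution length.
(3) If the execution length in the record information is not 0, and the difference between the first execution length and the execution length in the record information is less than a difference threshold, the first subprogram is determined to have been repeatedly executed more than 2 times.
This technical solution helps improve the accuracy of loop identification.
In a possible implementation, after it is determined that the repeat count of the first subprogram is greater than the third preset value, the method further includes: updating the record information based on the first instruction count and the first execution length.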
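In a possible implementation, after the first subprogram is determined to be a loop program, an instruction count threshold is determined from the first instruction count in the updated record information and a preset execution length. For example, the threshold may be the sum of the first instruction count and the preset execution length. The preset execution length may be a preconfigured value, the first execution length, or the execution length in the record information before the update, among other options. When the number of instructions executed by the program reaches the threshold and the conditional jump instruction has not been executed again, the loop program is determined to have exited, and the configuration parameters of the computing resource are set to the default configuration parameters. In this way, whether the program has exited the loop can be identified accurately, and the configuration parameters can be restored to the defaults in time.
In a possible implementation, whether the repeat count of the first subprogram is greater than the third preset value may instead be determined from the execution count of the conditional jump instruction in the record information, by checking whether that execution count is greater than the third preset value. In a possible implementation, the execution count in the record information is updated first, and the determination is then made from the updated execution count. Specifically, when the updated execution count of the conditional jump instruction is greater than the third preset value, the repeat count of the first subprogram is determined to be greater than the third preset value.
In a possible implementation, the running characteristics of the computing resource in one iteration of the first subprogram may be determined in three steps.
(1) Obtain the running characteristics of the computing resource running the program when this conditional jump instruction is executed.
(2) Obtain the running characteristics of the computing resource running the program at the previous execution of the same conditional jump instruction.
(3) Derive the running characteristics for one iteration of the first subprogram from these two measurements.
The running characteristics are determined from feature count values obtained from the feature counters of a performance monitoring unit.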
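In a possible implementation, determining the configuration parameters of the computing resource from the running characteristics may specifically be: determine, from multiple preset features, a target preset feature that matches the running characteristics, and use the preset configuration parameters corresponding to that target preset feature as the configuration parameters of the computing resource. For example, the configuration parameters include a prefetch policy, which covers the prefetch policy for missing cache lines, the prefetch policy for integer data accesses, and the aggressiveness of the prefetch algorithm. In this technical solution, once the running characteristics of a subprogram that has run or is running are determined, the corresponding configuration parameters are obtained. These parameters suit the running characteristics of the subprograms still to be run, which helps improve program execution efficiency.
In a possible implementation, determining the target preset feature from multiple preset features may specifically be: reduce the A-dimensional running characteristics to B-dimensional running characteristics, then compare them with each of multiple B-dimensional preset features and select the preset feature with the highest degree of matching as the target preset feature. A and B are both positive integers, and B is less than A.
In a possible implementation, the degree of matching between the B-dimensional running characteristics and any B-dimensional preset feature may be determined in two steps. First, for each of the B dimensions, determine the degree of matching between the running characteristics and the preset feature in that dimension. Second, derive the overall degree of matching from the per-dimension degrees. In a possible implementation, a dimension of a B-dimensional preset feature includes multiple bits, some of which are masked. The degree of matching in such a dimension can be determined by fuzzy matching.
In a possible implementation, if no target preset feature matching the running characteristics is found among the multiple preset features, the default configuration parameters are used as the configuration parameters of the computing resource.
In a possible implementation, the configuration parameters of the computing resource include the address of a configuration register in the computing resource and a configuration register value. Configuring the computing resource may then specifically be writing the configuration register value into the configuration register at that address. This makes parameter configuration more flexible.
In a possible implementation, the running characteristics include at least one or more of the following: the number of instructions executed by a processor core per clock cycle, the miss rate of the instruction translation lookaside buffer, the cache miss rate, and the prefetch hit rate.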
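According to a second aspect, this application provides an apparatus for running a program. The apparatus may be a processing device or a computing resource in the processing device, where the computing resource may specifically be a processor in the processing device or a core of such a processor.
The apparatus for running a program includes a parameter determination module and a configuration module.
The parameter determination module is configured to: when it is determined that a program running on the computing resource includes multiple similar subprograms, obtain the running characteristics of a subprogram among them that has already run or is currently running, where the multiple similar subprograms are subprograms whose running characteristics have a similarity greater than or equal to a first preset value; and determine configuration parameters of the computing resource based on those running characteristics.
The configuration module is configured to configure the computing resource with the determined configuration parameters.
In a possible implementation, the multiple similar subprograms are a loop program executed multiple times in the program, and each subprogram corresponds to one or more iterations of the loop program.
In a possible implementation, the number of instructions in the loop program is greater than the second preset value.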
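In a possible implementation, the apparatus further includes a detection module, configured to determine that a program running on the computing resource includes multiple similar subprograms. Specifically, when the detection module determines that the repeat count of a first subprogram in the program is greater than a third preset value, it determines that the first subprogram is a loop program.
In a possible implementation, to determine whether the repeat count of the first subprogram is greater than the third preset value, the detection module is configured to: after the first subprogram finishes executing and a conditional jump instruction is executed, determine whether record information of the conditional jump instruction exists; if it exists, use it to determine whether the repeat count of the first subprogram is greater than the third preset value; and if it does not yet exist, add record information for the conditional jump instruction.
In a possible implementation, after obtaining the conditional jump instruction, the detection module is further configured to determine whether the conditional jump instruction points to a small loop, where a small loop is a loop in which a single iteration executes fewer instructions than the second preset value. If the conditional jump instruction points to a small loop, it is filtered out; otherwise, the detection module further determines whether record information of the conditional jump instruction exists.
In a possible implementation, the record information is recorded in a preset buffer of the computing resource.
In a possible implementation, to determine whether record information of the conditional jump instruction exists, the detection module is configured to: determine a jump identifier from the identification information of the conditional jump instruction, where the jump identifier is the identification information itself or a hash of it; and traverse the multiple pieces of record information in the buffer. If a piece of record information includes the jump identifier, the buffer is determined to include the record information of the conditional jump instruction; if none does, the buffer is determined not to include it. In a possible implementation, the identification information of the conditional jump instruction is obtained from the branch recording module, and includes a start position and/or a target position, or a hash of the start position and/or the target position.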
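In a possible implementation, to determine from the record information whether the repeat count of the first subprogram is greater than the third preset value, the detection module is configured to use the instruction count and execution length in the record information. The instruction count indicates the cumulative number of instructions the program had executed the last time the conditional jump instruction was executed. The execution length is the difference between the program's cumulative instruction counts at the two previous executions of the conditional jump instruction.
In a possible implementation, the third preset value is 2. To determine from the instruction count and execution length that the repeat count of the first subprogram is greater than the third preset value, the detection module is configured to: take the number of instructions the program has executed when this conditional jump instruction is executed as the first instruction count; take the difference between the first instruction count and the instruction count in the record information as the first execution length; and, if the execution length in the record information is not 0 and the difference between the first execution length and the execution length in the record information is less than the difference threshold, determine that the first subprogram has been repeatedly executed more than 2 times. In a possible implementation, after determining that the repeat count of the first subprogram is greater than the third preset value, the detection module is further configured to update the record information based on the first instruction count and the first execution length.
In a possible implementation, after determining that the first subprogram is a loop program, the detection module is further configured to determine an instruction count threshold from the first instruction count in the updated record information and a preset execution length. When the number of instructions executed by the program reaches the threshold and the conditional jump instruction has not been executed again, the detection module determines that the loop program has exited. The parameter determination module is further configured to set the configuration parameters of the computing resource to the default configuration parameters.
In a possible implementation, to determine from the record information whether the repeat count of the first subprogram is greater than the third preset value, the detection module is configured to check whether the execution count of the conditional jump instruction in the record information is greater than the third preset value. When it is, the repeat count of the first subprogram is determined to be greater than the third preset value.
In a possible implementation, to determine from the record information whether the repeat count of the first subprogram is greater than the third preset value, the detection module is configured to: update the execution count of the conditional jump instruction in the record information; and determine, from the updated execution count, whether the execution count of the conditional jump instruction is greater than the third preset value.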
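In a possible implementation, to determine the running characteristics of the computing resource in one iteration of the first subprogram, the parameter determination module is configured to: obtain the running characteristics of the computing resource running the program when this conditional jump instruction is executed; obtain the running characteristics of the computing resource running the program at the previous execution of the conditional jump instruction; and derive the running characteristics for one iteration of the first subprogram from these two measurements. The parameter determination module derives the running characteristics from feature count values obtained from the feature counters of the performance monitoring unit.
In a possible implementation, to determine the configuration parameters of the computing resource from the running characteristics, the parameter determination module is configured to: determine, from multiple preset features, a target preset feature that matches the running characteristics; and use the preset configuration parameters corresponding to the target preset feature as the configuration parameters of the computing resource. For example, the configuration parameters include a prefetch policy, which covers the prefetch policy for missing cache lines, the prefetch policy for integer data accesses, and the aggressiveness of the prefetch algorithm.
In a possible implementation, to determine the target preset feature from multiple preset features, the parameter determination module is configured to: reduce the A-dimensional running characteristics to B-dimensional running characteristics; and, based on the degree of matching between the B-dimensional running characteristics and each of multiple B-dimensional preset features, select the preset feature with the highest degree of matching as the target preset feature. A and B are both positive integers, and B is less than A.
In a possible implementation, to determine the degree of matching between the B-dimensional running characteristics and any B-dimensional preset feature, the parameter determination module is configured to: for each of the B dimensions, determine the degree of matching between the running characteristics and the preset feature in that dimension; and derive the overall degree of matching from the per-dimension degrees. In a possible implementation, a dimension of a B-dimensional preset feature includes multiple bits, some of which are masked. For such a dimension, the parameter determination module can determine the degree of matching by fuzzy matching.
In a possible implementation, if the parameter determination module finds no target preset feature matching the running characteristics among the multiple preset features, it uses the default configuration parameters as the configuration parameters of the computing resource.
In a possible implementation, the configuration parameters of the computing resource include the address of a configuration register in the computing resource and a configuration register value. To configure the computing resource, the configuration module is configured to write the configuration register value into the configuration register at that address.
In a possible implementation, the running characteristics include at least one or more of the following: the number of instructions executed by a processor core per clock cycle, the miss rate of the instruction translation lookaside buffer, the cache miss rate, and the prefetch hit rate.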
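According to a third aspect, this application provides a processing device, including a computing resource and a memory connected to the computing resource. The memory is configured to store a computer program, and the computing resource is configured to execute that program so as to implement the method in the first aspect or any possible implementation of the first aspect.
According to a fourth aspect, this application provides a computer-readable storage medium storing a computer program or instructions. When a computing resource in a processing device executes the computer program or instructions, the method in the first aspect or any possible implementation of the first aspect is implemented.
According to a fifth aspect, this application provides a processing chip, including at least one processor core and an interface. The interface is configured to provide program instructions or data to the at least one processor core. The at least one processor core is configured to execute the program instructions to implement the method performed by the computing resource in the first aspect or any possible implementation of the first aspect.
For the technical effects achievable by any of the second to fifth aspects, refer to the description of the beneficial effects of the first aspect. Details are not repeated here.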
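Brief description of drawings
FIG. 1 is a schematic structural diagram of a processing device;
FIG. 2 is a schematic diagram of the internal structure of a CPU core;
FIG. 3 is a schematic flowchart of a method for running a program according to this application;
FIG. 4 is a schematic flowchart of determining a loop program according to this application;
FIG. 5 is a schematic flowchart of a specific implementation of determining a loop program according to this application;
FIG. 6 is a schematic flowchart of another specific implementation of determining a loop program according to this application;
FIG. 7 is a schematic diagram of a correspondence between preset features and preset configuration parameters according to this application;
FIG. 8 is a schematic structural diagram of an apparatus for running a program according to this application.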
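Detailed description
The embodiments of this application are described in detail below with reference to the accompanying drawings.
FIG. 1 is a schematic structural diagram of a possible processing device 10.
The processing device 10 includes a processor 101, a memory 102, and a communication interface 103, any two of which can be connected through a bus 104.
The processor 101 may be a CPU, which executes instructions in the memory 102 to implement one or more functions, for example determining whether a program is in a loop (also called a loop state). Besides a CPU, the processor 101 may be an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA), a system on chip (SoC), a complex programmable logic device (CPLD), a graphics processing unit (GPU), a neural-network processing unit (NPU), or the like.
In practice, there may be multiple processors 101, of the same type or of different types. For example:
- the processors 101 are multiple CPUs;
- they include one or more CPUs and one or more GPUs;
- they include one or more CPUs and one or more NPUs;
- they include one or more CPUs, one or more GPUs, and one or more NPUs.
A processor 101 (such as a CPU or an NPU) may include one physical core (physical core/processor) or multiple physical cores. A physical core is a real processor core visible inside the processor. For ease of description, a physical core of a processor is called simply a processor core below.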
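Take one physical core in a CPU (a CPU core for short) as an example. FIG. 2 shows the internal structure of an exemplary CPU core 20 provided in this application. The CPU core 20 includes a micro-instruction (micro-ops/uops) module 201, a branch recording module 202, a performance monitoring module 203, and a register 204.
The CPU core 20 may also include other modules not shown in FIG. 2.
The micro-instruction module 201 detects and stores micro-instruction loop sequences. A loop sequence no larger than the module's capacity can be stored in the module 201. The core then does not need front-end decoding to obtain that sequence again and simply fetches it repeatedly from the module 201. The micro-instruction module 201 is, for example, a loop stream detector (LSD).
The branch recording module 202 records the most recent one or more branch jumps executed by the CPU core 20. For example, if the CPU core 20 jumps from instruction 2 to instruction 11 when executing jump instruction 2, the module 202 can record the start position (instruction 2) and the target position (instruction 11) of that jump. The branch recording module 202 is, for example, a last branch record (LBR) module.
The performance monitoring module 203 includes one or more counters that track and count low-level hardware events, such as:
- CPU core 20 events: executed instructions, captured exceptions, clock cycles, and so on;
- cache events: L1/L2 cache accesses, misses, and so on;
- translation lookaside buffer (TLB) events.
These events reflect a program's behavior during execution and can be used to analyze and tune the program. The performance monitoring module 203 is, for example, a performance monitoring unit (PMU).
The register 204 is a high-speed storage component with limited capacity that temporarily stores instructions, data, and addresses. In this application, the register may specifically be a register that defines CPU behavior; for ease of understanding, such a register is called a configuration register.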
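The memory 102 stores data and may be internal memory or a hard disk.
Internal memory exchanges data directly with the processor 101. It can be read and written at any time at high speed, and it serves as temporary data storage for the operating system and other programs running on the processor 101. Internal memory includes volatile memory, such as random access memory (RAM) and dynamic random access memory (DRAM). It may also include non-volatile memory, such as storage class memory (SCM), or a combination of volatile and non-volatile memory. In practice, the processing device 10 may have multiple memories, optionally of different types; this embodiment does not limit their number or type. The memory can also be given a power-protection function, meaning that stored data survives a power loss and subsequent power-on. Memory with this function is called non-volatile memory.
The hard disk provides storage resources, for example for program data such as images, videos, audio, and text. Hard disks include, but are not limited to, non-volatile memory such as read-only memory (ROM), hard disk drives (HDD), and solid state drives (SSD). Unlike internal memory, a hard disk is slower to read and write and is usually used for persistent storage. In one implementation, data and program instructions on the hard disk are first loaded into internal memory, and the processor then fetches them from there.
The communication interface 103 is used to communicate with other devices.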
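Normally, the processing device configures its computing resource with default configuration parameters. The computing resource is, for example, a processor or a core in a processor, and the configuration parameters are the parameters it uses when running programs. For example, the configuration parameters include a prefetch policy, which covers:
- the prefetch policy for missing cache lines;
- the prefetch policy for integer data accesses;
- the aggressiveness of the prefetch algorithm (for example, a passive policy or an aggressive policy).
Each type of program has its own running characteristics (also called behavior characteristics), such as its computation characteristics and memory access characteristics. These can be represented by multiple microarchitectural features, for example one or more of the following:
- the number of instructions executed by a processor core per clock cycle;
- the miss rate of the instruction translation lookaside buffer (iTLB);
- the cache miss rate;
- the prefetch hit rate;
- the miss rate of the data translation lookaside buffer (dTLB).
The iTLB may also be called the instruction table cache, instruction address translation bypass cache, or address translation cache. The dTLB may also be called the data table cache, data address translation bypass cache, or data translation cache.
The running characteristics listed above are only examples. In other cases, the running characteristics may include other features in addition to one or more of those listed, or may include none of those listed and consist of other features entirely. This application does not limit the specific running characteristics.
Because each type of program has its own running characteristics, running every program with the default configuration parameters ensures that it runs smoothly. However, it does not let the program run efficiently with configuration parameters chosen for its own running characteristics.
To address this, this application provides a method for running a program, performed by a processing device or by a computing resource in the processing device. Take execution by the computing resource as an example. A program runs on the computing resource. The computing resource obtains the running characteristics of a subprogram in that program and adjusts its own configuration parameters accordingly, so that it can efficiently run the subprograms whose running characteristics are similar to those of that subprogram.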
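FIG. 3 is a schematic flowchart of an exemplary method for running a program provided in this application. The method is explained below with reference to FIG. 3.
Step 301: The computing resource determines that a program running on it includes multiple similar subprograms, that is, subprograms whose running characteristics have a similarity greater than or equal to the first preset value.
A program can include multiple similar subprograms in the following forms:
Form 1: The multiple similar subprograms are multiple similar program fragments, that is, fragments whose running characteristics have a similarity greater than or equal to the first preset value.
For example, the processing device or computing resource statically analyzes the program fragments in advance. The static analysis result for each fragment may include that fragment's running characteristics, from which the processing device or computing resource determines that the fragments have similar running characteristics. While running the program, the computing resource can then use these prior results to determine directly that the program includes the similar program fragments.
As another example, the processing device or computing resource runs the program fragments in advance, obtains the running characteristics of each fragment, and determines from them that the fragments have similar running characteristics. When the program is run again, the computing resource can use these prior results to determine directly that the program includes the similar program fragments.
Form 2: The multiple similar subprograms are a loop program executed multiple times in the program, and each subprogram corresponds to one or more iterations of the loop program. In other words, the running characteristics of the loop program across iterations have a similarity greater than or equal to the first preset value. For example, the running program corresponds to an instruction stream. That stream contains one segment (subprogram) per iteration of the loop program, and the running characteristics of these segments have a similarity greater than or equal to the first preset value.
Accordingly, if the computing resource determines while executing the program that the program includes a repeatedly executed loop program, it determines that the program includes multiple similar subprograms. How the computing resource determines whether the program contains such a loop program (denoted the first subprogram) is described in the embodiments related to FIG. 4 to FIG. 6 below.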
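Step 302: The computing resource obtains the running characteristics of a subprogram, among the multiple similar subprograms, that has already run or is currently running.
Because the running characteristics of the multiple similar subprograms have a similarity greater than or equal to the first preset value, the computing resource can obtain the running characteristics of one subprogram that has run or is running and use them for all the similar subprograms.
For example, when the similar subprograms are similar program fragments, the computing resource obtains the running characteristics of a fragment that has run or is running and uses them for the fragments that have not yet run. Suppose the similar fragments are fragments 1 to 10, and the computing resource has finished fragments 1 and 2 but not fragments 3 to 10. The computing resource then determines the running characteristics of fragments 1 and 2 and uses them for fragments 3 to 10.
For example, when the similar subprograms are a loop program executed multiple times by the computing resource, the computing resource obtains the running characteristics of an iteration that has run or is running and uses them for the iterations not yet run. Suppose the subprogram loops 10 times. The computing resource determines the subprogram's running characteristics during the completed 1st iteration or the current 2nd iteration and uses them for the 3rd to 10th iterations.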
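The following explains how the computing resource obtains running characteristics. For example, the computing resource has multiple feature counters. For each counter, it reads the value before and after the subprogram executes and takes the difference as the subprogram's count value for that counter. The count values from all the feature counters together form the subprogram's running characteristics.
For example, the computing resource includes three feature counters (feature counters 1 to 3):
- feature counter 1 records the number of instructions executed by the processor core per clock cycle;
- feature counter 2 records the number of cache misses;
- feature counter 3 records the number of instruction translation lookaside buffer misses.
Example 1: When the computing resource starts executing the subprogram, feature counters 1 to 3 all read 0. When it finishes, they read 100, 200, and 20, so the subprogram's running characteristics are 100, 200, and 20.
Example 2: When the computing resource starts executing the subprogram, feature counters 1 to 3 read 100, 200, and 20. When it finishes, they read 201, 403, and 40, so the subprogram's running characteristics are 101, 203, and 20.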
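Examples 1 and 2 compute a subprogram's running characteristics as per-counter differences between two snapshots. Below is a minimal Python sketch of that calculation. It assumes the snapshots are dictionaries keyed by hypothetical counter names, and that counters are 48 bits wide and may wrap once. The 48-bit width is an assumption for illustration and does not come from the text.

```python
from typing import Dict

COUNTER_BITS = 48  # assumed counter width; only used to handle wrap-around


def running_feature(before: Dict[str, int], after: Dict[str, int],
                    bits: int = COUNTER_BITS) -> Dict[str, int]:
    """Per-counter delta between the snapshot taken when the subprogram starts
    and the one taken when it finishes, modulo 2**bits so that a counter that
    wrapped once still yields the correct difference."""
    modulus = 1 << bits
    return {name: (after[name] - before[name]) % modulus for name in before}


# Example 2 from the text: counters 1..3 go from (100, 200, 20) to (201, 403, 40).
before = {"counter1": 100, "counter2": 200, "counter3": 20}
after = {"counter1": 201, "counter2": 403, "counter3": 40}
print(running_feature(before, after))  # {'counter1': 101, 'counter2': 203, 'counter3': 20}
```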
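Step 303: The computing resource determines its configuration parameters from the determined running characteristics.
These configuration parameters are the parameters to be applied to the computing resource, and may be called its target configuration parameters. They are also the parameters best suited to the running characteristics of the currently running program, under which the program runs best, so they may also be called the computing resource's optimal configuration parameters.
Example 1: The computing resource is preconfigured with multiple preset features and a preset set of configuration parameters for each one, for example preset features 1 to 1000 and preset configuration parameters 1 to 1000. The computing resource finds the target preset feature that matches the running characteristics and uses that feature's preset configuration parameters as its target configuration parameters. For instance, if preset feature 10 matches, the computing resource uses preset configuration parameters 10.
Example 2: The computing resource is preconfigured with multiple preset features and, for each one, the address of its preset configuration parameters. For example, a ternary content addressable memory (TCAM) stores preset features 1 to 1000 together with addresses 1 to 1000. The computing resource finds the target preset feature that matches the running characteristics, reads the preset configuration parameters stored at that feature's address, and uses them as its target configuration parameters. For instance, if preset feature 10 matches, the computing resource reads the parameters at address 10.
In Example 2, each preset feature therefore also corresponds to its own preset configuration parameters.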
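The correspondence between preset features and preset configuration parameters can be obtained through machine learning. For example, the computing resource runs a large number of programs and samples their running characteristics. For programs with each running characteristic, it automatically tunes performance to find its own optimal configuration parameters. Each running characteristic and its optimal parameters are then stored in the processing device as a preset feature and its preset configuration parameters. The processing device can also cluster these running characteristics and optimal parameters to reduce the storage space they require.
Alternatively, the correspondence between preset features and preset configuration parameters in the processing device can be set based on expert experience.
The following example uses machine learning. The running characteristics of many different programs, together with their optimal configuration parameters, are clustered. The optimal configuration parameters can be found with an artificial intelligence (AI) optimization algorithm such as Bayesian optimization. One class of programs has the following running characteristics per 2^10 instructions: an L3 prefetch count in the range 10100000 to 10110000, and an L3 miss count in the range 1100000 to 1101111. The address of the preset configuration parameters is 0x1000101. That address points to a configuration register address and value: the register is the HHA prefetch register, and its value is 0, meaning the function is off. FIG. 7 shows how this correspondence is stored. HHA is the component that maintains cache coherence between L3 and memory, and the HHA prefetch function prefetches data into L3.
The example in FIG. 7 could equally be set from expert experience. When the L3 prefetch count is high and the L3 miss rate is large, experience suggests that many prefetches are useless. Turning off HHA prefetch then relieves memory bandwidth pressure and helps improve performance.
In one possible practical scenario, the processing device stores the preset features and the addresses of the preset configuration parameters in the ternary content addressable memory. It stores the preset configuration parameters those addresses point to in its memory, cache, registers, or other high-speed storage.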
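Optionally, the computing resource finds the target preset feature matching the running characteristics through the following steps 1 to 3:
Step 1: Normalization.
For example, suppose the instruction count in the preset features is 1000 and the instruction count in the running characteristics is 2000. The computing resource normalizes the running characteristics' instruction count to 1000 and scales the other counts by the same factor. If the running characteristics contain 2000 instructions and 400 L2 cache misses, the normalized running characteristics contain 1000 instructions and 200 L2 cache misses.
As another example, suppose every preset feature has an instruction count of 2^8 and the running characteristics have an instruction count of 2^10. The computing resource shifts the instruction count right by 2 bits to normalize it to 2^8, and shifts the other counts in the running characteristics right by 2 bits as well.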
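Step 1 can be sketched in Python as follows. The feature names `instructions` and `l2_miss` are hypothetical.
- `normalize_ratio` reproduces the 2000 to 1000 example using integer scaling.
- `normalize_shift` reproduces the 2^10 to 2^8 example using a right shift. A shift is cheap in hardware but exact only when the instruction count is a power of two, an assumption this sketch enforces.

```python
from typing import Dict


def normalize_ratio(features: Dict[str, int], instr_key: str, target: int) -> Dict[str, int]:
    """Scale every count so that the instruction count becomes `target`."""
    instr = features[instr_key]
    return {k: v * target // instr for k, v in features.items()}


def normalize_shift(features: Dict[str, int], instr_key: str, target_log2: int) -> Dict[str, int]:
    """Shift every count right so that the instruction count becomes 2**target_log2.

    Assumes the instruction count is a power of two no smaller than
    2**target_log2 (e.g. 2**10 -> 2**8 is a right shift by 2 bits).
    """
    instr = features[instr_key]
    shift = instr.bit_length() - 1 - target_log2
    if shift < 0 or instr != 1 << (instr.bit_length() - 1):
        raise ValueError("instruction count must be a power of two >= target")
    return {k: v >> shift for k, v in features.items()}


print(normalize_ratio({"instructions": 2000, "l2_miss": 400}, "instructions", 1000))
# {'instructions': 1000, 'l2_miss': 200}
print(normalize_shift({"instructions": 1 << 10, "l2_miss": 400}, "instructions", 8))
# {'instructions': 256, 'l2_miss': 100}
```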
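Step 2: Dimensionality reduction.
The computing resource includes A feature counters, and their A count values form an A-dimensional preset feature. To reduce storage cost, the A-dimensional preset features can be reduced to B-dimensional preset features, where A and B are both positive integers and B is less than A.
After obtaining the A-dimensional running characteristics, the computing resource likewise reduces them to B-dimensional running characteristics.
Step 3: Feature matching.
The computing resource determines the degree of matching between the B-dimensional running characteristics and each B-dimensional preset feature, and selects the preset feature with the highest degree of matching as the target preset feature.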
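In one possible approach, each dimension of a B-dimensional preset feature can be represented in binary. The dimension then consists of multiple bits, each with its own value, for example 10111 01100 (10 bits).
To further reduce the storage needed for preset features and to speed up matching, the computing resource can mask out some bits in each dimension. In the example above, the mask 11111 11000 masks out the last 3 bits of 10111 01100, giving the masked value 10111 01***.
The computing resource then compares the running characteristics with a preset feature in two stages. It first determines the degree of matching in each of the B dimensions separately, and then derives the overall degree of matching across all B dimensions from those per-dimension degrees.
The per-dimension degree of matching can be determined by fuzzy matching. Continuing the example, suppose the masked value of a dimension in the preset feature is 10111 01*** and the value of that dimension in the running characteristics is 10111 00111. The degree of matching between "10111 01" and "10111 00" is then determined by fuzzy matching.
Dimensionality reduction is optional. If the computing resource does not reduce the dimensionality of the preset features and running characteristics, it matches the A-dimensional running characteristics against each A-dimensional preset feature and selects the preset feature with the highest degree of matching as the target preset feature. In that case it can also mask out some bits in each of the A dimensions of the preset features, in the same way as described above for the B dimensions; details are not repeated here.
If none of the preset features matches the running characteristics, the computing resource can use the default configuration parameters as its configuration parameters. In one possible implementation, the computing resource also defines a general preset feature in which every dimension is "*". This general preset feature corresponds to the default configuration parameters, and both can be stored in the TCAM. When none of the other preset features matches, the general preset feature is taken as the matching target preset feature, and its default configuration parameters become the configuration parameters of the computing resource.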
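The Python sketch below illustrates the masking, per-dimension fuzzy matching, and all-"*" default entry described in step 3. The source does not define the fuzzy-matching metric, so this sketch adopts three assumptions purely for illustration:
- a dimension's matching degree is the fraction of its unmasked bits that agree;
- the overall degree is the mean across dimensions;
- a hypothetical `threshold` decides when no preset matches, in which case the default (all-"*") entry is used.

Configurations appear as hypothetical `{register_address: value}` dictionaries.

```python
from typing import Dict, List, Sequence, Tuple

Config = Dict[int, int]  # hypothetical {config_register_address: value}


def parse_pattern(pattern: str) -> Tuple[int, int]:
    """Turn a masked pattern such as '10111 01***' into (value, care_mask)."""
    value = care = 0
    for ch in pattern.replace(" ", ""):
        value <<= 1
        care <<= 1
        if ch == "*":
            continue  # masked bit: "don't care"
        if ch not in "01":
            raise ValueError(f"unexpected pattern character {ch!r}")
        care |= 1
        if ch == "1":
            value |= 1
    return value, care


def dim_degree(runtime: int, pattern: str) -> float:
    """Fraction of unmasked bits on which the runtime value agrees with the pattern."""
    value, care = parse_pattern(pattern)
    n_care = bin(care).count("1")
    if n_care == 0:
        return 1.0  # an all-'*' dimension matches anything
    mismatches = bin((runtime ^ value) & care).count("1")
    return 1.0 - mismatches / n_care


def select_config(runtime: Sequence[int],
                  presets: List[Tuple[Sequence[str], Config]],
                  default: Config, threshold: float = 0.9) -> Config:
    """Return the config of the best-matching preset, or `default` (the
    all-'*' entry) when no preset reaches `threshold`."""
    best_score, best_cfg = -1.0, default
    for patterns, cfg in presets:
        score = sum(dim_degree(r, p) for r, p in zip(runtime, patterns)) / len(patterns)
        if score > best_score:
            best_score, best_cfg = score, cfg
    return best_cfg if best_score >= threshold else default


print(round(dim_degree(0b1011100111, "10111 01***"), 3))  # 6 of 7 unmasked bits agree: 0.857
presets = [(["10111 01***", "110 0***"], {0x10: 0})]
print(select_config([0b1011101010, 0b1100101], presets, default={0x10: 1}))  # {16: 0}
```

A real TCAM returns the highest-priority exact ternary match in a single lookup. The score loop here is only a software model of the "highest degree of matching" wording.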
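Step 304: The computing resource configures itself with the configuration parameters determined above (the target configuration parameters).
Optionally, the computing resource first compares the target configuration parameters with its current ones. If they are the same, it does nothing; if they differ, it changes the current configuration parameters to the target configuration parameters.
In one example, the target configuration parameters contain a value for each configuration register, and the computing resource applies them by writing each value into its register. Optionally, the target configuration parameters cover several configuration registers. For example, they contain values a to c for configuration registers a to c, and the computing resource writes value a into register a, value b into register b, and value c into register c.
In another example, the target configuration parameters contain configuration register addresses and a value for each address, and the computing resource writes each value into the configuration register at the corresponding address. Optionally, the target configuration parameters cover several addresses. For example, they contain address a with value a, address b with value b, and address c with value c. The computing resource writes value a to the register at address a, value b to the register at address b, and value c to the register at address c. Compared with the previous example, this makes parameter configuration more flexible.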
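The compare-then-write behavior of step 304 can be sketched as below. A real core would write its configuration registers through privileged instructions. In this sketch a plain dictionary stands in for the register file, and `RegisterFile` and `apply_target` are hypothetical names. Only registers whose current value differs from the target are written, matching the optional check in step 304.

```python
from typing import Dict


class RegisterFile:
    """Hypothetical stand-in for a core's configuration registers."""

    def __init__(self, initial: Dict[int, int]) -> None:
        self.regs: Dict[int, int] = dict(initial)
        self.write_count = 0

    def read(self, addr: int) -> int:
        return self.regs.get(addr, 0)  # unset registers read as 0 in this model

    def write(self, addr: int, value: int) -> None:
        self.regs[addr] = value
        self.write_count += 1


def apply_target(regfile: RegisterFile, target: Dict[int, int]) -> int:
    """Write each (address, value) pair of the target configuration, skipping
    registers that already hold the target value. Returns the number of writes."""
    writes = 0
    for addr, value in target.items():
        if regfile.read(addr) != value:
            regfile.write(addr, value)
            writes += 1
    return writes


rf = RegisterFile({0xA: 1, 0xB: 2})
print(apply_target(rf, {0xA: 1, 0xB: 0, 0xC: 7}))  # 2: address 0xA already matches
print(apply_target(rf, {0xA: 1, 0xB: 0, 0xC: 7}))  # 0: nothing left to change
```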
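In the above technical solution, the computing resource, while running a program, determines the running characteristics of a subprogram that has run or is running among the program's similar subprograms. It then derives its target configuration parameters from those characteristics, configures itself with them, and continues running the program, which helps the program run efficiently. For example, the computing resource can optimize program performance by controlling the aggressiveness of the prefetch algorithm. It obtains running characteristics such as memory bandwidth utilization and prefetch hit rate for a subprogram that has run or is running. If the subprogram's memory bandwidth utilization is high and its prefetch hit rate is low, the computing resource lowers the aggressiveness of the prefetch algorithm, for example from the aggressive policy to the passive policy, to improve program performance. In the opposite situation, it raises the aggressiveness, for example from the passive policy to the aggressive policy, to improve program performance.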
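The prefetch example in the preceding paragraph is a simple rule, and the Python sketch below encodes it. The thresholds `bw_high` and `hit_low` are invented for illustration. The text also does not spell out "the opposite situation"; this sketch assumes it means that bandwidth is not saturated and prefetches are mostly useful.

```python
def choose_prefetch_policy(bandwidth_util: float, prefetch_hit_rate: float,
                           current: str, bw_high: float = 0.8,
                           hit_low: float = 0.3) -> str:
    """Return 'passive' or 'aggressive' for the prefetch algorithm.

    bandwidth_util and prefetch_hit_rate are fractions in [0, 1] measured
    over a subprogram that has run or is running; thresholds are assumed.
    """
    if bandwidth_util >= bw_high and prefetch_hit_rate <= hit_low:
        return "passive"      # many useless prefetches on a saturated memory bus
    if bandwidth_util < bw_high and prefetch_hit_rate > hit_low:
        return "aggressive"   # spare bandwidth and prefetches are paying off
    return current            # mixed signals: keep the current policy


print(choose_prefetch_policy(0.9, 0.1, "aggressive"))  # passive
print(choose_prefetch_policy(0.4, 0.7, "passive"))     # aggressive
```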
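FIG. 4 is a schematic flowchart of how the computing resource determines a loop program.
Step 401: After executing the first subprogram, the computing resource executes a conditional jump instruction.
A conditional jump instruction is an instruction the computing resource can execute, produced by translating a conditional statement in a programming language, such as if, while, or for.
A jump means that after executing a first instruction, the computing resource next executes a second instruction that is not consecutive with it in the program code. The first instruction may come before or after the second. For example, the first instruction may be on line 1 of the code and the second on line 10.
Executing the conditional jump instruction may specifically mean that the computing resource reads it from a memory and executes it while running the program. The memory is, for example, internal memory, high bandwidth memory (HBM), or non-volatile memory. Optionally, the memory is part of the processing device.
If the computing resource does not execute a conditional jump instruction after executing the first subprogram, it determines that the first subprogram is not a loop program.
Saying that the computing resource executes an instruction (such as a conditional jump instruction) while running the program is equivalent to saying that the computing resource, or the program, executes that instruction.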
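Step 402: The computing resource determines the identification information of the conditional jump instruction.
For example, the identification information is the instruction's address information, a hash of that address information, or a mapping of it.
The address information of the conditional jump instruction includes its start position and/or target position. In the example in step 401, the start position is the first instruction and the target position is the second instruction.
Optionally, the computing resource obtains the identification information of the conditional jump instruction from the branch recording module.
Optionally, the computing resource also determines a first instruction count, which is the total number of instructions the program has executed so far (the instruction count) at the moment the conditional jump instruction is executed.
The performance monitoring module includes an instruction counter, and the computing resource reads the first instruction count from it. There are two cases, depending on whether the counter has been reset:
- Example a: if the instruction counter has not been reset since the program started, the first instruction count is the total number of instructions the program has executed.
- Example b: if the instruction counter has been reset since the program started, the first instruction count is the total number of instructions the program has executed since the most recent reset.
Optionally, the computing resource also determines a first execution count, which is the total number of times it has executed this conditional jump instruction. The performance monitoring module also includes an execution counter, and the computing resource reads the first execution count from it in the same way as it reads the first instruction count from the instruction counter.
For example, the branch recording module and the performance monitoring module may both be located in the computing resource. Alternatively, the branch recording module may be in the computing resource together with some counters of the performance monitoring module (such as the instruction counter and the execution counter), while its other counters (such as counters that monitor bus transactions) are outside the computing resource.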
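Step 403: Based on the identification information of the conditional jump instruction, the computing resource determines that the repeat count of the first subprogram is greater than the third preset value, and hence that the first subprogram is a loop program.
In this application, "the computing resource determines that the subprogram it is executing is a loop program" also means "the computing resource determines that the program it is executing is in a loop state". Likewise, "the loop program is about to exit or has exited the loop" also means "the program is about to exit or has exited the loop state".
The computing resource includes a buffer that holds record information for multiple conditional jump instructions.
The record information of a conditional jump instruction includes a jump identifier and, optionally, one or more of an instruction count, an execution length, and an execution count.
The fields of the record information, and how they are updated, are explained below using the record of any one conditional jump instruction in the buffer.
(1) Jump identifier: derived from the identification information of the conditional jump instruction. For example, it is the identification information itself or a hash of it. The identification information is described in step 402 above.
(2) Instruction count: the total number of instructions the program had executed when the computing resource executed this conditional jump instruction. If the instruction counter has been reset since the program started, this is the total number of instructions executed since the most recent reset.
(3) Execution length: the number of instructions the program executed between two executions of the same conditional jump instruction. The execution length is therefore the length of the loop program.
Suppose the record corresponds to the K-th execution of the conditional jump instruction. Its execution length can then be determined from the number of instructions the program executed between the (K-k)-th and K-th executions, where K and k are positive integers. In the special case K = 1, the execution length can be 0.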
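When k = 1:
The execution length is determined from the number of instructions the program executed between the (K-1)-th and K-th executions of the conditional jump instruction. For example, it is the number of instructions executed between these two adjacent executions of the same conditional jump instruction.
When k > 1:
In one example, the execution length is the number of instructions the program executed between the (K-k)-th and K-th executions of the conditional jump instruction. For example, if 500 instructions were executed between the 5th and 10th executions, the execution length is 500.
In another example, the execution length is the average number of instructions the program executed per interval between adjacent executions of the conditional jump instruction, taken from the (K-k)-th to the K-th execution. For example, if 500 instructions were executed between the 5th and 10th executions, which span 5 intervals, the execution length is 100.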
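The k = 1 and k > 1 variants of the execution length reduce to simple arithmetic on cumulative instruction counts. In this Python sketch, `counts[i]` is an assumed list holding the program's cumulative instruction count at the (i + 1)-th execution of the same conditional jump instruction. The function name is hypothetical.

```python
from typing import List


def execution_length(counts: List[int], K: int, k: int = 1, average: bool = False) -> int:
    """Execution length recorded at the K-th execution of a conditional jump.

    counts[i] is the cumulative instruction count at the (i + 1)-th execution.
    With average=False this is the number of instructions executed between
    the (K - k)-th and K-th executions; with average=True it is that number
    divided by the k intervals it spans.
    """
    if K == 1:
        return 0  # first execution: nothing to compare against yet
    if not 1 <= k < K <= len(counts):
        raise ValueError("need 1 <= k < K <= len(counts)")
    total = counts[K - 1] - counts[K - k - 1]
    return total // k if average else total


counts = [100 * i for i in range(1, 11)]  # 100, 200, ..., 1000
print(execution_length(counts, K=10, k=5))                # 500 (5th to 10th execution)
print(execution_length(counts, K=10, k=5, average=True))  # 100
```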
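(4) Execution count: the number of times the computing resource has executed this conditional jump instruction.
The following explains how the computing resource writes the record information of a conditional jump instruction into the buffer, taking record information that contains a jump identifier, an instruction count, and an execution length as the example:
Before writing the record information of a conditional jump instruction, the computing resource checks whether the buffer already holds a record for that instruction. If it does, the existing record is updated with the new information; if it does not, the new information is written into the buffer as a new record.
For example, suppose a conditional jump instruction makes the program jump from instruction 10 to instruction 7. Its identification information then contains the start position "instruction 10" and the target position "instruction 7", and the computing resource derives the jump identifier hash(10-7) from them.
On the 1st execution of this conditional jump instruction, the computing resource uses the jump identifier "hash(10-7)" to determine that the buffer holds no record for it. The program has executed 400 instructions so far. The computing resource therefore writes "hash(10-7), 400, 0" into the buffer as the record of this conditional jump instruction.
On the 2nd execution, the computing resource uses "hash(10-7)" to find the existing record. The program has now executed 500 instructions in total, so 100 (500 - 400) instructions were executed between the 1st and 2nd executions. That count is greater than 4 (the number of instructions from "instruction 7" to "instruction 10") because the processing device may, for example, make function calls while executing that range. The computing resource then updates the existing record "hash(10-7), 400, 0" to "hash(10-7), 500, 100".
Likewise, on the n-th execution of this conditional jump instruction (n greater than 2), the computing resource uses "hash(10-7)" to find the existing record. Suppose the program has executed X instructions in total at the n-th execution, and Y instructions between the (n-1)-th and n-th executions. The computing resource then updates the existing record to "hash(10-7), X, Y".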
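The record update walked through above (insert "hash(10-7), 400, 0", then update it to "hash(10-7), 500, 100") can be sketched as follows. A Python dictionary stands in for the hardware buffer, and a tuple hash stands in for hash(10-7). `Record`, `RecordBuffer`, and `on_jump` are hypothetical names. The optional execution-count field is included as the fourth field.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Record:
    jump_id: int      # hash of (start position, target position)
    instr_count: int  # cumulative instructions at the latest execution
    exec_length: int  # instructions between the two latest executions (0 at first)
    exec_times: int   # how many times this jump has been executed


class RecordBuffer:
    def __init__(self) -> None:
        self.records: Dict[int, Record] = {}

    def on_jump(self, start: int, target: int, instr_count: int) -> Record:
        jump_id = hash((start, target))  # stands in for hash(start-target)
        rec = self.records.get(jump_id)
        if rec is None:
            # first execution: write a new record with execution length 0
            rec = Record(jump_id, instr_count, 0, 1)
            self.records[jump_id] = rec
        else:
            # later executions: overwrite the existing record in place
            rec.exec_length = instr_count - rec.instr_count
            rec.instr_count = instr_count
            rec.exec_times += 1
        return rec


buf = RecordBuffer()
r = buf.on_jump(10, 7, 400)
print(r.instr_count, r.exec_length)  # 400 0
r = buf.on_jump(10, 7, 500)
print(r.instr_count, r.exec_length)  # 500 100
```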
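When the record information contains only the jump identifier, the jump identifier and instruction count, or the jump identifier and execution count, the computing resource writes it into the buffer in a similar way.
When writing records into the buffer, the computing resource can also filter out two kinds of records: those of non-loop jump instructions (such as the conditional jump instructions produced by if-else), and those of other conditional jump instructions executed between two executions of the same conditional jump instruction.
In a specific implementation, the computing resource writes records into the buffer in order. Suppose it is writing the record of a jump instruction that already has a record in the buffer, and that existing record is followed by records of other jump instructions. The computing resource can then clear those later records and update the existing record with one or more of the jump instruction's new instruction count, execution length, and execution count.
For example, the program code of a For loop is as follows: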

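When executing this For loop, the computing resource executes the instruction stream shown in Table 1:
Table 1
Table 2 shows the jump sequence corresponding to Table 1:
Table 2
When writing the records of the conditional jump instructions into the buffer, the computing resource then performs the following actions: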
Step 1: insert hash(12-3), X1, Y1;
Step 2: insert hash(10-7), X2, Y2;
Step 3: update hash(10-7), X3, Y3;
Step 4: update hash(10-7), X4, Y4;
Step 11: update hash(10-7), X11, Y11;
Step 12: update hash(12-3), X12, Y12, and delete hash(10-7), X11, Y11;
Step 13: insert hash(10-7), X13, Y13;
Step 14: update hash(10-7), X14, Y14;
Step 15: update hash(10-7), X15, Y15;
……
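For ease of understanding, the instruction counts and execution lengths above are written as X and Y. In step 12, when writing "hash(12-3), X12, Y12", the computing resource finds that "hash(12-3), X1, Y1" already exists in the buffer and is followed by another record, "hash(10-7), X11, Y11". It therefore updates the record to "hash(12-3), X12, Y12" and deletes "hash(10-7), X11, Y11". This reduces the number of records in the buffer and avoids reconfiguring the computing resource too frequently.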
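The ordered-buffer behavior of steps 1 to 15 can be sketched in Python with an ordered list: an existing record is updated, and every record written after it is dropped. Keys are (start, target) pairs standing in for the hashed jump identifiers, and `OrderedJumpBuffer` is a hypothetical name. The jump sequence below follows the step list: one outer jump 12 to 3, inner jumps 10 to 7, then 12 to 3 again. The instruction counts are made up.

```python
from typing import List, Tuple

Key = Tuple[int, int]  # (start, target); stands in for hash(start-target)


class OrderedJumpBuffer:
    def __init__(self) -> None:
        # entries kept in write order: [key, instr_count, exec_length]
        self.entries: List[list] = []

    def on_jump(self, key: Key, instr_count: int) -> None:
        for i, entry in enumerate(self.entries):
            if entry[0] == key:
                entry[2] = instr_count - entry[1]  # new execution length
                entry[1] = instr_count
                del self.entries[i + 1:]  # clear records written after it
                return
        self.entries.append([key, instr_count, 0])  # not found: insert

    def keys(self) -> List[Key]:
        return [entry[0] for entry in self.entries]


buf = OrderedJumpBuffer()
buf.on_jump((12, 3), 10)                 # step 1: insert outer jump
for n in range(10):                      # steps 2-11: inner jump 10 -> 7
    buf.on_jump((10, 7), 20 + 10 * n)
buf.on_jump((12, 3), 130)                # step 12: update outer, drop inner
print(buf.keys())                        # [(12, 3)]
buf.on_jump((10, 7), 140)                # step 13: inner jump inserted afresh
print(buf.keys())                        # [(12, 3), (10, 7)]
```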
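Similarly, when the record information includes an execution count, the computing resource can store the current execution count of the conditional jump instruction in each record. For example, in step 3 above the computing resource records an execution count of 2, and in step 12 it also records an execution count of 2.
Building on this description of the buffer, the following explains how the computing resource, after executing the first subprogram and then a conditional jump instruction, uses the records in the buffer to determine that the first subprogram is a loop program.
After executing the conditional jump instruction, the computing resource uses its identification information to check whether the buffer holds a record for it. Specifically, the computing resource derives a jump identifier from the identification information; the jump identifier is, for example, the identification information itself or a hash of it. The computing resource then traverses the records in the buffer.
- If a record contains the jump identifier, the buffer holds a record for this conditional jump instruction.
- If no record contains the jump identifier, the buffer holds no record for it.
If the buffer holds a record for the conditional jump instruction, the computing resource uses that record to determine whether the first subprogram is a loop program, and then updates the record. If the buffer holds no record, the computing resource adds one. The two cases are explained below: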
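Case 1: The computing resource determines that the buffer includes a record for the conditional jump instruction.
Based on the existing record, the computing resource determines that the repeat count of the first subprogram is greater than the third preset value, and hence that the first subprogram is a loop program.
In one possible approach, the mere existence of a record for the conditional jump instruction shows that the first subprogram has been executed repeatedly. The computing resource therefore determines that the repeat count of the first subprogram is greater than the third preset value, which here is 1, and that the first subprogram is a loop program.
In another possible approach, the existing record contains the jump identifier and the execution count of the conditional jump instruction. If the execution count is greater than the third preset value (for example, 2), the computing resource determines that the first subprogram is a loop program.
In yet another possible approach, the existing record contains the jump identifier, an instruction count (the second instruction count), and an execution length (the second execution length).
The second instruction count is the total number of instructions the program had executed at the previous execution of the same conditional jump instruction, before the current one in step 401. That is, if the current execution is the N-th, the second instruction count can be the cumulative instruction count at the (N-1)-th execution.
The second execution length is the number of instructions the program executed between the two executions of the same conditional jump instruction that preceded the current one (step 401). That is, if the current execution is the N-th, the second execution length is the number of instructions executed between the (N-2)-th and (N-1)-th executions.
The computing resource uses the second instruction count and the second execution length to determine whether the first subprogram is a loop program, as in possible approach 1 or possible approach 2 below.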
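Possible approach 1:
If the second execution length is 0, the record was written at the first execution of this conditional jump instruction, and the computing resource determines that the first subprogram is a loop program.
If the second execution length is not 0, the record was written at the m-th execution of this conditional jump instruction, with m greater than 1. The computing resource takes the first instruction count minus the second instruction count as the first execution length. If the difference between the first execution length and the second execution length is less than the difference threshold, the first subprogram is a loop program; if it is greater than or equal to the threshold, the first subprogram is not a loop program.
Possible approach 2:
If the second execution length is 0, the record was written at the first execution of this conditional jump instruction, and the computing resource determines that the first subprogram is not a loop program.
If the second execution length is not 0, the record was written at the m-th execution of this conditional jump instruction, with m greater than 1. The computing resource derives the first execution length from the second and first instruction counts. If the difference between the first execution length and the second execution length is less than the difference threshold, the first subprogram is a loop program; if it is greater than or equal to the threshold, the first subprogram is not a loop program.
In possible approach 1, the computing resource treats the first subprogram as a loop program once it has been executed twice in succession. In possible approach 2, this requires three successive executions, so possible approach 2 is more accurate than possible approach 1.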
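The computing resource also updates the record of the conditional jump instruction in the buffer with one or more of the jump identifier, the first instruction count, the first execution length, and the first execution count. For example, using the jump identifier, the computing resource writes the first instruction count and first execution length into the corresponding record: the second instruction count is replaced by the first instruction count, and the second execution length by the first execution length.
In one possible approach, if the record includes an execution count, the computing resource can first increment the execution count. When the updated execution count is greater than the third preset value, the computing resource determines that the repeat count of the first subprogram is greater than the third preset value, and hence that the first subprogram is a loop program.
Case 2: The computing resource determines that the buffer does not include a record for the conditional jump instruction.
The computing resource determines that the first subprogram is not a loop program, and adds a record for the conditional jump instruction to the buffer. The new record contains one or more of the jump identifier, the first instruction count, the first execution length, and the first execution count. For example, the first execution length is 0 and the first execution count is 1.
Checking whether the buffer includes a record for the conditional jump instruction can equally be seen as trying to fetch that record. If the fetch succeeds, the computing resource uses the record to determine that the repeat count of the first subprogram is greater than the third preset value, and hence that the first subprogram is a loop program. If the fetch fails, the computing resource determines that the first subprogram is not a loop program.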
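FIG. 5 is a schematic flowchart of an exemplary specific implementation, based on possible approach 1 of case 1 and on case 2, by which the computing resource determines a loop program. FIG. 5 is one specific implementation of FIG. 4, and its terms are as described in the embodiment related to FIG. 4.
Step 501: After executing the first subprogram, the computing resource executes a conditional jump instruction.
Step 502: Using the jump identifier of the conditional jump instruction, the computing resource checks whether the buffer includes a record for it. If not, step 503 is performed; if so, step 504 is performed.
Step 503: The computing resource adds a record for the conditional jump instruction to the buffer.
Step 504: The computing resource checks whether the second execution length is 0. If it is 0, step 505 is performed; otherwise, step 506 is performed.
Step 505: The computing resource updates the record of the conditional jump instruction in the buffer and determines that the first subprogram is a loop program.
Step 506: The computing resource checks whether the difference between the first execution length and the second execution length is less than the difference threshold. If it is, step 505 is performed; if it is greater than or equal to the threshold, step 507 is performed.
Step 507: The computing resource updates the record of the conditional jump instruction in the buffer and determines that the first subprogram is not a loop program.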
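FIG. 6 is a schematic flowchart of another exemplary specific implementation, based on possible approach 2 of case 1 and on case 2, by which the computing resource determines a loop program. FIG. 6 is another specific implementation of FIG. 4, and its terms are as described in the embodiment related to FIG. 4.
Step 601: After executing the first subprogram, the computing resource executes a conditional jump instruction.
Step 602: Using the jump identifier, the computing resource checks whether the buffer includes a record for the conditional jump instruction.
If not, step 603 is performed; if so, step 604 is performed.
Step 603: The computing resource adds a record for the conditional jump instruction to the buffer.
Step 604: The computing resource checks whether the second execution length is 0.
If it is 0, step 605 is performed; otherwise, step 606 is performed.
Step 605: The computing resource updates the record of the conditional jump instruction in the buffer.
Step 606: The computing resource checks whether the difference between the first execution length and the second execution length is less than the difference threshold. If it is, step 608 is performed; if it is greater than or equal to the threshold, step 607 is performed.
Step 607: The computing resource updates the record of the conditional jump instruction in the buffer and determines that the first subprogram is not a loop program.
Step 608: The computing resource updates the record of the conditional jump instruction in the buffer and determines that the first subprogram is a loop program.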
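Optionally, after executing (or obtaining, or receiving) a conditional jump instruction, the computing resource first checks whether it points to a small loop, that is, a loop in which a single iteration executes fewer instructions than the second preset value. Optionally, the computing resource can decide this from an indication of the micro-instruction module (such as the LSD) that the program is in a small-loop state. If the conditional jump instruction points to a small loop, the computing resource filters it out and skips steps 402 and 403. In other words, a program is treated as being in a loop state only when a single iteration executes more instructions than the second preset value. This reduces the number of records in the buffer and saves buffer space in the computing resource. It also avoids adjusting the configuration parameters too frequently, which reduces the computing resource's power consumption.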
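The FIG. 6 flow (possible approach 2), together with the optional small-loop filter, can be sketched in Python as follows. `LoopDetector`, its `diff_threshold`, and the `small_loop` flag are hypothetical. The flag stands in for whatever indication the micro-instruction module (for example, an LSD) provides, and a dictionary stands in for the buffer. Comparing execution lengths by absolute difference is an assumption of this sketch. Step numbers in the comments refer to FIG. 6.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class Rec:
    instr_count: int  # becomes the "second instruction count" at the next execution
    exec_length: int  # becomes the "second execution length" at the next execution


class LoopDetector:
    def __init__(self, diff_threshold: int) -> None:
        self.diff_threshold = diff_threshold
        self.buffer: Dict[Tuple[int, int], Rec] = {}

    def on_conditional_jump(self, start: int, target: int,
                            instr_count: int, small_loop: bool = False) -> bool:
        """Return True if the subprogram ending in this jump is judged a loop."""
        if small_loop:
            return False  # filtered out: no buffer lookup for small loops
        key = (start, target)
        rec = self.buffer.get(key)                  # step 602
        if rec is None:
            self.buffer[key] = Rec(instr_count, 0)  # step 603
            return False
        first_len = instr_count - rec.instr_count
        second_len = rec.exec_length
        rec.instr_count, rec.exec_length = instr_count, first_len  # update record
        if second_len == 0:                         # steps 604/605
            return False
        return abs(first_len - second_len) < self.diff_threshold  # 606 -> 607/608


d = LoopDetector(diff_threshold=10)
print([d.on_conditional_jump(10, 7, n) for n in (400, 500, 603, 800)])
# [False, False, True, False]: lengths 100 and 103 agree; 197 does not
```

Switching to possible approach 1 (FIG. 5) would only change the `second_len == 0` branch to return True.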
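When the similar subprograms are a loop program executed multiple times, the number of iterations is finite. The computing resource therefore needs to detect as early as possible that the program has exited, or is about to exit, the loop (for example, has finished the repeatedly executed subprogram). Otherwise it would keep running the program with parameters tuned for a loop the program has already left. The first subprogram is again used as the example below.
In a first possible approach, after determining that the first subprogram is a loop program, the computing resource sets an instruction count threshold from the first instruction count and a preset execution length. For example, the threshold equals the first instruction count plus the preset execution length. Optionally, the preset execution length is a preconfigured value larger than both the first and second execution lengths. Optionally, it is the first execution length or the second execution length.
Optionally, the computing resource stores the jump identifier of the conditional jump instruction together with the instruction count threshold in the buffer, and uses them to decide whether the program has exited the loop, that is, has stopped repeating the first subprogram. In one possible approach, the threshold is added to the record for that jump identifier (the record of step 403). That record then holds the threshold in addition to one or more of the instruction count, execution length, and execution count. In another possible approach, the jump identifier and the threshold are stored in the buffer as a separate record.
If the program's cumulative instruction count exceeds the instruction count threshold before this conditional jump instruction is executed again, the computing resource determines that the program has exited the loop and is no longer repeating the first subprogram. If the conditional jump instruction is executed again before that point, the program is still in the loop and is still repeating the first subprogram.
In a second possible approach, after determining that the first subprogram is a loop program, the computing resource loads the preset execution length (also called the preset instruction count) into a preset register and decrements the register by 1 for each instruction it executes. If the register reaches 0 before this conditional jump instruction is executed again, the computing resource determines that the program has exited the loop and is no longer repeating the first subprogram. If the conditional jump instruction is executed again before the register reaches 0, the program is still in the loop. Each time the conditional jump instruction is executed, the computing resource can reload the preset instruction count into the register.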
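Both exit-detection approaches can be sketched in Python as below.
- `loop_exited` follows the first approach, where the instruction count threshold is the first instruction count plus the preset execution length.
- `LoopExitCountdown` models the second approach's preset register. It is decremented per instruction and reloaded on each execution of the loop's conditional jump.

Both names are hypothetical. Treating "exceeds the threshold" as "reaches or exceeds" is an assumption of this sketch.

```python
class LoopExitCountdown:
    """Second approach: a countdown standing in for the preset register."""

    def __init__(self, preset_length: int) -> None:
        self.preset_length = preset_length
        self.remaining = preset_length  # loaded when the loop is confirmed

    def on_loop_jump(self) -> None:
        self.remaining = self.preset_length  # reload on every loop-back jump

    def on_instruction(self) -> bool:
        """Decrement per executed instruction; True means the loop has exited."""
        if self.remaining > 0:
            self.remaining -= 1
        return self.remaining == 0


def loop_exited(current_count: int, first_instr_count: int, preset_length: int) -> bool:
    """First approach: exited once the cumulative instruction count reaches the
    threshold first_instr_count + preset_length without the jump recurring."""
    return current_count >= first_instr_count + preset_length


c = LoopExitCountdown(3)
print([c.on_instruction() for _ in range(3)])  # [False, False, True]
print(loop_exited(1200, 1000, 150))            # True
```

In the countdown model, the caller invokes `on_loop_jump` whenever the tracked conditional jump executes. This matches the text's note that the preset instruction count is reloaded on every execution of that jump.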
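Optionally, once the computing resource determines that the program has exited the loop, it sets its configuration parameters to the defaults. In one possible implementation, if the current configuration parameters already are the defaults, it leaves them unchanged; otherwise, it restores them to the defaults.
Optionally, once the computing resource determines that the program is still repeating the first subprogram, it can proceed in one of two ways. In one example, it keeps obtaining the running characteristics of the first subprogram and derives its target configuration parameters from them. In another example, it stops measuring the first subprogram's running characteristics in each iteration while the loop continues, and restores the default configuration parameters only after determining that the loop has ended. The second way reduces the power consumption or complexity of the computing resource while it runs the program.
When the computing resource is a processor, the processor may include multiple processor cores.
In one possible approach, each processor core performs the method of the above embodiments while running its program, to determine its own target configuration parameters, and then configures itself with them. In other words, the target configuration parameters are per processor core.
For example, suppose the computing resource includes processor cores 1 to 5. While running a program, processor core 1 also performs the method of the above embodiments to determine its target configuration parameters 1, and then configures itself with them. The other processor cores likewise determine their own target configuration parameters from the programs they run.
In another possible approach, multiple processor cores run the same program, and one processor core performs the method of the above embodiments to determine the target configuration parameters for that program. In other words, the target configuration parameters are per program. The core that determines them may be one of the cores running the program, or a separate core in the computing resource. That core then applies the target configuration parameters to each core running the program.
For example, suppose the computing resource includes processor cores 1 to 5. In one example, cores 1 to 5 run the same program, and core 1, while running it, also performs the method to determine the program's target configuration parameters, which it then applies to cores 1 to 5. In another example, cores 2 to 5 run the same program, and core 1 performs the method to determine the program's target configuration parameters and applies them to cores 2 to 5.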
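Based on the above and on the same concept, FIG. 8 is a schematic structural diagram of a possible processing device provided in this application. The processing device can implement the functions of the above method embodiments and therefore has their beneficial effects.
As shown in FIG. 8, the processing device 800, acting as the apparatus for running a program, includes a parameter determination module 801 and a configuration module 802.
The parameter determination module 801 is configured to: when it is determined that a program running on the computing resource in the processing device includes multiple similar subprograms, obtain the running characteristics of a subprogram among them that has already run or is currently running, where the similar subprograms are subprograms whose running characteristics have a similarity greater than or equal to a first preset value; and determine configuration parameters of the computing resource from those running characteristics.
The configuration module 802 is configured to configure the computing resource with the determined configuration parameters.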
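In a possible implementation, the multiple similar subprograms are a loop program executed multiple times in the program, and each subprogram corresponds to one or more iterations of the loop program.
In a possible implementation, the number of instructions in the loop program is greater than the second preset value.
In a possible implementation, the apparatus further includes a detection module 803, configured to determine that a program running on the computing resource includes multiple similar subprograms. Specifically, when the detection module 803 determines that the repeat count of a first subprogram in the program is greater than a third preset value, it determines that the first subprogram is a loop program.
In a possible implementation, to determine whether the repeat count of the first subprogram is greater than the third preset value, the detection module 803 is configured to: after the first subprogram finishes and a conditional jump instruction is executed, check whether a record for the conditional jump instruction exists; if it exists, use the record to determine whether the repeat count of the first subprogram is greater than the third preset value; and if it does not yet exist, add a record for the conditional jump instruction.
In a possible implementation, after the conditional jump instruction is executed, the detection module 803 is further configured to check whether the conditional jump instruction points to a small loop, that is, a loop in which a single iteration executes fewer instructions than the second preset value. If it does, the conditional jump instruction is filtered out; otherwise, the detection module 803 goes on to check whether a record for the conditional jump instruction exists.
In a possible implementation, the record information is stored in a preset buffer of the computing resource.
In a possible implementation, to check whether a record for the conditional jump instruction exists, the detection module 803 is configured to: derive a jump identifier from the identification information of the conditional jump instruction, where the jump identifier is the identification information itself or a hash of it; and traverse the records in the buffer. If a record contains the jump identifier, the buffer includes a record for the conditional jump instruction; if no record does, the buffer does not include one. Optionally, the identification information of the conditional jump instruction is obtained from the branch recording module, and contains a start position and/or a target position, or a hash of the start position and/or the target position.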
在一种可能的实现方式中,检测模块803在根据记录信息,确定第一子程序被重复执行的次数大于第三预设值时,具体用于:根据记录信息中的指令数和执行长度,确定第一子程序被重复执行的次数大于第三预设值,其中,记录信息中指令数用于指示在上一次执行条件跳转指令时,程序累计执行的指令数;记录信息中执行长度是程序分别在前两次执行条件跳转指令时累计执行的指令数的差值。
在一种可能的实现方式中,第三预设值等于2,检测模块803在根据记录信息中的指令数和执行长度,确定第一子程序被重复执行的次数大于第三预设值时,具体用于:将执行该条件跳转指令时程序执行的指令数作为第一指令数,以及将第一指令数和记录信息中指令数的差值作为第一执行长度。若确定记录信息中的执行长度不为0,且第一执行长度与记录信息中的执行长度之间的差值小于差值阈值,则确定第一子程序被重复执行的次数 大于2。在一种可能的实现方式中,检测模块803在确定第一子程序被重复执行的次数大于第三预设值之后,还用于:根据第一指令数和第一执行长度更新记录信息。
在一种可能的实现方式中,检测模块803在确定第一子程序为循环程序之后,还用于:根据在执行条件跳转指令时程序累计执行的指令数(即第一指令数),更新记录信息中的指令数。根据更新之后的记录信息中的第一指令数和预设执行长度,确定指令数阈值。当程序执行指令的指令数达到指令数阈值时,若尚未再次执行条件跳转指令,则确定循环程序已退出;参数确定模块801还用于:确定计算资源的配置参数为默认配置参数。
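In a possible implementation, the detection module 803 determines from the record information whether the number of repetitions of the first subprogram exceeds the third preset value by checking whether the execution count of the conditional jump instruction in the record information exceeds the third preset value. When that execution count exceeds the third preset value, the number of repetitions of the first subprogram is determined to exceed the third preset value.
In a possible implementation, the detection module 803 makes this determination in two steps. It first updates the execution count of the conditional jump instruction in the record information. It then checks whether the updated execution count exceeds the third preset value.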
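The execution-count variant needs only a per-record counter. Because the records live in a small cache, a hardware version would plausibly use a narrow saturating counter. The 3-bit width below and the function name are assumptions for illustration.

```python
COUNTER_BITS = 3  # assumption: width of the execution-count field in each record
COUNTER_MAX = (1 << COUNTER_BITS) - 1


def update_and_check(record: dict, third_preset: int) -> bool:
    """Increment the jump's execution count (saturating) and compare it."""
    if third_preset >= COUNTER_MAX:
        raise ValueError("third preset value must be below the counter's maximum")
    record["exec_count"] = min(record["exec_count"] + 1, COUNTER_MAX)
    return record["exec_count"] > third_preset


rec = {"exec_count": 1}  # created at the jump's first execution
print([update_and_check(rec, third_preset=2) for _ in range(3)])  # [False, True, True]
```

Saturating rather than wrapping keeps a long-running loop from ever dropping back below the threshold.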
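In a possible implementation, the parameter determination module 801 determines the running features of one iteration of the first subprogram on the computing resource in three steps. It obtains the program's running features at the current execution of the conditional jump instruction. It obtains the program's running features at the preceding execution of the conditional jump instruction. From these two samples, it determines the running features of one iteration of the first subprogram.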
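Per-iteration features come from differencing two counter snapshots. Hardware event counters have a finite width and can wrap, so the sketch masks the difference to an assumed 48-bit width. The counter names and the width are assumptions.

```python
from typing import Dict, Mapping

COUNTER_WIDTH = 48  # assumption: width of each performance counter


def iteration_delta(prev: Mapping[str, int], curr: Mapping[str, int],
                    width: int = COUNTER_WIDTH) -> Dict[str, int]:
    """Counter increments between two consecutive executions of the loop's jump.

    Masking to the counter width keeps the result correct across one wrap.
    """
    mask = (1 << width) - 1
    return {name: (curr[name] - prev[name]) & mask for name in curr}


prev = {"instructions": 10_000, "cycles": (1 << 48) - 100}
curr = {"instructions": 10_600, "cycles": 300}  # cycle counter wrapped
print(iteration_delta(prev, curr))  # {'instructions': 600, 'cycles': 400}
```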
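In a possible implementation, the parameter determination module 801 determines the configuration parameters of the computing resource from the running features in two steps. It selects, from multiple preset features, a target preset feature that matches the running features. It then uses the preset configuration parameters corresponding to that target preset feature as the configuration parameters of the computing resource.
For example, the configuration parameters of the computing resource include a prefetch policy. The prefetch policy may cover prefetching of missed cache lines, prefetching for integer data accesses, and the aggressiveness of the prefetch algorithm.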
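The preset lookup could be a nearest-neighbour search over a small table. In this sketch the match criterion is plain Euclidean distance with a maximum-distance cut-off, so that an unmatched feature falls back to default parameters. The `PrefetchPolicy` fields mirror the examples above, but their names and encodings, the default values and the two-element feature vectors are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class PrefetchPolicy:
    """Hypothetical configuration parameters; field names are illustrative."""
    missed_line_prefetch: bool
    integer_data_prefetch: bool
    aggressiveness: int  # e.g. 0 = conservative ... 3 = most aggressive


DEFAULT_POLICY = PrefetchPolicy(True, True, 1)  # assumed default parameters

Preset = Tuple[Sequence[float], PrefetchPolicy]


def select_policy(features: Sequence[float], presets: List[Preset],
                  max_distance: float) -> PrefetchPolicy:
    """Return the policy of the closest preset feature, or the default."""
    best: Optional[PrefetchPolicy] = None
    best_dist = math.inf
    for preset_features, policy in presets:
        dist = math.dist(features, preset_features)
        if dist < best_dist:
            best, best_dist = policy, dist
    if best is None or best_dist > max_distance:
        return DEFAULT_POLICY  # no preset is close enough
    return best


# Hypothetical presets keyed by (IPC, cache miss rate).
presets = [([2.0, 0.01], PrefetchPolicy(False, False, 0)),
           ([0.5, 0.20], PrefetchPolicy(True, True, 3))]
print(select_policy([0.55, 0.18], presets, max_distance=0.5))
# PrefetchPolicy(missed_line_prefetch=True, integer_data_prefetch=True, aggressiveness=3)
```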
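In a possible implementation, the parameter determination module 801 selects the target preset feature from the multiple preset features in two steps. It first applies dimensionality reduction to the A-dimensional running features to obtain B-dimensional running features. It then compares the B-dimensional running features with each of the multiple B-dimensional preset features and selects the preset feature with the highest degree of match as the target preset feature. Here A and B are positive integers, and B is less than A.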
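A simple instance of the A-to-B reduction is a fixed linear projection, for example one derived offline by principal component analysis. The sketch takes the B x A matrix as given and scores each preset by negative squared error. The projection, the scoring rule and the function names are assumptions.

```python
from typing import List, Sequence

Vector = Sequence[float]


def reduce_dims(x: Vector, proj: Sequence[Vector]) -> List[float]:
    """Project an A-dimensional feature onto B dimensions (proj is B rows x A cols)."""
    if any(len(row) != len(x) for row in proj):
        raise ValueError("each projection row must have A entries")
    return [sum(w * v for w, v in zip(row, x)) for row in proj]


def match_degree(a: Vector, b: Vector) -> float:
    """Higher is better: negative squared error between two B-dimensional vectors."""
    return -sum((p - q) ** 2 for p, q in zip(a, b))


def best_preset_index(x: Vector, proj: Sequence[Vector],
                      presets_b: Sequence[Vector]) -> int:
    """Index of the B-dimensional preset feature that best matches x."""
    reduced = reduce_dims(x, proj)
    return max(range(len(presets_b)),
               key=lambda i: match_degree(reduced, presets_b[i]))


proj = [[1, 1, 0, 0], [0, 0, 1, 1]]    # A = 4, B = 2: sum adjacent pairs
print(reduce_dims([1, 2, 3, 4], proj))  # [3, 7]
print(best_preset_index([1, 2, 3, 4], proj, [[0, 0], [3, 6], [10, 10]]))  # 1
```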
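In a possible implementation, the parameter determination module 801 determines the degree of match between the B-dimensional running features and a given B-dimensional preset feature in two steps. For each of the B dimensions, it determines the degree of match between the running feature and the preset feature in that dimension. It then derives the overall degree of match from the per-dimension degrees of match.
In a possible implementation, one dimension of a B-dimensional preset feature consists of multiple bits, some of which are masked. When determining the degree of match for that dimension, the parameter determination module 801 may use fuzzy matching.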
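Masked fuzzy matching on a per-dimension bit field might look like the following. The sketch assumes that:

- each dimension is quantized to 8 bits;
- a mask bit of 1 means "compare this bit";
- the per-dimension degree is the fraction of compared bits that agree;
- the overall degree is the mean over dimensions.

All of these are illustrative choices.

```python
from typing import Sequence, Tuple

WIDTH = 8  # assumption: each dimension is quantized to an 8-bit field


def dim_match(value: int, preset: int, care_mask: int) -> float:
    """Fraction of unmasked bits on which value and preset agree."""
    care = care_mask & ((1 << WIDTH) - 1)
    n_care = bin(care).count("1")
    if n_care == 0:
        return 1.0  # a fully masked dimension matches anything
    agree = ~(value ^ preset) & care  # 1 where the bits are equal and compared
    return bin(agree).count("1") / n_care


def feature_match(values: Sequence[int],
                  preset: Sequence[Tuple[int, int]]) -> float:
    """Mean of per-dimension degrees; preset holds (bits, care_mask) pairs."""
    scores = [dim_match(v, bits, mask) for v, (bits, mask) in zip(values, preset)]
    return sum(scores) / len(scores)


# The low nibble of the preset dimension is masked out ("don't care").
print(dim_match(0b10110000, 0b10111111, 0b11110000))  # 1.0
print(dim_match(0b00110000, 0b10110000, 0b11110000))  # 0.75
```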
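In a possible implementation, if the parameter determination module 801 finds no target preset feature matching the running features, it uses the default configuration parameters as the configuration parameters of the computing resource.
In a possible implementation, the configuration parameters of the computing resource include the address of a configuration register in the computing resource and a configuration register value. To apply these parameters, the configuration module 802 writes the configuration register value into the configuration register at that address.
In a possible implementation, the running features include at least one of the following: the number of instructions executed per clock cycle by a processor core, the miss rate of the instruction translation lookaside buffer (TLB), the cache miss rate, and the prefetch hit rate.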
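The configuration step reduces to a list of (address, value) writes. The sketch models the register file as a dictionary so that it runs anywhere; on real hardware the write would be a privileged MMIO or system-register access. The addresses, the 64-bit width and the class names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class RegWrite:
    address: int  # configuration register address
    value: int    # configuration register value


class ConfigRegisterFile:
    """Simulated configuration registers (not a real hardware interface)."""

    def __init__(self, width_bits: int = 64) -> None:
        self._regs: Dict[int, int] = {}
        self._mask = (1 << width_bits) - 1

    def write(self, address: int, value: int) -> None:
        self._regs[address] = value & self._mask  # drop bits beyond the width

    def read(self, address: int) -> int:
        return self._regs.get(address, 0)  # unwritten registers read as 0


def apply_config(regfile: ConfigRegisterFile, params: Iterable[RegWrite]) -> None:
    """Write each configuration register value to its register address."""
    for p in params:
        regfile.write(p.address, p.value)


rf = ConfigRegisterFile()
apply_config(rf, [RegWrite(0x10, 0b011), RegWrite(0x18, 0x1)])
print(hex(rf.read(0x10)), hex(rf.read(0x18)))  # 0x3 0x1
```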
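Given counter deltas for one iteration or one sampling window, the four listed features are simple ratios. The counter names below are hypothetical placeholders for whatever events a given performance monitoring unit exposes. Returning 0.0 for an empty denominator is also an assumption.

```python
from typing import Dict, Mapping


def running_features(delta: Mapping[str, int]) -> Dict[str, float]:
    """Compute IPC, instruction-TLB miss rate, cache miss rate and prefetch hit rate."""

    def ratio(num: str, den: str) -> float:
        return delta[num] / delta[den] if delta[den] else 0.0

    return {
        "ipc": ratio("instructions", "cycles"),
        "itlb_miss_rate": ratio("itlb_misses", "itlb_accesses"),
        "cache_miss_rate": ratio("cache_misses", "cache_accesses"),
        "prefetch_hit_rate": ratio("prefetch_hits", "prefetches_issued"),
    }


sample = {"instructions": 3000, "cycles": 2000, "itlb_misses": 5,
          "itlb_accesses": 1000, "cache_misses": 30, "cache_accesses": 600,
          "prefetch_hits": 80, "prefetches_issued": 100}
print(running_features(sample)["ipc"])  # 1.5
```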
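The division into modules in the embodiments of this application is illustrative and is only a logical functional division; other divisions are possible in an actual implementation. In addition, the functional modules in the embodiments of this application may be integrated into one processor, may each exist physically on their own, or two or more of them may be integrated into one module. An integrated module may be implemented in hardware or as a software functional module.
If the integrated module is implemented as a software functional module and sold or used as an independent product, it may be stored in a computer-readable storage medium. On this understanding, the technical solutions of this application may be embodied as a software product. This applies to the essence of the solutions, to the part that contributes to the prior art, or to all or part of the solutions. The computer software product is stored in a storage medium and includes several instructions that cause a terminal device (such as a personal computer, a mobile phone or a network device) or a processor to perform all or some of the steps of the methods in the embodiments of this application. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.
The descriptions of the procedures corresponding to the foregoing figures each have their own emphasis. For any part not described in detail in one procedure, refer to the related descriptions of the other procedures.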
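All or some of the foregoing embodiments may be implemented by software, hardware, firmware or any combination thereof. When software is used, two programs may be started on the computing resource. One implements the method of this application (the first program). The other is the program the computing resource runs, that is, the program in the foregoing method embodiments (the second program). The computing resource may run the two programs alternately, for example the first program during a first period and the second program during a second period, where the two periods do not overlap. In other words, the computing resource runs only one of the programs at any given moment. The first program needs very few computing resources, so the first period is much shorter than the second, and alternating between the programs has a negligible effect on the execution of the second program. By running the first program, the computing resource determines the running features of a subprogram that has run or is running among the similar subprograms of the second program. It then determines its configuration parameters from those features, so that it can run the second program more efficiently.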
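Further, when software is used, the embodiments may be implemented in whole or in part as a computer program product. The computer program product includes computer program instructions. When these instructions are loaded and executed on a computer, they produce, in whole or in part, the procedures or functions of the method embodiments related to FIG. 3 to FIG. 6 of this application.
The computer may be a general-purpose computer, a dedicated computer, a computer network or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, they may be transmitted from one website, computer, server or data center to another over a wired link (such as coaxial cable, optical fiber or digital subscriber line) or a wireless link (such as infrared, radio or microwave). The computer-readable storage medium may be any usable medium accessible to a computer, or a data storage device, such as a server or data center, that integrates one or more usable media. The usable medium may be a magnetic medium (such as a floppy disk, hard disk or magnetic tape), an optical medium (such as a DVD) or a semiconductor medium (such as an SSD).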
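The modules map onto the method steps as follows:
- the detection module 803 performs step 301 in FIG. 3 and the steps of the method embodiments related to FIG. 4 to FIG. 6;
- the parameter determination module 801 performs steps 302 and 303 in FIG. 3;
- the configuration module 802 performs step 304 in FIG. 3.
Further, the parameter determination module 801 and the detection module 803 may be implemented in hardware, that is, either of them may include a hardware circuit. In that case, after each execution of a conditional jump instruction, the detection module 803 sends a detection signal to the parameter determination module 801 if it determines that the program is in a loop, and a start signal if it determines that the program is not in a loop.
The implementation may be any one of Examples A to C below.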
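In Example A, the detection module 803 sends one start signal and one detection signal to the parameter determination module 801. In response to the start signal, the parameter determination module 801 resets the feature counters in the performance monitoring module, that is, clears them to zero. In response to the detection signal, it reads the values of those feature counters.
In Example B, the detection module 803 sends one start signal and multiple detection signals to the parameter determination module 801. In response to the start signal, the parameter determination module 801 resets the feature counters in the performance monitoring module. In response to the first detection signal, it reads the feature counter values. In response to each later detection signal, it records the current feature counter values and computes their difference from the previously recorded values.
In addition, in Examples A and B, the parameter determination module 801 may instead respond to the start signal by reading the feature counter values without resetting the counters. On receiving the first detection signal, it then computes the difference between the current feature counter values and the values read previously.
In Example C, the detection module 803 sends multiple detection signals and no start signal to the parameter determination module 801. In response to the first detection signal, the parameter determination module 801 reads the feature counter values in the performance monitoring module. In response to each later detection signal, it records the current feature counter values and computes their difference from the previously recorded values.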
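The receiving side of Examples A to C can be modelled as one small state machine. An optional start signal establishes a baseline, either by resetting the counters or, in the non-reset variant, by taking a snapshot of them. Each detection signal then yields the counter increments since the previous baseline. The class, the callbacks and the `FakePMU` stand-in are hypothetical. Treating Example C's first detection signal purely as a baseline is also an assumption of this sketch.

```python
from typing import Callable, Dict, List, Optional

Counters = Dict[str, int]


class FeatureSampler:
    """Parameter-determination side of the start/detection signalling (sketch)."""

    def __init__(self, read_counters: Callable[[], Counters],
                 reset_counters: Optional[Callable[[], None]] = None) -> None:
        self.read = read_counters
        self.reset = reset_counters  # None models the "snapshot, don't reset" variant
        self.last: Optional[Counters] = None
        self.samples: List[Counters] = []

    def on_start(self) -> None:
        if self.reset is not None:
            self.reset()           # Examples A/B: clear the feature counters
        self.last = self.read()    # baseline (all zeros right after a reset)

    def on_detect(self) -> Optional[Counters]:
        now = self.read()
        if self.last is None:
            self.last = now        # Example C: first detection only sets a baseline
            return None
        delta = {k: now[k] - self.last[k] for k in now}
        self.last = now
        self.samples.append(delta)
        return delta


class FakePMU:
    """Stand-in for a performance monitoring module with one counter."""

    def __init__(self) -> None:
        self.value = 0

    def read(self) -> Counters:
        return {"cache_misses": self.value}

    def reset(self) -> None:
        self.value = 0


pmu = FakePMU()
pmu.value = 500
sampler = FeatureSampler(pmu.read, pmu.reset)  # Example B with counter reset
sampler.on_start()
pmu.value += 40
print(sampler.on_detect())  # {'cache_misses': 40}
```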
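Two practical examples based on Example A follow. In Example 1, a feature counter in the performance monitoring module is used to determine the cache miss rate. The parameter determination module 801 resets this counter when it receives the start signal and reads it when it receives the detection signal. It then determines the cache miss rate from the count accumulated between the two signals. In Example 2, a feature counter is used to determine the prefetch hit rate. The parameter determination module 801 likewise resets the counter on the start signal and reads it on the detection signal. It then determines the prefetch hit rate from the count accumulated between the two signals.
The start signal and the detection signal are simply a mechanism for triggering actions. For example, the start signal may be one pulse on the signal line between the detection module 803 and the parameter determination module 801, and the detection signal another pulse on the same line. Alternatively, the two signals may be level transitions, for example a 0-to-1 transition for the start signal and a 1-to-0 transition for the detection signal.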
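With the level-change encoding, the receiver only has to classify transitions on the signal line. The sketch assumes the line is sampled once per step. It treats a 0-to-1 transition as a start signal and a 1-to-0 transition as a detection signal, as in the example above. The sampling model and the names are illustrative.

```python
from typing import Iterable, List, Optional


def decode_edges(levels: Iterable[int]) -> List[str]:
    """Turn sampled line levels into 'start' (0->1) and 'detect' (1->0) events."""
    events: List[str] = []
    prev: Optional[int] = None
    for level in levels:
        if prev is not None and level != prev:
            events.append("start" if level == 1 else "detect")
        prev = level
    return events


print(decode_edges([0, 0, 1, 1, 0, 0]))  # ['start', 'detect']
```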
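Further, consider FIG. 5 with Example B. The detection module 803 sends the start signal to the parameter determination module 801 when it detects the conditional jump instruction for the 1st time, and a detection signal when it detects it for the 2nd time. The parameter determination module 801 thus obtains the count value between the 1st and 2nd conditional jumps, which indicates the program's running features in one iteration. Likewise, each time the detection module 803 detects the conditional jump instruction for the m-th time (m > 2), it sends another detection signal. The parameter determination module 801 then obtains the count value between the (m-1)-th and m-th conditional jumps, and from it the program's running features in the one iteration between them.
Now consider FIG. 6 with Example B. The detection module 803 sends the start signal when it detects the conditional jump instruction for the 2nd time, and a detection signal when it detects it for the 3rd time. This not only yields the count value between two conditional jumps but also improves the accuracy of loop detection. Likewise, in FIG. 6, each time the detection module 803 detects the conditional jump instruction for the m-th time (m > 3), it sends another detection signal. The parameter determination module 801 then obtains the count value between the (m-1)-th and m-th conditional jumps, and from it the program's running features in the one iteration between them.
Example B also admits one or more of the following approaches. In Approach 1, the detection module 803 sends the start signal only after it has detected the conditional jump instruction multiple times. In Approach 2, after sending the start signal, the detection module 803 sends one detection signal for every several detections of the conditional jump instruction. The number of conditional jump executions between two consecutive detection signals may be the same or may differ. The same approaches apply similarly to Examples A and C.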
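Consider FIG. 5 with Example A. The detection module 803 sends the start signal when it detects the conditional jump instruction for the 1st time, and the detection signal when it detects it for the 3rd time. The parameter determination module 801 thus obtains the count value between the 1st and 3rd conditional jumps, which indicates the program's running features over those two iterations.
Consider FIG. 6 with Example A. The detection module 803 sends the start signal when it detects the conditional jump instruction for the 2nd time, and the detection signal when it detects it for the 5th time. The parameter determination module 801 thus obtains the count value between the 2nd and 5th conditional jumps, which indicates the program's running features over those three iterations.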
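The FIG. 5 and FIG. 6 signalling patterns differ only in where the start signal falls and in how many conditional-jump executions separate consecutive signals. A hypothetical helper that maps jump occurrence numbers to signals makes the iteration counts explicit: the gap between two signals equals the number of loop iterations the measured counts cover. The function names are illustrative.

```python
from typing import Dict, List, Sequence, Tuple


def signal_schedule(start_at: int, gaps: Sequence[int]) -> Dict[int, str]:
    """Map 1-based conditional-jump occurrence numbers to the signal sent there.

    start_at: occurrence that triggers the start signal.
    gaps: jump executions between consecutive signals (they may differ).
    """
    if start_at < 1 or any(g < 1 for g in gaps):
        raise ValueError("occurrences are 1-based and gaps must be positive")
    schedule = {start_at: "start"}
    occurrence = start_at
    for gap in gaps:
        occurrence += gap
        schedule[occurrence] = "detect"
    return schedule


def measured_windows(schedule: Dict[int, str]) -> List[Tuple[int, int, int]]:
    """(from, to, iterations covered) for each pair of consecutive signals."""
    points = sorted(schedule)
    return [(a, b, b - a) for a, b in zip(points, points[1:])]


print(measured_windows(signal_schedule(1, [2])))  # FIG. 5, Example A: [(1, 3, 2)]
print(measured_windows(signal_schedule(2, [3])))  # FIG. 6, Example A: [(2, 5, 3)]
```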
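Based on the foregoing and the same concept, this application provides a processing device that includes a computing resource and a memory connected to it. The memory stores a computer program, and the computing resource executes that program so as to implement the method of the foregoing method embodiments. Specifically, the processing device includes one or more processors, each with one or more processor cores. A processor core implements the method of the foregoing method embodiments when it reads the computer program stored in the memory.
Based on the foregoing and the same concept, this application provides a computer-readable storage medium that stores a computer program or instructions. A computing resource in a processing device runs the computer program or instructions to perform the method of the foregoing method embodiments.
Based on the foregoing and the same concept, this application provides a processing chip that includes at least one processor core and an interface. The interface provides program instructions or data to the at least one processor core, and the at least one processor core executes the program instructions to implement the method of the foregoing method embodiments.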
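The numerical labels used in the embodiments of this application serve only to distinguish items for ease of description and do not limit the scope of the embodiments. The sequence numbers of the foregoing processes do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic.
A person skilled in the art can make various modifications and variations to this application without departing from its scope of protection. If these modifications and variations fall within the scope of the claims of this application and their equivalent technologies, this application is also intended to cover them.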

Claims (27)

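  1. A method for running a program, comprising:
    when it is determined that a program being run by a computing resource in a processing device comprises multiple similar subprograms, obtaining running features of a subprogram, among the multiple similar subprograms, that has run or is currently running, wherein the multiple similar subprograms are multiple subprograms whose running features have a similarity greater than or equal to a first preset value;
    determining configuration parameters of the computing resource based on the running features; and
    configuring the computing resource using the configuration parameters of the computing resource.
  2. The method according to claim 1, wherein the multiple similar subprograms are a loop program in the program that is executed multiple times by the computing resource, and each subprogram corresponds to one iteration of the loop program.
  3. The method according to claim 2, wherein a number of instructions of the loop program is greater than a second preset value.
  4. The method according to claim 2, wherein the determining that a program being run by a computing resource in a processing device comprises multiple similar subprograms comprises:
    when a number of times a first subprogram in the program is repeatedly executed by the computing resource is greater than a third preset value, determining that the first subprogram is the loop program.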
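  5. The method according to claim 4, wherein before the determining that the first subprogram is the loop program, the method further comprises:
    after the first subprogram is executed, if a conditional jump instruction is executed, determining whether record information of the conditional jump instruction exists;
    if the record information exists, determining, based on the record information, whether the number of times the first subprogram is repeatedly executed is greater than the third preset value; and
    if the record information does not yet exist, adding the record information.
  6. The method according to claim 5, wherein the determining, based on the record information, whether the number of times the first subprogram is repeatedly executed is greater than the third preset value comprises:
    updating an execution count of the conditional jump instruction in the record information; and
    determining, based on the updated execution count of the conditional jump instruction in the record information, whether the number of times the first subprogram is repeatedly executed is greater than the third preset value.
  7. The method according to claim 5, wherein the determining, based on the record information, whether the number of times the first subprogram is repeatedly executed is greater than the third preset value comprises:
    determining a first instruction count, wherein the first instruction count is a cumulative number of instructions executed by the program when the conditional jump instruction is executed; and
    determining, based on an instruction count and an execution length in the record information and on the first instruction count, whether the number of times the first subprogram is repeatedly executed is greater than the third preset value;
    wherein the instruction count in the record information indicates the cumulative number of instructions executed by the program when the conditional jump instruction was last executed; and
    the execution length in the record information is a difference between the cumulative numbers of instructions executed by the program at the two previous executions of the conditional jump instruction.
  8. The method according to claim 7, wherein the third preset value is 2; and
    the determining, based on the instruction count and the execution length in the record information and on the first instruction count, whether the number of times the first subprogram is repeatedly executed is greater than the third preset value comprises:
    using a difference between the first instruction count and the instruction count in the record information as a first execution length; and
    when a difference between the first execution length and the execution length in the record information is less than a difference threshold, determining that the number of times the first subprogram is repeatedly executed is greater than 2.
  9. The method according to claim 7, wherein after the determining that the first subprogram is the loop program, the method further comprises:
    updating the instruction count in the record information based on the first instruction count;
    determining an instruction count threshold based on the first instruction count in the updated record information and a preset execution length;
    when the number of instructions executed by the program reaches the instruction count threshold, if the conditional jump instruction has not been executed again, determining that the loop program has exited; and
    determining that the configuration parameters of the computing resource are default configuration parameters.
  10. The method according to claim 5, wherein the record information is recorded in a preset segment of a cache of the computing resource.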
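  11. The method according to claim 1, wherein the determining configuration parameters of the computing resource based on the running features comprises:
    determining, based on the running features, a target preset feature that matches the running features from multiple preset features; and
    determining preset configuration parameters corresponding to the target preset feature as the configuration parameters of the computing resource.
  12. The method according to any one of claims 1 to 11, wherein the running features comprise at least any one or more of the following: a number of instructions executed by a processor core per clock cycle, a miss rate of an instruction translation lookaside buffer, a cache miss rate, and a prefetch hit rate.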
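  13. An apparatus for running a program, comprising:
    a parameter determination module, configured to: when it is determined that a program running in a computing resource of a processing device comprises multiple similar subprograms, obtain running features of a subprogram, among the multiple similar subprograms, that has run or is currently running, wherein the multiple similar subprograms are multiple subprograms whose running features have a similarity greater than or equal to a first preset value; and determine configuration parameters of the computing resource based on the running features; and
    a configuration module, configured to configure the computing resource using the configuration parameters of the computing resource.
  14. The apparatus according to claim 13, wherein the multiple similar subprograms are a loop program in the program that is executed multiple times by the computing resource, and each subprogram corresponds to one iteration of the loop program.
  15. The apparatus according to claim 14, wherein a number of instructions of the loop program is greater than a second preset value.
  16. The apparatus according to claim 14, further comprising a detection module, wherein
    the detection module is configured to: when a number of times a first subprogram in the program is repeatedly executed by the computing resource is greater than a third preset value, determine that the first subprogram is the loop program.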
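  17. The apparatus according to claim 16, wherein before determining that the first subprogram is the loop program, the detection module is further configured to:
    after the first subprogram is executed, if a conditional jump instruction is executed, determine whether record information of the conditional jump instruction exists;
    if the record information exists, determine, based on the record information, whether the number of times the first subprogram is repeatedly executed is greater than the third preset value; and
    if the record information does not yet exist, add the record information.
  18. The apparatus according to claim 17, wherein when determining, based on the record information, whether the number of times the first subprogram is repeatedly executed is greater than the third preset value, the detection module is specifically configured to:
    update an execution count of the conditional jump instruction in the record information; and
    determine, based on the updated execution count of the conditional jump instruction in the record information, whether the number of times the first subprogram is repeatedly executed is greater than the third preset value.
  19. The apparatus according to claim 18, wherein when determining, based on the record information, whether the number of times the first subprogram is repeatedly executed is greater than the third preset value, the detection module is specifically configured to:
    determine a first instruction count, wherein the first instruction count is a cumulative number of instructions executed by the program when the conditional jump instruction is executed; and
    determine, based on an instruction count and an execution length in the record information and on the first instruction count, whether the number of times the first subprogram is repeatedly executed is greater than the third preset value;
    wherein the instruction count in the record information indicates the cumulative number of instructions executed by the program when the conditional jump instruction was last executed; and
    the execution length in the record information is a difference between the cumulative numbers of instructions executed by the program at the two previous executions of the conditional jump instruction.
  20. The apparatus according to claim 19, wherein the third preset value is 2; and
    when determining, based on the instruction count and the execution length in the record information and on the first instruction count, whether the number of times the first subprogram is repeatedly executed is greater than the third preset value, the detection module is specifically configured to:
    use a difference between the first instruction count and the instruction count in the record information as a first execution length; and
    when a difference between the first execution length and the execution length in the record information is less than a difference threshold, determine that the number of times the first subprogram is repeatedly executed is greater than 2.
  21. The apparatus according to claim 19, wherein
    after determining that the first subprogram is the loop program, the detection module is further configured to:
    update the instruction count in the record information based on the first instruction count;
    determine an instruction count threshold based on the first instruction count in the updated record information and a preset execution length; and
    when the number of instructions executed by the program reaches the instruction count threshold, if the conditional jump instruction has not been executed again, determine that the loop program has exited; and
    the parameter determination module is further configured to:
    determine that the configuration parameters of the computing resource are default configuration parameters.
  22. The apparatus according to claim 17, wherein the record information is recorded in a preset segment of a cache of the computing resource.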
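  23. The apparatus according to claim 13, wherein when determining the configuration parameters of the computing resource based on the running features, the parameter determination module is specifically configured to:
    determine, based on the running features, a target preset feature that matches the running features from multiple preset features; and
    determine preset configuration parameters corresponding to the target preset feature as the configuration parameters of the computing resource.
  24. The apparatus according to any one of claims 13 to 23, wherein the running features comprise at least any one or more of the following: a number of instructions executed by a processor core per clock cycle, a miss rate of an instruction translation lookaside buffer, a cache miss rate, and a prefetch hit rate.
  25. A processing device, comprising a computing resource and a memory connected to the computing resource, wherein the memory is configured to store a computer program, and the computing resource is configured to execute the computer program stored in the memory, so that the computing resource performs the method according to any one of claims 1 to 12.
  26. A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program or instructions, and a computing resource in a processing device runs the computer program or instructions to perform the method according to any one of claims 1 to 12.
  27. A processing chip, comprising at least one processor core and an interface, wherein
    the interface is configured to provide program instructions or data for the at least one processor core; and
    the at least one processor core is configured to execute the program instructions to implement the method, performed by the computing resource, according to any one of claims 1 to 12.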

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN202210731665 2022-06-25
CN202210731665.4 2022-06-25
CN202211118557.6A CN117331611A (zh) 2022-06-25 2022-09-13 Program running method and apparatus
CN202211118557.6 2022-09-13

Publications (1)

Publication Number Publication Date
WO2023246625A1 (zh) 2023-12-28

Family

ID=89292150

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/100498 WO2023246625A1 (zh) 2022-06-25 2023-06-15 Method and apparatus for running a program

Country Status (2)

Country Link
CN (1) CN117331611A (zh)
WO (1) WO2023246625A1 (zh)

Also Published As

Publication number Publication date
CN117331611A (zh) 2024-01-02

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 23826274

Country of ref document: EP

Kind code of ref document: A1