CN110197026B

CN110197026B - Processor core optimization method and system based on near-threshold calculation

Info

Publication number: CN110197026B
Application number: CN201910449741.0A
Authority: CN
Inventors: 王晶; 梁伟伟; 张伟功
Original assignee: Capital Normal University
Current assignee: Capital Normal University
Priority date: 2019-05-28
Filing date: 2019-05-28
Publication date: 2023-03-31
Anticipated expiration: 2039-05-28
Also published as: CN110197026A

Abstract

The invention discloses a processor core optimization method and system based on near threshold calculation. The method comprises the following steps: acquiring a plurality of voltage-approximation degree data sets; taking a plurality of groups of voltage-approximation degree data sets as the input of the processor core to obtain a performance predicted value, an energy consumption predicted value and an output quality predicted value corresponding to each group of voltage-approximation degree data sets; taking the performance predicted value, the energy consumption predicted value and the output quality predicted value corresponding to all the voltage-approximation degree data sets as input, and solving a target optimization function by adopting a simulated annealing algorithm to obtain an optimal voltage-approximation degree data set; and determining the voltage in the optimal voltage-approximation degree data set as the optimal voltage in a near threshold calculation state, wherein the approximation degree value in the optimal voltage-approximation degree data set is used as the optimal approximation degree. The invention can automatically select the voltage level and the approximation degree to obtain the three-dimensional optimization effect with optimal energy consumption, performance and output quality, and has high reliability.

Description

Processor core optimization method and system based on near-threshold calculation

Technical Field

The invention relates to the technical field of optimization of processor core systems, in particular to a processor core optimization method and system based on near-threshold calculation.

Background

The "power wall" has become a challenge that hinders the development of computers. As manufacturing processes continue to advance, a large amount of unused "dark silicon" must remain on the chip in order to keep power consumption within acceptable limits, which in turn causes the "utilization wall" problem. Data show that 52% of the chip area is in the "dark silicon" state at the 10 nm fabrication process. To address the utilization wall problem, researchers have proposed near-threshold computation (NTC), in which all transistors operate in a voltage range near the threshold voltage. In the near-threshold calculation state, a good compromise between performance and power consumption can be obtained: for the same power savings, near-threshold calculation loses less performance than super-threshold calculation (STC). For example, in the near-threshold calculation state a 50% energy saving costs a 20% performance loss, whereas in the super-threshold calculation state a larger loss is incurred to obtain the same energy saving.

However, near-threshold calculations present new reliability challenges, which are particularly evident as fabrication processes advance. Process variations affect the basic characteristics of the devices in the chip, which is already an unavoidable problem in the industry. This process variation is particularly pronounced in the case of near-threshold calculations, mainly because the average error rate per cell increases, while the variation across the chip increases. To address the reliability issue, researchers have proposed many fault tolerance techniques, such as error correction, reconstruction, and hardware redundancy. The purpose of these techniques is to completely eliminate errors, which entails an unavoidable additional fault-tolerance overhead.

Many of today's applications, such as pattern recognition, data mining, and speech recognition, are inherently fault tolerant. In particular, imprecise calculations and data are acceptable for these applications. For example, in a search engine, results may be acceptable even if they do not completely match the search content; and because human perception is limited, some frames of a video can be skipped. People care only about the results of these applications, not whether the intermediate process is executed one hundred percent correctly. Therefore, errors in these applications need not be corrected when they occur, which avoids the unnecessary performance and time overhead of fault tolerance. Because these applications are insensitive to errors, they are well suited to near-threshold calculation, which alleviates its reliability problem. Exploiting this fault-tolerant nature, researchers have proposed approximation techniques at both the hardware and software levels. These approximations deliberately introduce errors, but their impact on the final output quality is limited and the loss of output quality remains within the range the user accepts.

At present, processor core optimization generally adopts the following methods. 1) Voltage regulation techniques consider how to regulate the voltage to improve energy efficiency. Such methods consider only the voltage; they consider neither the selection of an approximation technique nor the three-dimensional optimization of energy consumption, performance and output quality, so their reliability is low. For example, Dynamic Voltage and Frequency Scaling (DVFS) is a popular technology that dynamically adjusts the operating frequency and voltage of a chip (for the same chip, the higher the frequency, the higher the required voltage) according to the computing power demanded by the application running on it, thereby saving energy; it considers only voltage adjustment and cannot achieve multidimensional optimization. 2) Methods that consider how the output accuracy varies under different approximation degrees provide no way to automatically select an appropriate combination of approximation degree and voltage. Therefore, existing optimization methods usually implement single-dimensional optimization and treat near-threshold calculation and approximate calculation entirely separately; they lack a method that automatically selects the voltage level and the approximation degree to obtain the best three-dimensional combination of energy consumption, performance and output quality.

Disclosure of Invention

Based on this, it is necessary to provide a processor core optimization method and system based on near-threshold calculation, so as to realize automatic selection of voltage level and approximation degree, ensure that the processor core operates under the condition of optimal energy consumption, performance and output quality, realize multidimensional optimization, and improve the reliability of the processor core optimization method.

In order to achieve the purpose, the invention provides the following scheme:

A processor core optimization method based on near-threshold calculation comprises the following steps:

acquiring a plurality of voltage-approximation degree data sets; each of the voltage-approximation degree data sets includes a voltage and a corresponding approximation degree value;

taking a plurality of groups of voltage-approximation degree data groups as the input of a processor core to obtain a performance predicted value, an energy consumption predicted value and an output quality predicted value corresponding to each group of voltage-approximation degree data groups;

constructing a target optimization function;

taking performance predicted values, energy consumption predicted values and output quality predicted values corresponding to all the voltage-approximation degree data sets as inputs, and solving the target optimization function by adopting a simulated annealing algorithm to obtain an optimal voltage-approximation degree data set;

determining the voltage in the optimal voltage-approximation degree data set as the optimal voltage in a near threshold calculation state, wherein the approximation degree value in the optimal voltage-approximation degree data set is used as the optimal approximation degree; the processor core operates at the optimal voltage and the optimal approximation degree.

Optionally, the obtaining a performance predicted value, an energy consumption predicted value, and an output quality predicted value corresponding to each voltage-approximation degree data set by using the plurality of voltage-approximation degree data sets as input of the processor core specifically includes:

taking the multiple voltage-approximation degree data groups as the input of a performance predictor, and obtaining the performance predicted value corresponding to each voltage-approximation degree data group by adopting an approximation calculation method

$$\mathrm{IPS}_i = A v_i + \Delta \mathrm{IPS}_i,$$

where $v_i$ represents the voltage in the i-th voltage-approximation degree data set, $A$ is a constant that depends on the configuration of the processor core and the application executing on the processor core, and $\Delta \mathrm{IPS}_i$ represents the degree of influence of the approximate calculation method on the performance;

and taking the multiple groups of voltage-approximation degree data groups as the input of an energy consumption predictor, and obtaining energy consumption predicted values corresponding to each group of voltage-approximation degree data groups by adopting an approximation calculation method

$$\mathrm{Energy}_i = (\beta_i v_i)^2 C + (\beta_i v_i)^2 m_i D,$$

where $\beta_i$ is a constant between 0 and 1 corresponding to the i-th voltage-approximation degree data set and depends on the level of voltage demand by the user, $C$ is a constant depending on the configuration of the processor core, $m_i$ represents the approximation degree value in the i-th voltage-approximation degree data set, and $D$ is the degree of influence of the approximate calculation method on the energy consumption;

and taking the multiple groups of voltage-approximation degree data groups as the input of an output quality predictor, and obtaining the output quality predicted value corresponding to each group of voltage-approximation degree data groups by adopting an approximation calculation method and a fault injection method.
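The two closed-form predictors above can be sketched in Python as follows. This is only an illustrative sketch: the function names and argument order are assumptions introduced here, and the constants A, C, D, the per-set factor beta and the performance offset would have to be calibrated for a concrete processor core and application, which the text does not specify.

```python
def predict_ips(v: float, a: float, delta_ips: float) -> float:
    """Performance prediction IPS_i = A * v_i + dIPS_i.

    v         -- voltage v_i of the i-th voltage-approximation degree data set
    a         -- constant A (depends on core configuration and application)
    delta_ips -- influence of the approximate calculation method on performance
    """
    return a * v + delta_ips


def predict_energy(v: float, m: float, beta: float, c: float, d: float) -> float:
    """Energy prediction Energy_i = (beta_i v_i)^2 C + (beta_i v_i)^2 m_i D.

    m    -- approximation degree value m_i
    beta -- constant between 0 and 1 (depends on the user's voltage demand)
    c    -- constant C (depends on the core configuration)
    d    -- influence D of the approximate calculation method on energy
    """
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie between 0 and 1")
    scaled_sq = (beta * v) ** 2  # shared factor (beta_i * v_i)^2
    return scaled_sq * c + scaled_sq * m * d
```

Both functions are pure, so they can be evaluated once for every voltage-approximation degree data set before the optimization step.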

Optionally, the obtaining of the output quality prediction value corresponding to each group of the voltage-approximation degree data sets by using the multiple groups of the voltage-approximation degree data sets as inputs of an output quality predictor by using an approximation calculation method and a fault injection method specifically includes:

taking the multiple groups of voltage-approximation degree data groups as the input of an output quality predictor, and classifying instructions in an application program executed on the processor core to obtain multiple instruction classes; each instruction category comprises a plurality of instructions with similar propagation paths;

sampling each instruction type by adopting an approximate calculation method to obtain a plurality of sampling instructions;

injecting a fault into each sampling instruction by adopting a fault injection method to obtain a sampling fault instruction;

calculating an error value according to each sampling instruction and the corresponding sampling fault instruction; the error values comprise a maximum pile-up error value of a sampling instruction output and a corresponding sampling fault instruction output, a maximum value of a relative error of the sampling instruction output and the corresponding sampling fault instruction output, and a matrix error of the sampling instruction output and the corresponding sampling fault instruction output;

and obtaining the output quality predicted value corresponding to each group of the voltage-approximation degree data sets according to the error value.

Optionally, the injecting a fault to each sampling instruction by using a fault injection method to obtain a sampling fault instruction specifically includes:

constructing a fault injection platform; the fault injection platform integrates debugging control software, fault injection software, a hardware simulator and a simulation backplane;

and adopting the fault injection platform to inject faults into each sampling instruction to obtain sampling fault instructions.
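The fault injection platform described here is a combined software and hardware setup whose internals are not given. As a purely software stand-in, and not the platform itself, the following Python sketch flips a single bit in the IEEE 754 single-precision encoding of a value, which is one common way to emulate a fault in an instruction's result. The function names, the choice of 32-bit floats and the single-bit-flip fault model are all assumptions made for illustration.

```python
import random
import struct
from typing import List, Sequence


def flip_bit(x: float, bit: int) -> float:
    """Return x with one bit of its float32 encoding inverted (bit 0 = LSB)."""
    if not 0 <= bit <= 31:
        raise ValueError("bit must be in the range 0..31")
    (raw,) = struct.unpack("<I", struct.pack("<f", x))  # float32 -> uint32
    (faulty,) = struct.unpack("<f", struct.pack("<I", raw ^ (1 << bit)))
    return faulty


def inject_faults(outputs: Sequence[float], seed: int = 0) -> List[float]:
    """Flip one randomly chosen bit in each sampled instruction output."""
    rng = random.Random(seed)  # seeded so that an experiment is repeatable
    return [flip_bit(x, rng.randrange(32)) for x in outputs]
```

Flipping a high exponent bit can turn a value into infinity or NaN, so a real experiment would need to decide how such outcomes are counted.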

Optionally, the constructing an objective optimization function specifically includes:

constructing a function with the performance parameters as targets and the energy consumption parameters and the output quality parameters as constraint conditions; the function is an objective optimization function.

Optionally, the step of solving the target optimization function by using a simulated annealing algorithm with the performance prediction value, the energy consumption prediction value and the output quality prediction value corresponding to all the voltage-approximation degree data sets as inputs to obtain an optimal voltage-approximation degree data set specifically includes:

taking the performance predicted value, the energy consumption predicted value and the output quality predicted value corresponding to all the voltage-approximation degree data sets as input, and obtaining the voltage-approximation degree data sets meeting optimization conditions by adopting a simulated annealing algorithm; the optimization condition is that the performance predicted value is maximum, the energy consumption predicted value is smaller than a preset energy consumption value, the output quality predicted value is smaller than a preset output quality value, or the performance predicted value is reduced by a preset frequency along with the reduction of the annealing temperature;

and determining the voltage-approximation degree data set meeting the optimization condition as an optimal voltage-approximation degree data set.

The invention also provides a processor core optimization system based on near-threshold calculation, which comprises:

the data acquisition module is used for acquiring a plurality of groups of voltage-approximation degree data sets; each of the voltage-approximation degree data sets includes a voltage and a corresponding approximation degree value;

the predicted value acquisition module is used for taking a plurality of groups of voltage-approximation degree data sets as the input of the processor core to obtain a performance predicted value, an energy consumption predicted value and an output quality predicted value corresponding to each group of voltage-approximation degree data sets;

the objective function construction module is used for constructing an objective optimization function;

the solving module is used for taking the performance predicted value, the energy consumption predicted value and the output quality predicted value corresponding to all the voltage-approximation degree data sets as input, and solving the target optimization function by adopting a simulated annealing algorithm to obtain an optimal voltage-approximation degree data set;

an optimal group determination module, configured to determine a voltage in the optimal voltage-approximation degree data group as an optimal voltage in a near-threshold calculation state, where an approximation degree value in the optimal voltage-approximation degree data group is used as an optimal approximation degree; the processor core operates at the optimal voltage and the optimal degree of approximation.

Optionally, the predicted value obtaining module specifically includes:

a performance prediction unit for taking the voltage-approximation degree data sets as input of the performance predictor, and obtaining performance prediction values corresponding to the voltage-approximation degree data sets by adopting an approximation calculation method

$$\mathrm{IPS}_i = A v_i + \Delta \mathrm{IPS}_i,$$

the energy consumption prediction unit is used for taking the multiple voltage-approximation degree data sets as the input of the energy consumption predictor and obtaining the energy consumption prediction value corresponding to each voltage-approximation degree data set by adopting an approximation calculation method

$$\mathrm{Energy}_i = (\beta_i v_i)^2 C + (\beta_i v_i)^2 m_i D,$$

where $\beta_i$ is a constant between 0 and 1 corresponding to the i-th voltage-approximation degree data set and depends on the level of voltage demand by the user, $C$ is a constant depending on the configuration of the processor core, $m_i$ represents the approximation degree value in the i-th voltage-approximation degree data set, and $D$ is the degree of influence of the approximate calculation method on the energy consumption;

and the output quality prediction unit is used for taking the multiple voltage-approximation degree data sets as the input of the output quality predictor and obtaining the output quality prediction value corresponding to each voltage-approximation degree data set by adopting an approximation calculation method and a fault injection method.

Optionally, the output quality prediction unit specifically includes:

the classification subunit is used for taking the multiple groups of voltage-approximation degree data groups as the input of an output quality predictor, and classifying the instructions in the application program executed on the processor core to obtain multiple instruction classes; each instruction category comprises a plurality of instructions with similar propagation paths;

the sampling subunit is used for sampling each instruction type by adopting an approximate calculation method to obtain a plurality of sampling instructions;

the fault injection subunit is used for injecting faults into each sampling instruction by adopting a fault injection method to obtain sampling fault instructions;

the error calculation subunit is used for calculating an error value according to each sampling instruction and the corresponding sampling fault instruction; the error values comprise a maximum pile-up error value of a sampling instruction output and a corresponding sampling fault instruction output, a maximum value of a relative error of the sampling instruction output and the corresponding sampling fault instruction output, and a matrix error of the sampling instruction output and the corresponding sampling fault instruction output;

and the output quality prediction subunit is used for obtaining the output quality prediction value corresponding to each group of the voltage-approximation degree data sets according to the error value.

Optionally, the objective function constructing module specifically includes:

the performance target construction unit is used for constructing a function with the performance parameters as targets and the energy consumption parameters and the output quality parameters as constraint conditions; the function is an objective optimization function;

the solving module specifically includes:

the optimization unit is used for taking the performance predicted value, the energy consumption predicted value and the output quality predicted value corresponding to all the voltage-approximation degree data sets as input and obtaining the voltage-approximation degree data sets meeting the optimization conditions by adopting a simulated annealing algorithm; the optimization condition is that the performance predicted value is maximum, the energy consumption predicted value is smaller than a preset energy consumption value, the output quality predicted value is smaller than a preset output quality value, or the performance predicted value is reduced by a preset frequency along with the reduction of the annealing temperature;

and the determining unit is used for determining the voltage-approximation degree data set meeting the optimization condition as an optimal voltage-approximation degree data set.

Compared with the prior art, the invention has the beneficial effects that:

the invention provides a processor core optimization method and a system based on near threshold calculation, wherein the method comprises the following steps: acquiring a plurality of voltage-approximation degree data sets comprising voltages and corresponding approximation degree values; taking a plurality of groups of voltage-approximation degree data sets as the input of the processor core to obtain a performance predicted value, an energy consumption predicted value and an output quality predicted value corresponding to each group of voltage-approximation degree data sets; taking the performance predicted value, the energy consumption predicted value and the output quality predicted value corresponding to all the voltage-approximation degree data sets as input, and solving a target optimization function by adopting a simulated annealing algorithm to obtain an optimal voltage-approximation degree data set; and determining the voltage in the optimal voltage-approximation degree data set as the optimal voltage in a near threshold calculation state, wherein the approximation degree value in the optimal voltage-approximation degree data set is used as the optimal approximation degree. The invention considers the voltage and the approximation degree value at the same time, comprehensively considers the energy consumption, the performance and the output quality, adopts the simulated annealing algorithm to select the optimal voltage-approximation degree data set, realizes multidimensional optimization, enables the processor core to operate under the optimal voltage and the optimal approximation degree in a near-threshold calculation state, and has high reliability.

Drawings

In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the embodiments are briefly described below. The drawings in the following description show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without inventive effort.

Fig. 1 is a flowchart of a processor core optimization method based on near-threshold calculation according to embodiment 1 of the present invention;

Fig. 2 is a block diagram of a processor core optimization method based on near-threshold calculation according to embodiment 2 of the present invention;

Fig. 3 is a block diagram of an output quality predictor according to embodiment 2 of the present invention;

Fig. 4 is a system framework diagram of a fault injection platform according to embodiment 2 of the present invention;

Fig. 5 is a schematic structural diagram of a processor core optimization system based on near-threshold calculation according to embodiment 3 of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

In order to make the aforementioned objects, features and advantages of the present invention more comprehensible, the present invention is described in detail with reference to the accompanying drawings and the detailed description thereof.

Fig. 1 is a flowchart of a processor core optimization method based on near threshold calculation according to an embodiment of the present invention.

Referring to fig. 1, the method for optimizing a processor core based on a near-threshold calculation of the embodiment includes:

Step S1: acquiring multiple voltage-approximation degree data sets.

Each of the voltage-approximation degree data sets includes a voltage and a corresponding approximation degree value. Because of area overhead, the scaling interval of the voltage level is limited and the voltage level does not change continuously, so the selectable voltage values are fixed. In systems employing approximate calculation, different approximation degrees can be selected; for example, neural network approximation techniques change the approximation degree by changing the network topology, and mantissa truncation techniques change it by changing the number of truncated bits. The selectable range of the approximation degree is likewise limited: the network topologies available to the neural network approximation technique and the number of bits that mantissa truncation can truncate are both fixed. Therefore, the combinations of voltage and approximation degree value are limited, i.e., the number of voltage-approximation degree data sets is finite.
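Because both the voltage levels and the approximation degrees are drawn from small fixed sets, the candidate data sets can be enumerated as a Cartesian product. The Python sketch below illustrates this together with one possible mantissa truncation on 32-bit floats. The concrete voltage levels, the truncation bit counts and the function name are hypothetical values chosen for illustration; none of them come from the text.

```python
import struct
from itertools import product

# Hypothetical discrete choices; a real system would take these from the
# supported voltage levels and the approximation technique in use.
VOLTAGES = [0.45, 0.50, 0.55, 0.60]   # volts
TRUNCATED_BITS = [0, 4, 8, 12]        # approximation degree values

# Every (voltage, approximation degree) pair forms one candidate data set.
candidates = list(product(VOLTAGES, TRUNCATED_BITS))


def truncate_mantissa(x: float, bits: int) -> float:
    """Zero the lowest `bits` mantissa bits of x, viewed as an IEEE 754 float32."""
    if not 0 <= bits <= 23:
        raise ValueError("float32 has 23 explicit mantissa bits")
    (raw,) = struct.unpack("<I", struct.pack("<f", x))   # float32 -> uint32
    mask = (0xFFFFFFFF << bits) & 0xFFFFFFFF             # clear the low bits
    (approx,) = struct.unpack("<f", struct.pack("<I", raw & mask))
    return approx
```

With four voltage levels and four truncation settings the search space has only 16 entries, which is why the number of data sets stays finite.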

Step S2: taking the multiple voltage-approximation degree data sets as the input of a processor core to obtain a performance predicted value, an energy consumption predicted value and an output quality predicted value corresponding to each voltage-approximation degree data set.

The step S2 specifically includes:

1) Taking the multiple voltage-approximation degree data groups as the input of a performance predictor, and obtaining the performance predicted value corresponding to each voltage-approximation degree data group by adopting an approximation calculation method

$$\mathrm{IPS}_i = A v_i + \Delta \mathrm{IPS}_i,$$

where $v_i$ represents the voltage in the i-th voltage-approximation degree data set, $A$ is a constant that depends on the configuration of the processor core and the application executing on the processor core, and $\Delta \mathrm{IPS}_i$ represents the degree of influence of the approximate calculation method on the performance.

2) And taking the multiple groups of voltage-approximation degree data groups as the input of an energy consumption predictor, and obtaining energy consumption predicted values corresponding to each group of voltage-approximation degree data groups by adopting an approximation calculation method

$$\mathrm{Energy}_i = (\beta_i v_i)^2 C + (\beta_i v_i)^2 m_i D,$$

where $\beta_i$ is a constant between 0 and 1 corresponding to the i-th voltage-approximation degree data set and depends on the level of voltage demand by the user, $C$ is a constant depending on the configuration of the processor core, $m_i$ represents the approximation degree value in the i-th voltage-approximation degree data set, and $D$ is the degree of influence of the approximation on the energy consumption, which depends on the approximate calculation method.

3) And taking the multiple groups of voltage-approximation degree data groups as the input of an output quality predictor, and obtaining the output quality predicted value corresponding to each group of voltage-approximation degree data groups by adopting an approximation calculation method and a fault injection method. The method specifically comprises the following steps:

31) The multiple voltage-approximation degree data sets are used as the input of an output quality predictor, and the instructions in the application program executed on the processor core are classified to obtain multiple instruction classes; each instruction class includes a plurality of instructions having similar propagation paths.

32) Each instruction class is sampled using an approximate calculation method to obtain a plurality of sampling instructions.

33) A fault is injected into each sampling instruction using a fault injection method to obtain a sampling fault instruction. Specifically, a fault injection platform is constructed that integrates debugging control software, fault injection software, a hardware simulator and a simulation backplane; the fault injection platform is then used to inject a fault into each sampling instruction to obtain the sampling fault instructions.

34) An error value is calculated from each sampling instruction and the corresponding sampling fault instruction; the error values include the maximum pile-up error value between the sampling instruction output and the corresponding sampling fault instruction output, the maximum relative error between the sampling instruction output and the corresponding sampling fault instruction output, and the matrix error between the sampling instruction output and the corresponding sampling fault instruction output.

35) The output quality predicted value corresponding to each voltage-approximation degree data set is obtained according to the error value.
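The comparison between fault-free and faulty outputs in step 34) can be sketched as below. The text names the error values but does not define them, so this sketch implements only two assumed stand-ins: the maximum absolute difference and the maximum relative error between the two output sequences. The function names are hypothetical, and the pile-up error and matrix error are deliberately left out because their definitions are not given.

```python
from typing import Sequence


def _check(golden: Sequence[float], faulty: Sequence[float]) -> None:
    """Both output sequences must be non-empty and of equal length."""
    if not golden or len(golden) != len(faulty):
        raise ValueError("outputs must be non-empty and of equal length")


def max_abs_error(golden: Sequence[float], faulty: Sequence[float]) -> float:
    """Largest absolute difference between fault-free and faulty outputs."""
    _check(golden, faulty)
    return max(abs(g - f) for g, f in zip(golden, faulty))


def max_relative_error(golden: Sequence[float], faulty: Sequence[float],
                       eps: float = 1e-12) -> float:
    """Largest |g - f| / |g|; eps guards against division by zero."""
    _check(golden, faulty)
    return max(abs(g - f) / max(abs(g), eps) for g, f in zip(golden, faulty))
```

How these values are combined into a single output quality predicted value is not specified in the text, so that step is omitted here.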

Step S3: constructing a target optimization function.

The target optimization function can take any of several forms: the performance parameter as the target with the energy consumption parameter and the output quality parameter as constraint conditions; the performance parameter as the target with the energy consumption parameter as the constraint condition; the energy consumption parameter as the target with the performance parameter and the output quality parameter as constraint conditions; or the energy consumption parameter as the target with the performance parameter as the constraint condition. In this embodiment, the function with the performance parameter as the target and the energy consumption parameter and the output quality parameter as constraint conditions is used as the target optimization function.
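Under the form chosen in this embodiment, the problem can be written compactly as a constrained maximization over the finite set of data sets. In the formulation below, $Q_i$ (the output quality predicted value of the i-th data set), $E_{\max}$ (the preset energy consumption value) and $Q_{\max}$ (the preset output quality value) are symbols introduced here for illustration; the strict inequalities follow the "smaller than" wording of the optimization condition.

```latex
\begin{aligned}
\max_{i}\quad & \mathrm{IPS}_i = A v_i + \Delta\mathrm{IPS}_i \\
\text{s.t.}\quad & \mathrm{Energy}_i = (\beta_i v_i)^2 C + (\beta_i v_i)^2 m_i D < E_{\max}, \\
& Q_i < Q_{\max}
\end{aligned}
```

The energy-targeted variants would swap the roles of the performance and energy terms, minimizing $\mathrm{Energy}_i$ subject to a lower bound on $\mathrm{IPS}_i$.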

Step S4: taking the performance predicted value, the energy consumption predicted value and the output quality predicted value corresponding to all the voltage-approximation degree data sets as input, and solving the target optimization function by adopting a simulated annealing algorithm to obtain an optimal voltage-approximation degree data set.

When the target optimization function takes the performance parameter as the target and the energy consumption parameter and the output quality parameter as constraint conditions, the optimal voltage-approximation degree data set is obtained with the simulated annealing algorithm as follows:

1) Taking the performance predicted value, the energy consumption predicted value and the output quality predicted value corresponding to all the voltage-approximation degree data sets as input, and obtaining the voltage-approximation degree data sets meeting the optimization conditions by adopting a simulated annealing algorithm; the optimization conditions are that the performance predicted value is maximum, the energy consumption predicted value is smaller than a preset energy consumption value, the output quality predicted value is smaller than a preset output quality value, or the performance predicted value is reduced at a preset frequency along with the reduction of the annealing temperature.

2) The voltage-approximation degree data set satisfying the optimization condition is determined as an optimal voltage-approximation degree data set.
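The selection in steps 1) and 2) can be sketched as a small simulated annealing loop over the finite set of predictions. This is only an illustrative sketch, not the implementation described in the text: the `Prediction` record, the function name, the uniform random proposal, the geometric cooling schedule and all default parameters are assumptions. Output quality is treated as an error measure where smaller is better, matching the "smaller than a preset value" condition.

```python
import math
import random
from typing import NamedTuple, Optional, Sequence


class Prediction(NamedTuple):
    voltage: float
    approx_degree: int
    ips: float            # performance predicted value
    energy: float         # energy consumption predicted value
    quality_error: float  # output quality predicted value (lower is better)


def select_optimal(preds: Sequence[Prediction], energy_max: float,
                   quality_max: float, t0: float = 1.0, cooling: float = 0.95,
                   steps: int = 1000, seed: int = 0) -> Optional[Prediction]:
    """Maximize IPS over the data sets that satisfy both constraints."""
    feasible = [p for p in preds
                if p.energy < energy_max and p.quality_error < quality_max]
    if not feasible:
        return None  # no data set meets the constraints
    rng = random.Random(seed)
    current = best = rng.choice(feasible)
    temperature = t0
    for _ in range(steps):
        proposal = rng.choice(feasible)  # uniform random proposal
        delta = proposal.ips - current.ips
        # Always accept improvements; accept a worse proposal with
        # probability exp(delta / T), which shrinks as T cools.
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            current = proposal
        if current.ips > best.ips:
            best = current
        temperature = max(temperature * cooling, 1e-9)  # geometric cooling
    return best
```

For a search space this small an exhaustive scan would give the same answer; the annealing loop is kept only because the text names simulated annealing as the solver.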

Step S5: determining the voltage in the optimal voltage-approximation degree data set as the optimal voltage in a near threshold calculation state, wherein an approximation degree value in the optimal voltage-approximation degree data set is used as the optimal approximation degree; the processor core operates at the optimal voltage and the optimal approximation degree.

The processor core optimization method based on near-threshold calculation can effectively find the optimal voltage-approximation degree data set and thereby obtain the best three-dimensional trade-off among performance, energy and output quality, and the method is highly reliable. The prediction model consists of three predictors, for performance, output quality and energy. The output quality predictor predicts the output quality by simulating static faults in a near-threshold computing system through a fault injection method based on software-hardware co-design. A simulated annealing optimization algorithm then finds the optimal voltage level and approximation degree configuration of the processor core in the near-threshold system; this configuration can maximize performance under a given energy/output quality requirement, or minimize energy under a given performance/output quality requirement, which further improves reliability.

Example 2:

In order to obtain the optimal voltage and approximation degree value, so that the processor core achieves the best three-dimensional trade-off among energy consumption, performance and output quality during operation, this embodiment first establishes a multidimensional optimization model. The model includes three predictors (a performance predictor, an energy consumption predictor and an output quality predictor) and uses a simulated annealing algorithm to obtain the optimal combination of voltage level and approximation degree. As shown in FIG. 2, the inputs to the multidimensional optimization model are the output quality threshold, the energy consumption budget, the system configuration, and the code region to be approximated. The three predictors predict the output quality, energy consumption and performance of a near-threshold computing system at a given voltage level and approximation degree; performance is expressed as IPS (instructions per second), i.e., the number of instructions executed per unit time. From the outputs of the three predictors, the simulated annealing algorithm finally obtains the optimal combination of voltage level and approximation degree, achieving the multidimensional optimization effect.

1. Performance predictor

The performance predictor uses IPS to describe performance, and is specifically as follows:

$$\mathrm{IPS}_i = A v_i + \Delta \mathrm{IPS}_i,$$

where $\mathrm{IPS}_i$ indicates the performance predicted value; $v_i$ represents the voltage in the i-th voltage-approximation degree data set; $A$ is a constant that depends on the configuration of the processor core and the application executing on the processor core; and $\Delta \mathrm{IPS}_i$ indicates the degree of influence of the approximation calculation method on the performance. The larger $A$ is, the more sensitive the IPS is to voltage variations.
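The linear form of the performance predictor can be sketched as a one-line function. The function name `predict_ips` and the numeric values for A and ΔIPS are hypothetical placeholders; in practice they would be fitted for a given processor core configuration and application.

```python
def predict_ips(voltage: float, a: float, delta_ips: float) -> float:
    """Performance predictor: IPS_i = A * v_i + dIPS_i."""
    return a * voltage + delta_ips


# Hypothetical constants, for illustration only.
A = 2.0e9          # sensitivity of IPS to voltage (assumed value)
DELTA_IPS = 1.0e8  # performance effect of the approximation (assumed value)

for v in (0.45, 0.50, 0.55):
    print(v, predict_ips(v, A, DELTA_IPS))
```

A larger `A` makes the printed IPS values spread further apart for the same voltage step, which is the sensitivity described above.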

2. Energy consumption predictor

The energy consumption predictor is specifically as follows:

$$\mathrm{Energy}_i = (\beta_i v_i)^2 C + (\beta_i v_i)^2 m_i D,$$

where $\mathrm{Energy}_i$ represents the energy consumption predicted value; $\beta_i$ is a constant between 0 and 1 corresponding to the i-th voltage-approximation degree data set, and $\beta_i$ depends on the level of voltage demanded by the user; $C$ represents a constant depending on the configuration of the processor core; $m_i$ represents the approximation degree value in the i-th voltage-approximation degree data set; and $D$ is the degree of influence of the approximation calculation method on the energy consumption.
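The energy consumption predictor can be sketched in the same way. Both terms share the factor (β_i v_i)², so the function computes it once. The function name `predict_energy` and the constants in the example call are hypothetical; the source does not state the sign or magnitude of D.

```python
def predict_energy(voltage: float, beta: float, c: float,
                   approx_degree: float, d: float) -> float:
    """Energy predictor: Energy_i = (beta_i*v_i)^2 * C + (beta_i*v_i)^2 * m_i * D."""
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie between 0 and 1")
    scaled_sq = (beta * voltage) ** 2
    return scaled_sq * c + scaled_sq * approx_degree * d


# Hypothetical constants, for illustration only.
print(predict_energy(voltage=0.5, beta=1.0, c=4.0, approx_degree=2, d=0.5))
```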

3. Output quality predictor

The design of the output quality predictor starts from the following observation: if the data handled by different instructions have similar propagation paths, errors in those data will have a similar influence on the output quality of the program. Based on this observation, the instructions are divided into a series of groups, with instructions whose data have "similar" attributes placed in the same group.

As shown in FIG. 3, the first class of instructions has no effect on the output quality regardless of the voltage level and approximation degree used. This class is divided into four instruction groups: (1) NOP instructions are null statements that perform no operation and have no influence on the output result; (2) an error in a performance-enhancing instruction may make a prefetch invalid, but does not change the program semantics; (3) predicated-false instructions have their results discarded, so an error in them does not influence program operation; (4) dynamically dead instructions produce results that are never used by the system, so they cannot influence the output result.

The second class of instructions affects the output quality to different degrees when approximate calculation methods of different degrees are applied. Taking convolution as a typical example, one group contains the instructions that hold data about the filter and the pixels; the instructions that perform the convolution calculation are approximable; and the instructions that load from the image source and store to the image destination may also be approximated. Based on this grouping, representative sampling instructions are then selected from each group using an approximate calculation method.

Because of area overhead limits, the scaling range of the voltage level is bounded, and the voltage level does not change continuously, so the set of selectable voltage levels is fixed. In systems employing approximate calculation methods, the choice of approximation degree is likewise limited. The set of voltage and approximation degree combinations is therefore finite. Since this set is finite, the effect of each sampled instruction on the output quality at each possible voltage and approximation degree can be evaluated by fault injection and propagation analysis. To simulate a static fault at a given voltage, a fault injection platform injects the fault into each representative instruction selected from each group. The impact on the output quality is then obtained by error propagation analysis after fault injection.
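Because both the voltage levels and the approximation degrees are discrete, the full set of voltage-approximation degree data sets can be enumerated directly. The sketch below uses hypothetical voltage levels and approximation degree values; the names `voltage_levels`, `approx_degrees` and `data_sets` are illustrative only.

```python
from itertools import product

# Hypothetical discrete settings, for illustration only.
voltage_levels = [0.45, 0.50, 0.55, 0.60]  # M = 4 selectable voltage levels
approx_degrees = [0, 1, 2]                 # K = 3 selectable approximation degrees

# Every (voltage, approximation degree) pair is one data set: M * K in total.
data_sets = list(product(voltage_levels, approx_degrees))
print(len(data_sets), data_sets[:3])
```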

To quantify the impact on output quality, three output quality indicators are proposed for different types of applications: (1) max-abs-diff: the maximum absolute difference between the completely correct output and the faulty output; (2) max-rel-err: the maximum relative error between the correct output and the faulty output; (3) rel-l2-norm: used to compare the errors of two matrices directly. As shown in FIG. 3, the influence of each sampling instruction on the output result after fault injection is recorded and quantized into these three indicators. Specifically, the three quality indicators are abstracted into output quality buckets (Quality Bucket), and for each instruction it is observed which output quality bucket its influence on the output quality falls into after a fault is injected into it. As shown in Table 1, the output quality buckets for each instruction group at the various voltage levels and approximation degrees are obtained by fault injection. Given a program, the distribution of all its instructions over Table 1 can be determined through fault injection and propagation analysis, and the corresponding output quality can be predicted from that distribution.
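The three indicators and the bucketing step can be sketched as follows. The text names the indicators but does not give formulas, so the definitions below are common conventions adopted as assumptions (in particular, rel-l2-norm is taken as the L2 norm of the difference divided by the L2 norm of the correct output, over flattened matrices). The bucket edges in `quality_bucket` are hypothetical.

```python
import math
from typing import Sequence


def max_abs_diff(correct: Sequence[float], faulty: Sequence[float]) -> float:
    """Largest absolute element-wise difference between the two outputs."""
    return max(abs(c - f) for c, f in zip(correct, faulty))


def max_rel_err(correct: Sequence[float], faulty: Sequence[float],
                eps: float = 1e-12) -> float:
    """Largest element-wise relative error; eps guards against division by zero."""
    return max(abs(c - f) / max(abs(c), eps) for c, f in zip(correct, faulty))


def rel_l2_norm(correct: Sequence[float], faulty: Sequence[float]) -> float:
    """Assumed definition: ||correct - faulty||_2 / ||correct||_2."""
    num = math.sqrt(sum((c - f) ** 2 for c, f in zip(correct, faulty)))
    den = math.sqrt(sum(c ** 2 for c in correct))
    if den == 0.0:
        return 0.0 if num == 0.0 else math.inf
    return num / den


def quality_bucket(loss: float,
                   edges: Sequence[float] = (0.0, 0.01, 0.05, 0.10)) -> int:
    """Index of the first bucket edge that the loss does not exceed (edges are assumed)."""
    for idx, edge in enumerate(edges):
        if loss <= edge:
            return idx
    return len(edges)


correct = [1.0, 2.0, 4.0]
faulty = [1.0, 2.5, 4.0]   # one element corrupted by an injected fault
print(max_abs_diff(correct, faulty), max_rel_err(correct, faulty),
      rel_l2_norm(correct, faulty))
```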

TABLE 1

4. Fault injection platform

The fault injection platform in this embodiment is designed based on a backplane technology, and fig. 4 is a system framework diagram of the fault injection platform in embodiment 2 of the present invention. Referring to fig. 4, the fault injection platform integrates debug control software, fault injection software, and a hardware emulator.

The debug control software is responsible for compiling and configuring the software program that runs on the target processor. It can provide input and output through a graphical interface or a command line, and it communicates with the underlying hardware through a network or a serial port. Through the simulation backplane, it sends debug control commands to the target processor in the hardware simulator and receives the system state that the hardware processor returns for debugging.

The fault injection software receives parameters such as the fault injection time and position through a user interface, and receives the hardware signal list sent by the hardware through the simulation backplane. It identifies and records all signals layer by layer through an iterative depth-first traversal, and establishes a layered resource pool used to locate injection signals directly during fault injection. It then generates a fault library for the hardware signals, sends the generated faults and simulation control information to the target processor through the simulation backplane, and receives the state of the target processor after injection through the simulation backplane.
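The layered identification of signals can be pictured as a depth-first walk over the design hierarchy that records each signal under its hierarchical path. The sketch below is a simplified illustration: the nested-dictionary representation, the function name `build_resource_pool` and the example signal names are all hypothetical, not the platform's actual data format.

```python
from typing import Any, Dict, List, Tuple


def build_resource_pool(design: Dict[str, Any]) -> Dict[str, int]:
    """Map each signal's hierarchical path to its depth, using an iterative DFS.

    `design` is a nested dict: a dict value is a sub-module, any other
    value marks a leaf signal (hypothetical representation).
    """
    pool: Dict[str, int] = {}
    stack: List[Tuple[str, Dict[str, Any], int]] = [("", design, 0)]
    while stack:
        prefix, node, depth = stack.pop()
        for name, child in node.items():
            path = f"{prefix}/{name}" if prefix else name
            if isinstance(child, dict):
                stack.append((path, child, depth + 1))  # descend into sub-module
            else:
                pool[path] = depth                      # record leaf signal
    return pool


# Hypothetical signal hierarchy, for illustration only.
design = {
    "leon2": {
        "iu": {"pc": None, "alu_result": None},
        "dcache": {"hit": None},
        "clk": None,
    }
}
pool = build_resource_pool(design)
print(sorted(pool.items()))
```

With such a pool, an injection target given as a path can be looked up directly instead of searching the hierarchy at injection time.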

The fault-tolerant processor prototype runs in a hardware simulator, which uses LEON2 as the CPU core. LEON2 has the following technical characteristics: the SPARC V8 architecture, an internal AMBA bus structure, a fault-tolerant design, and a VHDL programming style. The LEON2 processor unit mainly includes a five-stage pipelined integer unit (5-stage IU), a floating point unit (FPU) for floating point operations, and a coprocessor unit (CP). The integer unit has a separate data cache (Dcache) and instruction cache (Icache), in addition to a memory management unit (MMU). The on-chip peripherals of LEON2 include a Peripheral Component Interconnect (PCI) bus, an Ethernet interface, a dynamic random access memory (DRAM), and a debug support unit (DSU). The DSU can set the processor to debug mode, through which all registers and caches of the processor can be read and written. The DSU also includes a trace cache that stores executed instructions and the data transferred on the AHB. Software commands are received, and execution status is sent, by this internal DSU with trace cache. The simulation backplane virtualizes hardware devices such as a network card and a serial port. According to the requirements of software-hardware information interaction, it sends control commands and fault injection information to the hardware DSU, and routes the information returned by the DSU to different destinations according to its content: debugging information observed by the debugger goes to the debug control software, and information required for fault injection, such as signal states, goes to the fault injection software. In this way it handles the transmission and scheduling of communication data such as control commands and returned results.

The simulation backplane uses mixed VHDL and C programming, together with the foreign language interface of the hardware description language, to associate and map the hardware simulator and the software system to each other and to provide seamless information interaction between the data and control modules. Through the foreign language interface, stimuli can be designed in C and simulation verification tasks can be performed in the hardware simulator. The hardware interface part extracts the signals of interest in a hardware module by means of the signal monitoring and instantaneous signal state reading functions provided by the foreign interface of the hardware description language. Fault injection is realized by forced assignment of signal logic values: the logic value of the signal changes immediately after injection, while other signals logically related to it in the circuit are not affected, which simulates the effect on the circuit when a single event occurs.
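Fault injection by forced assignment can be illustrated on a snapshot of signal values: exactly one bit of one signal is overwritten and every other signal is left untouched. This is a software-only sketch under assumed fault models (bit flip, stuck-at-0, stuck-at-1); the function `inject_fault`, the model names and the signal names are hypothetical and do not represent the VHDL/C interface of the platform.

```python
from typing import Dict


def inject_fault(signals: Dict[str, int], name: str, bit: int,
                 model: str = "flip") -> Dict[str, int]:
    """Return a copy of `signals` with one bit of `name` forced per the fault model."""
    mask = 1 << bit
    faulty = dict(signals)  # leave the original snapshot unchanged
    if model == "flip":
        faulty[name] = signals[name] ^ mask    # invert the selected bit
    elif model == "stuck_at_0":
        faulty[name] = signals[name] & ~mask   # force the bit to 0
    elif model == "stuck_at_1":
        faulty[name] = signals[name] | mask    # force the bit to 1
    else:
        raise ValueError(f"unknown fault model: {model}")
    return faulty


# Hypothetical signal snapshot, for illustration only.
state = {"alu_result": 0b1010, "pc": 0x40}
print(inject_fault(state, "alu_result", bit=0))
```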

In this embodiment, co-simulation based on the backplane technology completely decouples the fault injection platform from a physical prototype, so that a designer can find errors as early as possible in the design and correct them in time, thereby reducing development cost. The software part of the simulation backplane is instantiated against the sensitive signals of the hardware description language module: when the logic circuit triggers a sensitive signal of the module, the simulator calls the program in the corresponding dynamic link library, and the signals defined in the hardware description language are mapped seamlessly into the high-level language environment. At that point, any processing of the signal logic values can be implemented by a software algorithm. A high-level language handles flow control and function calls more easily than a hardware language or a script language, and is no longer limited by the hardware language or by the simulation commands of the simulator, so more complex fault models and algorithms can be supported; it is also portable and easy to extend with further functions. In addition, the software program is not synthesized when the design is downloaded to a chip, so no additional logic is added, intrusion into and influence on the hardware module are avoided, the integrity of the target model is protected, and the test result is more trustworthy.

5. Optimization algorithm

The problem to be optimized is as follows: given a set of processor cores PE ($PE_1, \ldots, PE_N$) that can operate at M different voltage levels ($v_1, \ldots, v_M$) and K different approximation degrees ($m_1, \ldots, m_K$), select the optimal combination from the M × K voltage and approximation degree combinations to obtain the best three-dimensional trade-off among performance, power consumption and output quality. Table 2 gives four different optimization strategies. In case 1, the goal is to maximize performance: P-E + R denotes that performance is the optimization goal while energy and output quality are constraints, and P-E denotes that performance is the optimization goal while energy is the only constraint. In case 2, the goal is to minimize energy: E-P + R denotes that energy is the objective function while performance and output quality are constraints, and E-P denotes that energy is the objective function and performance is the constraint, regardless of the output quality. In this embodiment, an index EEPI is used to evaluate the trade-off among energy, performance and output quality, where the EEPI is specifically as follows:

$$\mathrm{EEPI} = \frac{\mathrm{Error} \cdot \mathrm{Energy}}{\mathrm{IPS}},$$

where Error is the output quality loss of the system, Energy is the energy consumption of the system, and IPS represents the performance of the system; the lower the EEPI value, the better the combined optimization of performance, energy consumption and output quality.
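A minimal sketch of the EEPI index follows, useful for comparing two candidate configurations. The function name `eepi` and the numbers in the example are hypothetical.

```python
def eepi(error: float, energy: float, ips: float) -> float:
    """EEPI = Error * Energy / IPS; a lower value indicates a better trade-off."""
    if ips <= 0:
        raise ValueError("IPS must be positive")
    return error * energy / ips


# Two hypothetical configurations with the same energy and IPS:
# the one with the smaller output quality loss gets the lower (better) EEPI.
config_a = eepi(error=0.5, energy=4.0, ips=2.0)
config_b = eepi(error=0.25, energy=4.0, ips=2.0)
print(config_a, config_b)
```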

TABLE 2

P-E + R is selected as the optimization scheme below, and the optimization problem is solved with a simulated annealing (SA) algorithm. Simulated annealing is a stochastic optimization algorithm based on a Monte Carlo iterative solution strategy; it solves an optimization problem by simulating the annealing process of solid matter in physics. As the temperature falls, SA can find a globally optimal solution of the objective function. Its greatest advantage is that it can escape locally optimal solutions and eventually tend towards the globally optimal solution. The optimization with simulated annealing proceeds briefly as follows: first, the temperature and the voltage and approximation degree combination of the processor core are initialized, and the initial performance is obtained through the performance predictor; then the voltage and approximation degree combination is changed repeatedly. If the energy consumption and the output quality meet the requirements and the performance increases, the combination and the performance are updated; if the performance decreases, the combination and the performance are updated only with a certain probability that falls as the annealing temperature decreases. Through continued iteration, the optimal voltage and approximation degree combination is finally found. The code for obtaining the optimal solution by the simulated annealing algorithm is as follows:

The optimal voltage level and approximation degree combination can be obtained as the optimal solution by the above algorithm. Specifically, the PEs are initialized with a series of configurations $C = \{c_1, c_2, c_3, \ldots, c_n\}$, where $c_b$, corresponding to the b-th PE, is a combination of voltage level and approximation degree. The algorithm first applies the annealing move ANNEAL_MOVE() to maximize IPS while exploring the entire solution space. It then checks whether IPS_new is greater than IPS, whether Energy_new is lower than Energy_budget (the energy consumption budget), and whether OQLOSS_new is lower than OQLOSS_threshold (the output quality threshold). If these conditions are not met, a lower IPS_new is accepted with a certain probability that decreases as the annealing temperature decreases. If a combination meeting the conditions exists, it is determined as the combination of the optimal voltage level and approximation degree. The processor core then operates under this combination, so that performance is improved to the maximum extent while the requirements of users on the output quality and the performance are met. The specific example given takes performance as the target and energy and output quality as the constraints (P-E + R); similarly, the optimal scheme can be obtained by combining any of the three other optimization strategies (E-P + R, E-P or P-E) with the simulated annealing algorithm.
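The P-E + R procedure described above can be sketched as follows. This is not the listing of the embodiment, which is not reproduced in this text, but a minimal single-core illustration under stated assumptions: a random pick from the candidate list stands in for ANNEAL_MOVE(), a geometric cooling schedule is assumed, and the three predictor callables and all constants in the usage example are hypothetical toy models.

```python
import math
import random
from itertools import product
from typing import Callable, Optional, Sequence, Tuple

Config = Tuple[float, int]  # (voltage level, approximation degree)


def anneal_p_e_r(configs: Sequence[Config],
                 ips_of: Callable[[Config], float],
                 energy_of: Callable[[Config], float],
                 oqloss_of: Callable[[Config], float],
                 energy_budget: float,
                 oqloss_threshold: float,
                 t_start: float = 1.0,
                 t_end: float = 1e-3,
                 cooling: float = 0.99,
                 seed: int = 0) -> Optional[Config]:
    """Maximize IPS subject to the energy budget and output quality threshold."""
    rng = random.Random(seed)
    scale = max(abs(ips_of(c)) for c in configs) or 1.0  # normalizes IPS deltas

    def feasible(c: Config) -> bool:
        return energy_of(c) < energy_budget and oqloss_of(c) < oqloss_threshold

    current = rng.choice(configs)
    best = current if feasible(current) else None
    temp = t_start
    while temp > t_end:
        candidate = rng.choice(configs)  # assumed stand-in for ANNEAL_MOVE()
        if feasible(candidate):
            delta = (ips_of(candidate) - ips_of(current)) / scale
            # Accept an improvement (or any feasible point while the current
            # one is infeasible); accept a worse point with a probability
            # that shrinks as the temperature falls.
            if (not feasible(current) or delta > 0
                    or rng.random() < math.exp(delta / temp)):
                current = candidate
                if best is None or ips_of(current) > ips_of(best):
                    best = current
        temp *= cooling  # assumed geometric cooling schedule
    return best


# Toy predictors with hypothetical constants, for illustration only.
configs = list(product([0.45, 0.50, 0.55, 0.60], [0, 1, 2]))
result = anneal_p_e_r(
    configs,
    ips_of=lambda c: 2.0e9 * c[0] + 1.0e8 * c[1],
    energy_of=lambda c: c[0] ** 2 * (10.0 - c[1]),
    oqloss_of=lambda c: 0.02 * c[1] + 0.5 * max(0.0, 0.55 - c[0]),
    energy_budget=3.0,
    oqloss_threshold=0.05,
)
print(result)
```

The sketch keeps track of the best feasible combination seen so far, so a late acceptance of a worse point cannot degrade the returned result. The other three strategies (E-P + R, E-P, P-E) would differ only in which predictor is the objective and which predictors enter `feasible`.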

In the processor core optimization method based on near-threshold calculation in this embodiment, a multidimensional optimization model is designed for an NTC system based on approximate calculation. The model includes three predictors (a performance predictor, an energy consumption predictor and an output quality predictor), and the optimal voltage level and approximation degree combination is obtained through a simulated annealing optimization algorithm. The method selects the voltage level and approximation degree automatically, ensures that the processor core operates under the condition of optimal energy consumption, performance and output quality, realizes multidimensional optimization, and improves the reliability of the processor core optimization method.

The output quality predictor groups instructions whose errors have similar propagation paths, selects representative sampling instructions from each group, and analyzes the output quality of these instructions at a given voltage level and approximation degree using a fault injection method. The fault injection platform is designed around a simulation backplane and implements mixed-language programming in VHDL and C. This design of the output quality predictor further improves the reliability of the method.

In addition, the influence of the optimization method on applications in the NTC system is evaluated by analyzing the energy efficiency and fault tolerance characteristics of different applications at given voltage levels and approximation degrees. The optimization model can sense the characteristics of the application programs and adjust the share of energy and IPS allocated to each program, so that the best multidimensional trade-off for the whole system is obtained and the effectiveness of the optimization method is verified.

Example 3:

The invention also provides a processor core optimization system based on near-threshold calculation, and FIG. 5 is a schematic structural diagram of the processor core optimization system based on near-threshold calculation in embodiment 3 of the invention.

Referring to fig. 5, the processor core optimization system based on the near threshold calculation of the embodiment includes:

a data acquisition module 501, configured to acquire multiple voltage-approximation degree data sets; each of the voltage-approximation degree data sets includes a voltage and a corresponding approximation degree value.

A predicted value obtaining module 502, configured to use multiple voltage-approximation degree data sets as inputs of the processor core, to obtain a performance predicted value, an energy consumption predicted value, and an output quality predicted value corresponding to each voltage-approximation degree data set.

An objective function construction module 503, configured to construct an objective optimization function.

A solving module 504, configured to take the performance predicted value, the energy consumption predicted value and the output quality predicted value corresponding to all the voltage-approximation degree data sets as inputs, and solve the target optimization function by using a simulated annealing algorithm to obtain an optimal voltage-approximation degree data set.

An optimal group determining module 505, configured to determine a voltage in the optimal voltage-approximation degree data group as an optimal voltage in a near-threshold calculation state, where an approximation degree value in the optimal voltage-approximation degree data group is used as an optimal approximation degree; the processor core operates at the optimal voltage and the optimal degree of approximation.

As an optional implementation manner, the predicted value obtaining module 502 specifically includes:

a performance prediction unit for taking the multiple voltage-approximation degree data sets as input of the performance predictor and obtaining performance prediction values corresponding to the voltage-approximation degree data sets by adopting an approximation calculation method

$$\mathrm{IPS}_i = A v_i + \Delta \mathrm{IPS}_i,$$

where $v_i$ represents the voltage in the i-th voltage-approximation degree data set; $A$ is a constant that depends on the configuration of the processor core and the application program executing on the processor core; and $\Delta \mathrm{IPS}_i$ indicates the degree of influence of the approximation calculation method on the performance.

$$\mathrm{Energy}_i = (\beta_i v_i)^2 C + (\beta_i v_i)^2 m_i D,$$

where $\beta_i$ is a constant between 0 and 1 corresponding to the i-th voltage-approximation degree data set, and $\beta_i$ depends on the level of voltage demanded by the user; $C$ represents a constant depending on the configuration of the processor core; $m_i$ represents the approximation degree value in the i-th voltage-approximation degree data set; and $D$ is the degree of influence of the approximation calculation method on the energy consumption.

The output quality prediction unit specifically includes:

the classification subunit is used for taking the multiple groups of voltage-approximation degree data groups as the input of an output quality predictor, and classifying the instructions in the application program executed on the processor core to obtain multiple instruction classes; each of the instruction classes includes a plurality of instructions having similar propagation paths.

A sampling subunit, configured to sample each instruction class by adopting an approximate calculation method to obtain a plurality of sampling instructions.

A fault injection subunit, configured to inject faults into each sampling instruction by adopting a fault injection method to obtain sampling fault instructions.

An error calculation subunit, configured to calculate an error value according to each sampling instruction and the corresponding sampling fault instruction; the error values include the maximum absolute difference between the sampling instruction output and the corresponding sampling fault instruction output, the maximum relative error between the sampling instruction output and the corresponding sampling fault instruction output, and the matrix error between the sampling instruction output and the corresponding sampling fault instruction output.

As an optional implementation manner, the objective function constructing module 503 specifically includes:

the performance target construction unit is used for constructing a function taking the performance parameters as targets and taking the energy consumption parameters and the output quality parameters as constraint conditions; the function is an objective optimization function.

The solving module 504 specifically includes:

The processor core optimization system based on the near-threshold calculation can automatically select the voltage level and the approximation degree, ensure that the processor core operates under the condition of optimal energy consumption, performance and output quality, realize multidimensional optimization and improve the reliability of the processor core optimization method.

For the system disclosed in embodiment 3, since it corresponds to the method disclosed in embodiment 1 or 2, the description is relatively simple; for the relevant points, refer to the description of the method section.

The principles and embodiments of the present invention have been described herein using specific examples, which are provided only to help understand the method and the core concept of the present invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, the specific embodiments and the application range may be changed. In view of the above, the present disclosure should not be construed as limiting the invention.

Claims

1. A processor core optimization method based on near threshold calculation is characterized by comprising the following steps:

$$\mathrm{IPS}_i = A v_i + \Delta \mathrm{IPS}_i,$$

where $v_i$ represents the voltage in the i-th voltage-approximation degree data set; $A$ is a constant that depends on the configuration of the processor core and the application program executing on the processor core; and $\Delta \mathrm{IPS}_i$ represents the degree of influence of the approximation calculation method on the performance;

$$\mathrm{Energy}_i = (\beta_i v_i)^2 C + (\beta_i v_i)^2 m_i D,$$

where $\beta_i$ is a constant between 0 and 1 corresponding to the i-th voltage-approximation degree data set, and $\beta_i$ depends on the level of voltage demanded by the user; $C$ represents a constant depending on the configuration of the processor core; $m_i$ represents the approximation degree value in the i-th voltage-approximation degree data set; and $D$ is the degree of influence of the approximation calculation method on the energy consumption;

taking the multiple voltage-approximation degree data groups as the input of an output quality predictor, and obtaining the output quality predicted value corresponding to each voltage-approximation degree data group by adopting an approximation calculation method and a fault injection method;

constructing an objective optimization function;

taking the performance predicted value, the energy consumption predicted value and the output quality predicted value corresponding to all the voltage-approximation degree data sets as input, and solving the target optimization function by adopting a simulated annealing algorithm to obtain an optimal voltage-approximation degree data set;

determining the voltage in the optimal voltage-approximation degree data set as the optimal voltage in a near threshold calculation state, wherein an approximation degree value in the optimal voltage-approximation degree data set is used as the optimal approximation degree; the processor core operates at the optimal voltage and the optimal approximation degree.

2. The method as claimed in claim 1, wherein the step of obtaining the predicted output quality value corresponding to each voltage-approximation degree data set by using the multiple voltage-approximation degree data sets as input of an output quality predictor and using an approximation calculation method and a fault injection method specifically comprises:

injecting faults into each sampling instruction by adopting a fault injection method to obtain sampling fault instructions;

3. The method as claimed in claim 2, wherein the step of obtaining the sampling fault instruction by injecting the fault into each sampling instruction by using the fault injection method comprises:

4. The method for optimizing processor cores based on near-threshold computation of claim 1, wherein the constructing an objective optimization function specifically includes:

constructing a function taking the performance parameters as targets and taking the energy consumption parameters and the output quality parameters as constraint conditions; the function is an objective optimization function.

5. The method as claimed in claim 4, wherein the step of solving the objective optimization function by using a simulated annealing algorithm with the performance predicted value, the energy consumption predicted value and the output quality predicted value corresponding to all the voltage-approximation degree data sets as inputs to obtain an optimal voltage-approximation degree data set specifically comprises:

taking the performance predicted value, the energy consumption predicted value and the output quality predicted value corresponding to all the voltage-approximation degree data sets as input, and obtaining the voltage-approximation degree data set meeting the optimization conditions by adopting a simulated annealing algorithm; the optimization conditions are that the performance predicted value is maximum, the energy consumption predicted value is smaller than a preset energy consumption value, and the output quality predicted value is smaller than a preset output quality value, or that a decrease in the performance predicted value is accepted with a preset probability as the annealing temperature decreases;

the voltage-approximation degree data set satisfying the optimization condition is determined as an optimal voltage-approximation degree data set.

6. A processor core optimization system based on near-threshold computations, comprising:

the target function construction module is used for constructing a target optimization function;

an optimal group determination module, configured to determine a voltage in the optimal voltage-approximation degree data group as an optimal voltage in a near-threshold calculation state, where an approximation degree value in the optimal voltage-approximation degree data group is used as an optimal approximation degree; the processor core operates at the optimal voltage and the optimal approximation degree;

the predicted value obtaining module specifically includes:

$$\mathrm{IPS}_i = A v_i + \Delta \mathrm{IPS}_i,$$

$$\mathrm{Energy}_i = (\beta_i v_i)^2 C + (\beta_i v_i)^2 m_i D,$$

7. The system according to claim 6, wherein the output quality prediction unit specifically includes:

8. The system of claim 6, wherein the objective function construction module specifically comprises:

the solving module specifically comprises: