CN110197026A

CN110197026A - Processor core optimization method and system based on near-threshold computing

Info

Publication number: CN110197026A
Application number: CN201910449741.0A
Authority: CN
Inventors: 王晶; 梁伟伟; 张伟功
Original assignee: Capital Normal University
Current assignee: Capital Normal University
Priority date: 2019-05-28
Filing date: 2019-05-28
Publication date: 2019-09-03
Anticipated expiration: 2039-05-28
Also published as: CN110197026B

Abstract

The invention discloses a processor core optimization method and system based on near-threshold computing. The method comprises: obtaining multiple voltage-approximation-degree data groups; taking the multiple voltage-approximation-degree data groups as the input of a processor core to obtain a performance prediction value, an energy consumption prediction value and an output quality prediction value for each data group; taking the performance, energy consumption and output quality prediction values of all data groups as input and solving an objective optimization function with a simulated annealing algorithm to obtain the optimal voltage-approximation-degree data group; and determining the voltage in the optimal data group as the optimal voltage in the near-threshold computing state and the approximation degree value in the optimal data group as the optimal approximation degree. The invention can select the voltage level and the approximation degree automatically, achieving a three-dimensional optimization in which energy consumption, performance and output quality are jointly optimal, with high reliability.

Description

Processor core optimization method and system based on near-threshold computing

Technical field

The present invention relates to the technical field of processor core system optimization, and in particular to a processor core optimization method and system based on near-threshold computing.

Background technique

The emergence of the "power wall" has become a challenge hindering the development of computers. As manufacturing processes keep improving, large portions of a chip must be left unused in order to keep power consumption within an acceptable range. This is the so-called "dark silicon", which in turn gives rise to the "utilization wall" problem. Data show that at the 10 nm process node, 52% of the chip area is in the "dark silicon" state. To solve the utilization wall problem, researchers have proposed near-threshold computing (NTC), in which all transistors operate at voltages near the threshold voltage. In the near-threshold computing state, a good trade-off between performance and power consumption can be obtained: compared with super-threshold computing (STC), near-threshold computing loses little performance for the same energy saving. For example, obtaining a 50% energy saving in the approximate computing state causes a 20% performance loss, whereas obtaining the same energy saving under super-threshold computing brings a higher energy loss.

However, near-threshold computing faces new reliability challenges, which become particularly evident as manufacturing processes advance. Process variation affects the basic characteristics of the devices on a chip, which is already an unavoidable problem for the industry. Process variation is especially pronounced under near-threshold computing, mainly because the average error rate of each unit rises while the variation across the whole chip also increases. To address the reliability problem, researchers have proposed many fault-tolerance techniques, such as error correction, reconfiguration and hardware redundancy. These techniques aim to eliminate errors completely, which inevitably brings additional fault-tolerance overhead.

Many current applications, such as pattern recognition, data mining and speech recognition, are inherently error-tolerant. In these applications, inexact computations and data are acceptable. For example, search engine results that do not exactly match the query can still be accepted, and because human perception is limited, some frames in a video can be skipped. Users care only about the results of these applications, not about whether the intermediate process was executed exactly. Errors that occur in these applications therefore need not be corrected, which avoids the unnecessary performance and time overhead caused by fault tolerance. Because these applications are insensitive to errors, they are friendly to near-threshold computing and can mitigate its reliability problem. To exploit this error tolerance, researchers have proposed approximation methods at the hardware layer and the software layer. These approximations deliberately introduce errors, but their effect on the final output quality is limited, and the loss of output quality stays within the range acceptable to the user.

At present, processor core optimization methods generally take one of two approaches. 1) Voltage regulation techniques consider how to adjust the voltage to improve energy efficiency. These methods consider only the voltage: they consider neither the choice of approximation technique nor the three-dimensional optimization of energy consumption, performance and output quality, which results in low reliability. For example, dynamic voltage and frequency scaling (DVFS) dynamically adjusts the operating frequency and voltage of a chip according to the varying computing demands of the applications it runs (for a given chip, a higher frequency requires a higher voltage), thereby saving energy. This popular technique considers only voltage adjustment and cannot achieve multi-dimensional optimization. 2) Other methods consider how the output accuracy of an application varies with the approximation degree, but none of them can automatically select a suitable combination of approximation degree and voltage. Existing optimization methods therefore usually achieve only one-dimensional optimization and treat near-threshold computing and approximate computing in complete isolation from each other. What is lacking is a method that automatically selects the voltage level and approximation degree to obtain a three-dimensional optimization in which energy consumption, performance and output quality are jointly optimal.

Summary of the invention

In view of this, it is necessary to provide a processor core optimization method and system based on near-threshold computing that selects the voltage level and approximation degree automatically, ensures that the processor core runs with optimal energy consumption, performance and output quality, achieves multi-dimensional optimization, and improves the reliability of processor core optimization.

To achieve the above object, the present invention provides the following solutions:

A processor core optimization method based on near-threshold computing, comprising:

obtaining multiple voltage-approximation-degree data groups, each of which comprises one voltage and a corresponding approximation degree value;

taking the multiple voltage-approximation-degree data groups as the input of a processor core to obtain the performance prediction value, energy consumption prediction value and output quality prediction value corresponding to each data group;

constructing an objective optimization function;

taking the performance prediction values, energy consumption prediction values and output quality prediction values of all the data groups as input, and solving the objective optimization function with a simulated annealing algorithm to obtain the optimal voltage-approximation-degree data group; and

determining the voltage in the optimal data group as the optimal voltage in the near-threshold computing state and the approximation degree value in the optimal data group as the optimal approximation degree, the processor core operating at the optimal voltage and the optimal approximation degree.

Optionally, taking the multiple voltage-approximation-degree data groups as the input of the processor core to obtain the performance prediction value, energy consumption prediction value and output quality prediction value corresponding to each data group specifically comprises:

taking the multiple voltage-approximation-degree data groups as the input of a performance predictor, and obtaining, using an approximate computing method, the performance prediction value corresponding to each data group:

IPS_i = A·v_i + ΔIPS_i,

where v_i is the voltage in the i-th voltage-approximation-degree data group; A is a constant that depends on the configuration of the processor core and on the application executed on it; and ΔIPS_i is the effect of the approximate computing method on performance;

taking the multiple voltage-approximation-degree data groups as the input of an energy consumption predictor, and obtaining, using an approximate computing method, the energy consumption prediction value corresponding to each data group:

Energy_i = (β_i·v_i)²·C + (β_i·v_i)²·m_i·D,

where β_i is a constant between 0 and 1 corresponding to the i-th voltage-approximation-degree data group, whose value depends on the user's voltage requirement; C is a constant that depends on the configuration of the processor core; m_i is the approximation degree value in the i-th data group; and D depends on the effect of the approximate computing method on energy consumption;

taking the multiple voltage-approximation-degree data groups as the input of an output quality predictor, and obtaining, using an approximate computing method and a fault injection method, the output quality prediction value corresponding to each data group.

Optionally, taking the multiple voltage-approximation-degree data groups as the input of the output quality predictor and obtaining, using an approximate computing method and a fault injection method, the output quality prediction value corresponding to each data group specifically comprises:

taking the multiple voltage-approximation-degree data groups as the input of the output quality predictor, and classifying the instructions of the application executed on the processor core to obtain multiple instruction classes, each of which comprises multiple instructions with similar propagation paths;

sampling each instruction class using an approximate computing method to obtain multiple sampled instructions;

injecting a fault into each sampled instruction using a fault injection method to obtain faulty sampled instructions;

calculating error values from each sampled instruction and the corresponding faulty sampled instruction, the error values comprising the maximum clustering error between the output of the sampled instruction and the output of the corresponding faulty sampled instruction, the maximum relative error between the two outputs, and the matrix error between the two outputs;

obtaining, from the error values, the output quality prediction value corresponding to each data group.

Optionally, injecting a fault into each sampled instruction using a fault injection method to obtain faulty sampled instructions specifically comprises:

constructing a fault injection platform that integrates debugging control software, fault injection software, a hardware emulator and an emulation backplane;

injecting a fault into each sampled instruction using the fault injection platform to obtain the faulty sampled instructions.
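The platform described above is hardware-assisted and cannot be reproduced in a few lines. As a purely illustrative software stand-in, the sketch below models an injected fault as a single-bit flip in the output word of an instruction. The single-bit-flip fault model, the 32-bit word width and the name `flip_bit` are assumptions made for this example; the source does not specify them.

```python
def flip_bit(value: int, bit: int, width: int = 32) -> int:
    """Return `value` with one bit inverted, truncated to `width` bits.

    Assumption: a fault is modeled as a single-bit flip in a result word.
    The source only states that a fault is injected, not which fault model is used.
    """
    if not 0 <= bit < width:
        raise ValueError("bit index outside the word")
    mask = (1 << width) - 1
    return (value ^ (1 << bit)) & mask


# Hypothetical golden output of one sampled instruction and its faulty counterpart.
golden_output = 0b1010
faulty_output = flip_bit(golden_output, bit=0)
```

Flipping the same bit twice restores the original value, which is a convenient sanity check when such a stand-in is used in tests.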

Optionally, constructing the objective optimization function specifically comprises:

constructing a function that takes the performance parameter as the objective and the energy consumption parameter and output quality parameter as constraints, this function being the objective optimization function.

Optionally, taking the performance prediction values, energy consumption prediction values and output quality prediction values of all the data groups as input and solving the objective optimization function with a simulated annealing algorithm to obtain the optimal voltage-approximation-degree data group specifically comprises:

taking the performance prediction values, energy consumption prediction values and output quality prediction values of all the data groups as input, and obtaining, with the simulated annealing algorithm, the voltage-approximation-degree data group that satisfies the optimization condition, the optimization condition being that the performance prediction value is maximal while the energy consumption prediction value is below a preset energy consumption value and the output quality prediction value is below a preset output quality value, or that the performance prediction value becomes smaller at a preset frequency as the annealing temperature decreases;

determining the voltage-approximation-degree data group that satisfies the optimization condition as the optimal voltage-approximation-degree data group.

The present invention also provides a processor core optimization system based on near-threshold computing, comprising:

a data acquisition module, configured to obtain multiple voltage-approximation-degree data groups, each of which comprises one voltage and a corresponding approximation degree value;

a prediction value acquisition module, configured to take the multiple voltage-approximation-degree data groups as the input of a processor core to obtain the performance prediction value, energy consumption prediction value and output quality prediction value corresponding to each data group;

an objective function construction module, configured to construct an objective optimization function;

a solving module, configured to take the performance prediction values, energy consumption prediction values and output quality prediction values of all the data groups as input and to solve the objective optimization function with a simulated annealing algorithm to obtain the optimal voltage-approximation-degree data group; and

an optimal group determination module, configured to determine the voltage in the optimal data group as the optimal voltage in the near-threshold computing state and the approximation degree value in the optimal data group as the optimal approximation degree, the processor core operating at the optimal voltage and the optimal approximation degree.

Optionally, the prediction value acquisition module specifically comprises:

a performance prediction unit, configured to take the multiple voltage-approximation-degree data groups as the input of a performance predictor and to obtain, using an approximate computing method, the performance prediction value corresponding to each data group:

IPS_i = A·v_i + ΔIPS_i;

an energy consumption prediction unit, configured to take the multiple voltage-approximation-degree data groups as the input of an energy consumption predictor and to obtain, using an approximate computing method, the energy consumption prediction value corresponding to each data group:

Energy_i = (β_i·v_i)²·C + (β_i·v_i)²·m_i·D;

an output quality prediction unit, configured to take the multiple voltage-approximation-degree data groups as the input of an output quality predictor and to obtain, using an approximate computing method and a fault injection method, the output quality prediction value corresponding to each data group.

Optionally, the output quality prediction unit specifically comprises:

a classification subunit, configured to take the multiple voltage-approximation-degree data groups as the input of the output quality predictor and to classify the instructions of the application executed on the processor core to obtain multiple instruction classes, each of which comprises multiple instructions with similar propagation paths;

a sampling subunit, configured to sample each instruction class using an approximate computing method to obtain multiple sampled instructions;

a fault injection subunit, configured to inject a fault into each sampled instruction using a fault injection method to obtain faulty sampled instructions;

an error calculation subunit, configured to calculate error values from each sampled instruction and the corresponding faulty sampled instruction, the error values comprising the maximum clustering error between the output of the sampled instruction and the output of the corresponding faulty sampled instruction, the maximum relative error between the two outputs, and the matrix error between the two outputs;

an output quality prediction subunit, configured to obtain, from the error values, the output quality prediction value corresponding to each data group.

Optionally, the objective function construction module specifically comprises:

a performance objective construction unit, configured to construct a function that takes the performance parameter as the objective and the energy consumption parameter and output quality parameter as constraints, this function being the objective optimization function;

and the solving module specifically comprises:

an optimization unit, configured to take the performance prediction values, energy consumption prediction values and output quality prediction values of all the data groups as input and to obtain, with the simulated annealing algorithm, the voltage-approximation-degree data group that satisfies the optimization condition, the optimization condition being that the performance prediction value is maximal while the energy consumption prediction value is below a preset energy consumption value and the output quality prediction value is below a preset output quality value, or that the performance prediction value becomes smaller at a preset frequency as the annealing temperature decreases;

a determination unit, configured to determine the voltage-approximation-degree data group that satisfies the optimization condition as the optimal voltage-approximation-degree data group.

Compared with the prior art, the beneficial effects of the present invention are as follows:

The invention proposes a processor core optimization method and system based on near-threshold computing. The method comprises: obtaining multiple voltage-approximation-degree data groups, each comprising a voltage and a corresponding approximation degree value; taking the multiple data groups as the input of a processor core to obtain the performance prediction value, energy consumption prediction value and output quality prediction value corresponding to each data group; taking the performance, energy consumption and output quality prediction values of all the data groups as input and solving an objective optimization function with a simulated annealing algorithm to obtain the optimal voltage-approximation-degree data group; and determining the voltage in the optimal data group as the optimal voltage in the near-threshold computing state and the approximation degree value in the optimal data group as the optimal approximation degree. The invention considers the voltage and the approximation degree value together, weighs energy consumption, performance and output quality jointly, and selects the optimal voltage-approximation-degree data group with a simulated annealing algorithm. It thereby achieves multi-dimensional optimization and makes the processor core operate at the optimal voltage and optimal approximation degree in the near-threshold computing state, with high reliability.

Brief description of the drawings

To explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings required by the embodiments are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.

Fig. 1 is a flowchart of the processor core optimization method based on near-threshold computing according to Embodiment 1 of the present invention;

Fig. 2 is a block diagram of the processor core optimization method based on near-threshold computing according to Embodiment 2 of the present invention;

Fig. 3 is a block diagram of the output quality predictor of Embodiment 2 of the present invention;

Fig. 4 is a system framework diagram of the fault injection platform of Embodiment 2 of the present invention;

Fig. 5 is a structural schematic diagram of the processor core optimization system based on near-threshold computing according to Embodiment 3 of the present invention.

Detailed description of the embodiments

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. Evidently, the described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art from the embodiments of the present invention without creative effort fall within the protection scope of the present invention.

To make the above objects, features and advantages of the present invention clearer and easier to understand, the present invention is described in further detail below with reference to the drawings and specific embodiments.

Fig. 1 is a flowchart of a processor core optimization method based on near-threshold computing according to an embodiment of the present invention.

Referring to Fig. 1, the processor core optimization method based on near-threshold computing of this embodiment comprises:

Step S1: obtain multiple voltage-approximation-degree data groups.

Each voltage-approximation-degree data group comprises one voltage and a corresponding approximation degree value. Because of the area overhead limit, the scaling range of the voltage level is limited, and the voltage level does not vary continuously, so the selectable voltage values are fixed. In a system that uses approximate computing, different approximation degrees can be chosen: for example, a neural-network approximation technique changes the approximation degree by changing the network topology, and a mantissa truncation technique changes it by changing the number of truncated bits. The selectable range of the approximation degree is also limited: for example, the selectable network topologies in neural-network approximation and the number of bits that can be truncated in mantissa truncation are both fixed. The combinations of voltage and approximation degree value are therefore finite, that is, the number of voltage-approximation-degree data groups is finite.
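Because both the voltage levels and the approximation degrees come from finite sets, the candidate data groups can be enumerated as a Cartesian product. The following minimal sketch illustrates this. The concrete voltage values, the integer encoding of the approximation degree and the name `enumerate_candidates` are assumptions made for illustration, since the source lists no concrete values.

```python
from itertools import product

# Hypothetical discrete voltage levels in volts (not taken from the source).
VOLTAGE_LEVELS = [0.45, 0.50, 0.55, 0.60]
# Hypothetical approximation degrees, e.g. the number of truncated mantissa bits
# or an index into a list of neural-network topologies.
APPROX_DEGREES = [0, 1, 2]


def enumerate_candidates(voltages, degrees):
    """Return every (voltage, approximation degree) pair as a list of tuples."""
    return list(product(voltages, degrees))


candidates = enumerate_candidates(VOLTAGE_LEVELS, APPROX_DEGREES)
```

With 4 voltage levels and 3 approximation degrees, the search space contains 12 data groups, which is the kind of finite candidate set the later steps operate on.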

Step S2: take the multiple voltage-approximation-degree data groups as the input of the processor core to obtain the performance prediction value, energy consumption prediction value and output quality prediction value corresponding to each data group.

Step S2 specifically comprises:

1) Taking the multiple voltage-approximation-degree data groups as the input of a performance predictor, and obtaining, using an approximate computing method, the performance prediction value corresponding to each data group:

IPS_i = A·v_i + ΔIPS_i,

where v_i is the voltage in the i-th voltage-approximation-degree data group; A is a constant that depends on the configuration of the processor core and on the application executed on it; and ΔIPS_i is the effect of the approximate computing method on performance.
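The performance model is linear in the voltage, so it translates directly into a one-line function. The sketch below is only an illustration of the formula. The numeric values of A and ΔIPS_i are invented for the example, and in practice they would have to be fitted to the processor core configuration, the application and the approximation technique.

```python
def predict_ips(voltage: float, a_const: float, delta_ips: float) -> float:
    """Performance prediction IPS_i = A * v_i + delta_IPS_i.

    a_const   -- the constant A (depends on core configuration and application)
    delta_ips -- the effect of the approximate computing method on performance
    """
    return a_const * voltage + delta_ips


# Hypothetical values: A = 2e9 instructions/s per volt, approximation adds 1e8 IPS.
ips = predict_ips(voltage=0.5, a_const=2.0e9, delta_ips=1.0e8)
```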

2) Taking the multiple voltage-approximation-degree data groups as the input of an energy consumption predictor, and obtaining, using an approximate computing method, the energy consumption prediction value corresponding to each data group:

Energy_i = (β_i·v_i)²·C + (β_i·v_i)²·m_i·D,

where β_i is a constant between 0 and 1 corresponding to the i-th voltage-approximation-degree data group, whose value depends on the user's voltage requirement; C is a constant that depends on the configuration of the processor core; m_i is the approximation degree value in the i-th data group; and D depends on the effect of the approximate computing method on energy consumption.
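The energy model can be sketched in the same way. Both terms share the factor (β_i·v_i)², so the function computes it once. The parameter values in the example are invented, and treating the bounds 0 and 1 as valid values of β_i is an assumption, since the source only says that β_i lies between 0 and 1.

```python
def predict_energy(voltage: float, approx_degree: float, beta: float,
                   c_const: float, d_const: float) -> float:
    """Energy prediction Energy_i = (beta*v)^2 * C + (beta*v)^2 * m * D."""
    # Assumption: the interval for beta is treated as closed, [0, 1].
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie between 0 and 1")
    scaled_sq = (beta * voltage) ** 2
    return scaled_sq * c_const + scaled_sq * approx_degree * d_const


# Hypothetical values: a negative D means approximation reduces energy.
energy = predict_energy(voltage=0.5, approx_degree=2, beta=1.0,
                        c_const=4.0, d_const=-0.5)
```

Factoring the expression as (β_i·v_i)²·(C + m_i·D) makes the structure visible: the voltage term scales the whole energy quadratically, while the approximation degree shifts the effective constant.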

3) Taking the multiple voltage-approximation-degree data groups as the input of an output quality predictor, and obtaining, using an approximate computing method and a fault injection method, the output quality prediction value corresponding to each data group. This specifically comprises:

31) Taking the multiple voltage-approximation-degree data groups as the input of the output quality predictor, and classifying the instructions of the application executed on the processor core to obtain multiple instruction classes, each of which comprises multiple instructions with similar propagation paths.

32) Sampling each instruction class using an approximate computing method to obtain multiple sampled instructions.

33) Injecting a fault into each sampled instruction using a fault injection method to obtain faulty sampled instructions. Specifically, a fault injection platform is first constructed, which integrates debugging control software, fault injection software, a hardware emulator and an emulation backplane; the fault injection platform is then used to inject a fault into each sampled instruction to obtain the faulty sampled instructions.

34) Calculating error values from each sampled instruction and the corresponding faulty sampled instruction. The error values comprise the maximum clustering error between the output of the sampled instruction and the output of the corresponding faulty sampled instruction, the maximum relative error between the two outputs, and the matrix error between the two outputs.

35) Obtaining, from the error values, the output quality prediction value corresponding to each data group.
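Steps 31) to 34) can be sketched in software as classify, sample, inject and compare. The sketch below makes several assumptions that are not in the source: instructions are modeled as (opcode, output value) pairs, the opcode serves as a crude proxy for "similar propagation paths", the injector is an arbitrary callable supplied by the caller, and only the maximum relative error is computed because the source does not define the other two error values.

```python
import random
from collections import defaultdict


def classify(instructions, key):
    """Group instructions into classes by a caller-supplied key function."""
    classes = defaultdict(list)
    for instruction in instructions:
        classes[key(instruction)].append(instruction)
    return dict(classes)


def sample_per_class(classes, k, seed=0):
    """Draw up to k instructions from every class without replacement."""
    rng = random.Random(seed)
    return {name: rng.sample(members, min(k, len(members)))
            for name, members in classes.items()}


def max_relative_error(golden, faulty, eps=1e-12):
    """Largest |faulty - golden| / |golden| over paired outputs."""
    if len(golden) != len(faulty):
        raise ValueError("output lists must have the same length")
    return max((abs(f - g) / max(abs(g), eps) for g, f in zip(golden, faulty)),
               default=0.0)


def class_errors(sampled, inject):
    """Maximum relative error per class after applying `inject` to each output."""
    errors = {}
    for name, members in sampled.items():
        golden = [output for _, output in members]
        faulty = [inject(output) for output in golden]
        errors[name] = max_relative_error(golden, faulty)
    return errors


# Hypothetical trace: (opcode, golden output) pairs.
trace = [("add", 8), ("add", 16), ("add", 32), ("mul", 64), ("load", 4), ("load", 2)]
sampled = sample_per_class(classify(trace, key=lambda ins: ins[0]), k=2)
# Hypothetical fault: flip the least significant bit of the output.
errors = class_errors(sampled, inject=lambda output: output ^ 1)
```

Sampling a few representatives per class instead of injecting into every instruction is what keeps the number of fault injection runs manageable. How the per-class errors are combined into one output quality prediction value is left open here, as the source does not specify it.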

Step S3: construct the objective optimization function.

The objective optimization function may be a function that takes the performance parameter as the objective and the energy consumption parameter and output quality parameter as constraints; a function that takes the performance parameter as the objective and the energy consumption parameter as the constraint; a function that takes the energy consumption parameter as the objective and the performance parameter and output quality parameter as constraints; or a function that takes the energy consumption parameter as the objective and the performance parameter as the constraint. This embodiment uses as the objective optimization function the function that takes the performance parameter as the objective and the energy consumption parameter and output quality parameter as constraints.
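The variant chosen in this embodiment, performance as the objective with energy consumption and output quality as constraints, might be encoded as follows. This is a minimal sketch under stated assumptions: the `Prediction` record and its field names are hypothetical, the output quality prediction is treated as a loss where lower is better (consistent with the "less than a preset value" condition), and infeasible data groups are scored as negative infinity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prediction:
    """Predicted metrics of one voltage-approximation-degree data group."""
    voltage: float
    approx_degree: int
    ips: float           # performance prediction value
    energy: float        # energy consumption prediction value
    quality_loss: float  # output quality prediction value (assumed: lower is better)


def is_feasible(p: Prediction, energy_budget: float, quality_threshold: float) -> bool:
    """Both constraints must hold strictly."""
    return p.energy < energy_budget and p.quality_loss < quality_threshold


def objective(p: Prediction, energy_budget: float, quality_threshold: float) -> float:
    """Performance if the constraints hold, otherwise negative infinity."""
    if is_feasible(p, energy_budget, quality_threshold):
        return p.ips
    return float("-inf")
```

Scoring infeasible points as negative infinity is one simple way to fold the constraints into a single value that an optimizer can maximize; a penalty term would be an alternative design.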

Step S4: take the performance prediction values, energy consumption prediction values and output quality prediction values of all the data groups as input, and solve the objective optimization function with a simulated annealing algorithm to obtain the optimal voltage-approximation-degree data group.

When the objective optimization function is the function that takes the performance parameter as the objective and the energy consumption parameter and output quality parameter as constraints, the optimal voltage-approximation-degree data group is obtained with the simulated annealing algorithm as follows:

1) Taking the performance prediction values, energy consumption prediction values and output quality prediction values of all the data groups as input, the simulated annealing algorithm is used to obtain the voltage-approximation-degree data group that satisfies the optimization condition. The optimization condition is that the performance prediction value is maximal while the energy consumption prediction value is below a preset energy consumption value and the output quality prediction value is below a preset output quality value, or that the performance prediction value becomes smaller at a preset frequency as the annealing temperature decreases.

2) The voltage-approximation-degree data group that satisfies the optimization condition is determined as the optimal voltage-approximation-degree data group.
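A generic simulated annealing search over a finite list of candidate scores can be sketched as follows. This is not the patented procedure: the geometric cooling schedule, the uniform random proposal of candidates, the Metropolis acceptance rule and all parameter defaults are assumptions chosen for illustration. Each score is assumed to be a finite number, for example the performance prediction value of a data group that already satisfies the energy and quality constraints.

```python
import math
import random


def simulated_annealing(scores, t_start=1.0, t_end=1e-3, cooling=0.95,
                        steps_per_temp=20, seed=0):
    """Return the index of the highest score visited during annealing."""
    rng = random.Random(seed)
    n = len(scores)
    current = rng.randrange(n)
    best = current
    temperature = t_start
    while temperature > t_end:
        for _ in range(steps_per_temp):
            proposal = rng.randrange(n)  # uniform proposal over all candidates
            delta = scores[proposal] - scores[current]
            # Always accept improvements; accept a worse candidate with
            # probability exp(delta / T), which shrinks as T decreases.
            if delta >= 0 or rng.random() < math.exp(delta / temperature):
                current = proposal
                if scores[current] > scores[best]:
                    best = current
        temperature *= cooling  # geometric cooling
    return best


# Hypothetical performance scores of five feasible data groups.
example_scores = [3.0, 7.5, 1.0, 9.25, 4.0]
best_index = simulated_annealing(example_scores)
```

Accepting worse candidates with a temperature-dependent probability is what lets the search escape local optima early on; as the temperature falls, the search becomes increasingly greedy. For scores with a large dynamic range, normalizing them before annealing keeps the acceptance probability meaningful.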

Step S5: determine the voltage in the optimal voltage-approximation-degree data group as the optimal voltage in the near-threshold computing state and the approximation degree value in the optimal data group as the optimal approximation degree; the processor core operates at the optimal voltage and the optimal approximation degree.

The processor core optimization method based on near-threshold computing of this embodiment can efficiently find the optimal voltage-approximation-degree data group and thereby obtain a three-dimensional optimization of performance, energy and output quality, with high reliability. The prediction model in the method consists of three predictors, which predict performance, output quality and energy. The output quality predictor predicts output quality by simulating static faults in a near-threshold computing system with a fault injection method based on hardware/software co-design. A simulated annealing optimization algorithm then finds the optimal voltage level and approximation degree configuration for the near-threshold processor core system. This configuration can maximize performance under given energy and output quality requirements, or minimize energy under given performance and output quality requirements, which further improves reliability.

Embodiment 2:

In this embodiment, in order to obtain the optimal voltage and approximation degree value so that the processor core achieves the best three-dimensional optimization of energy consumption, performance and output quality during operation, a multi-dimensional optimization model is first established. The model includes three predictors: a performance predictor, an energy consumption predictor and an output quality predictor. A simulated annealing algorithm is then used to obtain the optimal combination of voltage level and approximation degree and thus the best multi-dimensional optimization effect. As shown in Fig. 2, the inputs of the multi-dimensional optimization model are the output quality threshold, the energy budget, the system configuration and the approximable code regions. The three predictors in the model predict the output quality, energy consumption and performance of the near-threshold computing system at a given voltage level and approximation degree (performance is expressed in IPS, the number of instructions executed per unit time). Finally, the simulated annealing algorithm uses the outputs of the three predictors to obtain the optimal combination of voltage level and approximation degree, and thus the multi-dimensional optimization effect.

1. Performance predictor

The performance predictor describes performance in terms of IPS and is given by:

IPS_i=Av_i+ΔIPS_i,

where IPS_i denotes the performance prediction value, v_i denotes the voltage in the i-th voltage-approximation degree data group, A is a constant that depends on the configuration of the processor core and on the application program executed on the processor core, and ΔIPS_i denotes the effect of the approximate computing method on performance. The larger A is, the more sensitive IPS is to changes in voltage.
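The linear model above can be sketched in a few lines of Python. This is only an illustration: the function name `predict_ips` and the numeric values chosen for A and ΔIPS_i are assumptions, not values given in this embodiment.

```python
def predict_ips(voltage, a, delta_ips):
    """Predicted IPS for one voltage-approximation pair: IPS_i = A * v_i + dIPS_i.

    a         -- constant A (depends on the core configuration and the application)
    delta_ips -- effect of the approximate computing method on performance
    """
    return a * voltage + delta_ips


# Hypothetical values: A = 2.0e9 IPS per volt, approximation contributes 1.0e8 IPS.
ips = predict_ips(voltage=0.5, a=2.0e9, delta_ips=1.0e8)
print(ips)
```

Because the model is linear in v_i, doubling A doubles the change in predicted IPS for the same voltage step, which is the sensitivity described above.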

2. Energy consumption predictor

The energy consumption predictor is given by:

Energy_i=(β_iv_i)²C+(β_iv_i)²m_iD,

where Energy_i denotes the energy consumption prediction value; β_i is a constant between 0 and 1 corresponding to the i-th voltage-approximation degree data group and depends on the user's requirement for voltage; C is a constant that depends on the configuration of the processor core; m_i denotes the approximation degree value in the i-th voltage-approximation degree data group; and D depends on the effect of the approximate computing method on energy consumption.
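As an illustration, the energy model can be evaluated as follows. The function name `predict_energy` and the sample constants are assumptions made for this sketch only.

```python
def predict_energy(voltage, beta, c, approx_degree, d):
    """Energy_i = (beta_i * v_i)^2 * C + (beta_i * v_i)^2 * m_i * D."""
    scaled_sq = (beta * voltage) ** 2          # shared (beta_i * v_i)^2 factor
    return scaled_sq * c + scaled_sq * approx_degree * d


# Hypothetical constants: beta = 1.0, C = 4.0, D = 1.0, approximation degree m = 2.
energy = predict_energy(voltage=0.5, beta=1.0, c=4.0, approx_degree=2, d=1.0)
print(energy)
```

Both terms share the factor (β_i v_i)², so the predicted energy scales quadratically with the scaled voltage.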

3. Output quality predictor

The starting point for the design of the output quality predictor is as follows: if the data controlled by a set of instructions have "similar" propagation paths, then errors in these data have a similar effect on the output quality of the program. On this basis, instructions are divided into a series of groups, with instructions whose data propagation has "similar" properties placed in the same group.

As shown in Fig. 3, instructions of the first class do not affect output quality regardless of the voltage level and approximation degree used. They are divided into four instruction groups: ① NOP instructions, which perform no operation and have no effect on the output; ② performance-enhancing instructions, in which an error may invalidate a prefetch but does not change program semantics; ③ predicated-false instructions, whose predicate evaluates to false so that the result of the instruction is discarded without affecting program execution; ④ dynamically dead instructions, whose results are never used by the system and do not affect the output.

Instructions of the second class affect output quality to different degrees depending on the degree of approximate computing applied. Typical groups include instructions that save data related to filters and pixels; instructions involved in convolution computation, which are approximable; and instructions that load and store from the image source to the image destination, which are also approximable. Representative sample instructions are then selected from each group according to the grouping and the approximate computing method.

Because of area overhead limits, the scaling range of the voltage level is limited, and the voltage level does not vary continuously, so the selectable voltage levels are fixed. In a system that uses an approximate computing method, the choice of approximation degree is also limited. The number of combinations of voltage and approximation degree is therefore finite. Since the data groups are finite, the influence of each sample instruction on output quality can be assessed by fault injection and propagation analysis at every possible voltage and approximation degree. To simulate the static faults at a given voltage, the fault injection platform injects faults into the representative instructions selected from each group. Analyzing the propagation of errors after fault injection yields the influence on output quality.

To quantify the influence on output quality, three output quality metrics are proposed for different types of applications: ① max-abs-diff: the maximum absolute difference between the fully correct output and the faulty output; ② max-rel-err: the maximum relative error between the correct output and the faulty output; ③ rel-l2-norm: used to directly compare the error between two matrices. As shown in Fig. 3, the influence of each sample instruction on the output after fault injection is collected and quantified into the three metrics above. Specifically, the three quality metrics are abstracted into "output quality buckets" (Quality Bucket) that quantify output quality, and it is then observed which "output quality bucket" the influence of an instruction falls into after fault injection. As shown in Table 1, fault injection gives the "output quality bucket" of each instruction group at the various voltage levels and approximation degrees. Assuming a program is given, the distribution of all its instructions in Table 1 can be obtained through fault injection and propagation analysis of the instructions, and the corresponding output quality can be predicted from this distribution.
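The three metrics and the bucketing step can be sketched as follows. The exact formulas are not spelled out in this text, so the definitions below (element-wise maximum difference, element-wise maximum relative error, and the L2 norm of the difference divided by the L2 norm of the reference) are assumptions based on the metric names; the bucket edges are likewise hypothetical, since Table 1 is not reproduced here. Matrices are passed as flat lists.

```python
import math


def max_abs_diff(ref, faulty):
    """Largest absolute difference between correct and faulty outputs."""
    return max(abs(r - f) for r, f in zip(ref, faulty))


def max_rel_err(ref, faulty):
    """Largest relative error; elements with a zero reference are skipped (assumption)."""
    return max((abs(r - f) / abs(r) for r, f in zip(ref, faulty) if r != 0),
               default=0.0)


def rel_l2_norm(ref, faulty):
    """L2 norm of the difference relative to the L2 norm of the reference."""
    num = math.sqrt(sum((r - f) ** 2 for r, f in zip(ref, faulty)))
    den = math.sqrt(sum(r ** 2 for r in ref))
    return num / den if den else 0.0


def quality_bucket(error, edges=(0.01, 0.05, 0.10)):
    """Index of the first bucket whose upper edge is >= error (edges are hypothetical)."""
    for idx, edge in enumerate(edges):
        if error <= edge:
            return idx
    return len(edges)


correct = [1.0, 2.0, 4.0]
faulty = [1.0, 2.5, 4.0]          # one element corrupted by an injected fault
print(max_abs_diff(correct, faulty), max_rel_err(correct, faulty))
print(quality_bucket(max_rel_err(correct, faulty)))
```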

Table 1

4. Fault injection platform

The fault injection platform in this embodiment is designed on the basis of backplane technology. Fig. 4 is the system framework diagram of the fault injection platform of Embodiment 2 of the present invention. Referring to Fig. 4, the fault injection platform integrates debugging control software, fault injection software and a hardware simulator.

The debugging control software compiles and configures the software program that runs on the target processor. It can provide input and output through a graphical interface or a command line and communicates with the underlying hardware over a network or serial port. The software sends debugging control commands to the target processor in the hardware simulator through the simulation backplane, and also receives through the simulation backplane the system state returned by the hardware processor for debugging.

The fault injection software receives parameters such as the fault injection time and location through the user interface and receives the list of hardware signals sent by the hardware through the simulation backplane. Through a depth-first traversal, it identifies and records all signals layer by layer and builds a hierarchical resource pool that is used to locate the signal to be injected directly at fault injection time. It then generates a fault library for the hardware signals, sends the generated faults and simulation control information to the target processor through the simulation backplane, and receives the state of the target processor after injection through the simulation backplane.
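The depth-first construction of a hierarchical resource pool can be illustrated with a small Python sketch. The nested-dictionary representation of the design hierarchy and the signal names are hypothetical; the actual format of the hardware signal list sent over the simulation backplane is not described in this text.

```python
def build_resource_pool(hierarchy):
    """Depth-first walk over a nested dict of modules and signals.

    hierarchy -- dict mapping a module name to a child dict, or a signal name to None
    Returns {full_signal_path: depth}, so a signal can be looked up directly
    when a fault is to be injected.
    """
    pool = {}
    stack = [("", hierarchy, 0)]               # explicit stack gives depth-first order
    while stack:
        path, node, depth = stack.pop()
        for name, child in node.items():
            full = f"{path}/{name}" if path else name
            if isinstance(child, dict):        # sub-module: descend later
                stack.append((full, child, depth + 1))
            else:                              # leaf signal: record it
                pool[full] = depth
    return pool


# Hypothetical signal hierarchy for illustration only.
design = {"leon2": {"iu": {"pc": None, "ir": None}, "dsu": {"trace": None}},
          "clk": None}
pool = build_resource_pool(design)
print(sorted(pool))
```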

The fault-tolerant processor prototype runs in the hardware simulator, which uses LEON2 as the CPU core. The main technical features of LEON2 are the SPARC V8 architecture, an internal AMBA bus structure, fault-tolerant design and VHDL coding style. The processor unit of LEON2 mainly comprises a five-stage pipelined integer unit (5-stage IU), a floating-point unit (FPU) for floating-point operations, and a coprocessor unit (CP). The integer unit has a separate data cache (Dcache) and instruction cache (Icache), as well as a memory management unit (MMU). The on-chip peripherals of LEON2 include a Peripheral Component Interconnect (PCI) bus, an Ethernet interface (net), dynamic random access memory (DRAM) and a debug support unit (DSU). The DSU can put the processor into debug mode, through which all registers and caches of the processor can be read and written. The DSU also contains a trace buffer that stores the executed instructions and the data transferred on the AHB. Software commands are received and execution status is sent through this internal debug unit. The simulation backplane virtualizes hardware devices such as the network card and serial port and, according to the needs of software/hardware information exchange, sends control commands and fault injection information to the hardware DSU. The information returned by the DSU is distributed to different destinations according to its content: the debug information to be observed by the debugger is sent to the debugging software, and the information needed for fault injection, such as signal states, is sent to the fault injection software. This realizes the transmission and scheduling of communication data such as control commands and returned results.

The simulation backplane is implemented by mixed programming in VHDL and C. By means of the foreign language interface of the hardware description language, it realizes the association and mapping between the hardware simulator and the software system and seamless information exchange between the data and control modules. Through the foreign language interface, stimuli can be designed in C and simulation verification tasks can be executed in the hardware simulator. The hardware interface part uses the signal monitoring and instantaneous signal state reading functions provided by the foreign interface of the hardware description language to extract the signals of interest in the hardware module, and realizes fault injection by forcing the logic value of a signal. After injection, the logic value of the signal changes immediately without affecting other signals in the circuit that are logically related to it, which simulates the effect of a single event on the circuit.
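Forcing the logic value of a signal can be mimicked in software on an integer that stands in for a multi-bit signal. The sketch below is purely illustrative and does not use the VHDL foreign language interface; the function names and the two fault types shown (a single bit flip and a stuck-at fault) are assumptions chosen for the example.

```python
def inject_bit_flip(value, bit, width=32):
    """Invert one bit of a signal value, as a single-event upset would."""
    if not 0 <= bit < width:
        raise ValueError("bit index outside signal width")
    return (value ^ (1 << bit)) & ((1 << width) - 1)


def inject_stuck_at(value, bit, level, width=32):
    """Force one bit of a signal to a fixed logic level (0 or 1)."""
    if not 0 <= bit < width:
        raise ValueError("bit index outside signal width")
    mask = 1 << bit
    forced = value | mask if level else value & ~mask
    return forced & ((1 << width) - 1)


signal = 0b1010
print(bin(inject_bit_flip(signal, 0)))       # lowest bit inverted
print(bin(inject_stuck_at(signal, 1, 0)))    # bit 1 forced to logic 0
```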

Through co-simulation based on backplane technology, the fault injection platform in this embodiment is completely independent of a physical prototype, so that designers can find and correct errors as early in the design as possible, which reduces development cost. The software part on the simulation backplane corresponds to sensitive signal instances in the hardware description language module. When the logic circuit triggers a sensitive signal of the module, the simulator calls the program in the corresponding dynamic link library and seamlessly maps the signal defined in the hardware description language into the high-level language environment. Any processing of the logic value of this signal can then be implemented by a software algorithm. In terms of both flow control and function calls, a high-level language is easier to use than hardware languages and scripting languages and is no longer limited by the hardware language and simulator commands. It can therefore support more complex fault models and algorithms, and its good portability facilitates further functional extension. Moreover, the software program is not synthesized into the chip at download time, so no extra logic is added and intrusion into and influence on the hardware module are avoided, which protects the integrity of the target module and makes the test results more realistic.

5. Optimization algorithm

The problem to be optimized is as follows: a group of processor cores PE (PE_1, ..., PE_N) can operate at M different voltage levels (v_1, ..., v_M) and K different approximation degrees (m_1, ..., m_K), and the best combination is to be selected from the M × K combinations of voltage and approximation degree so as to obtain the best trade-off among performance, power consumption and output quality. Table 2 gives four different optimization strategies. In case 1, the target is to maximize performance: P-E+R indicates that performance is the optimization target with energy and output quality as constraints, and P-E indicates that performance is the optimization target with energy as the constraint. In case 2, the target is to minimize energy: E-P+R indicates that energy is the objective function with performance and output quality as constraints, and E-P indicates that energy is the objective function with performance as the constraint, without considering output quality. In this embodiment, the index EEPI is used to assess the trade-off among energy, performance and output quality. EEPI is defined as:

EEPI = Error × Energy / IPS,

where Error is the output quality loss of the system, Energy is the energy consumption of the system, and IPS denotes the performance of the system. The lower the EEPI value, the better the combined optimization of performance, energy consumption and output quality.
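A short sketch of how the index could be used to compare two configurations follows. It assumes that EEPI is read as the product of Error and Energy divided by IPS, which is consistent with lower values being better; the function name `eepi` and the sample numbers are illustrative only.

```python
def eepi(error, energy, ips):
    """Combined index: lower means a better error/energy/performance trade-off.

    Assumes EEPI = Error * Energy / IPS.
    """
    if ips <= 0:
        raise ValueError("ips must be positive")
    return error * energy / ips


# Two hypothetical configurations (error as a fraction, energy and IPS in arbitrary units).
config_a = eepi(error=0.02, energy=10.0, ips=4.0)
config_b = eepi(error=0.05, energy=6.0, ips=4.0)
better = "A" if config_a < config_b else "B"
print(config_a, config_b, better)
```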

Table 2

P-E+R is selected below as the optimization scheme, and the scheme is solved by the simulated annealing (SA) algorithm. Simulated annealing is a stochastic optimization algorithm based on a Monte Carlo iterative solution strategy that solves optimization problems by simulating the annealing process of solid matter in physics. As the temperature drops, SA can find the global optimum of the objective function. The greatest advantage of SA is that it can jump out of local optima and eventually tend toward the global optimum. Optimization with simulated annealing proceeds, in brief, as follows. First, the temperature and the voltage and approximation degree combination of the processor core are initialized, and the initial performance is obtained from the performance predictor. The voltage and approximation degree combination is then changed repeatedly. If performance increases while energy consumption and output quality both meet the requirements, the combination and performance are updated. If performance decreases, the combination and performance are still updated with a certain probability that falls as the annealing temperature decreases. Through continued iteration, the optimal voltage and approximation degree combination is finally found. The code for obtaining the optimal solution by the simulated annealing algorithm is as follows:

The above algorithm yields the optimal combination of voltage level and approximation degree as the optimal solution. Specifically, the PEs are initialized with a series of configurations C = {c_1, c_2, c_3, ..., c_n}, where c_b, corresponding to the b-th PE, is a combination of voltage level and approximation degree. The algorithm first explores the entire solution space through the annealing move ANNEAL_MOVE() while maximizing IPS. It then checks whether IPS_new is greater than IPS, whether Energy_new is lower than Energy_budget (the energy budget), and whether OQLOSS_new is lower than OQLOSS_threshold (the output quality threshold). If these conditions are not all satisfied, IPS_new is allowed to become smaller with a certain probability that falls as the annealing temperature decreases. If a combination satisfying the above conditions exists, that combination is determined as the optimal combination of voltage level and approximation degree. When the processor core operates at the optimal combination of voltage level and approximation degree, performance is improved to the greatest extent while the user's requirements on output quality and performance are still met. The specific algorithm above is a concrete example that takes performance as the target and energy and output quality as constraints (P-E+R); similarly, the other three optimization strategies (E-P+R, E-P or P-E) can also be combined with simulated annealing as the optimization scheme.
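The P-E+R procedure described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the listing of this embodiment: the function name `simulated_annealing`, the dictionary keys, the cooling schedule, the random-jump move standing in for ANNEAL_MOVE(), and the sample predictions are all hypothetical. Candidates are assumed to carry values already produced by the three predictors.

```python
import math
import random


def simulated_annealing(candidates, energy_budget, oqloss_threshold,
                        t_start=1.0, t_end=1e-3, cooling=0.95, seed=0):
    """Return the feasible candidate with the highest predicted IPS found (P-E+R).

    candidates -- list of dicts with keys 'v', 'm', 'ips', 'energy', 'oqloss'
    Returns None when no candidate satisfies both constraints.
    """
    rng = random.Random(seed)

    def feasible(c):
        return c["energy"] < energy_budget and c["oqloss"] < oqloss_threshold

    start_pool = [c for c in candidates if feasible(c)]
    if not start_pool:
        return None
    current = rng.choice(start_pool)
    best = current
    # Normalise IPS differences so the temperature scale is unit-free.
    scale = max(c["ips"] for c in candidates) or 1.0
    temp = t_start
    while temp > t_end:
        new = rng.choice(candidates)           # annealing move: try another combination
        if feasible(new):
            delta = (new["ips"] - current["ips"]) / scale
            # Improvements are always accepted; a lower IPS is accepted with a
            # probability that shrinks as the temperature drops.
            if delta > 0 or rng.random() < math.exp(delta / temp):
                current = new
            if current["ips"] > best["ips"]:
                best = current
        temp *= cooling                        # cool down
    return best


# Hypothetical predictor outputs for five voltage/approximation combinations.
candidates = [
    {"v": 0.4, "m": 1, "ips": 1.0e9, "energy": 2.0, "oqloss": 0.01},
    {"v": 0.5, "m": 1, "ips": 1.4e9, "energy": 3.0, "oqloss": 0.01},
    {"v": 0.5, "m": 2, "ips": 1.6e9, "energy": 2.8, "oqloss": 0.04},
    {"v": 0.6, "m": 2, "ips": 2.0e9, "energy": 4.5, "oqloss": 0.04},  # exceeds energy budget
    {"v": 0.6, "m": 3, "ips": 2.2e9, "energy": 3.9, "oqloss": 0.12},  # exceeds quality threshold
]
best = simulated_annealing(candidates, energy_budget=4.0, oqloss_threshold=0.05)
print(best)
```

The sketch keeps the best feasible combination seen so far separately from the current state, so that a later acceptance of a worse move cannot lose it. Swapping the roles of IPS and energy in the acceptance test would give a corresponding sketch for the E-P+R strategy.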

The processor core optimization method based on near-threshold computing in this embodiment designs a multi-dimensional optimization model for an NTC system that uses approximate computing. The model includes three predictors: a performance predictor, an energy consumption predictor and an output quality predictor. A simulated annealing optimization algorithm is then used to obtain the optimal combination of voltage level and approximation degree. The method selects the voltage level and approximation degree automatically and ensures that the processor core runs with optimal energy consumption, performance and output quality, which realizes multi-dimensional optimization and improves the reliability of the processor core optimization method.

The output quality predictor groups instructions whose errors have similar propagation paths, selects representative sample instructions from these groups, and uses fault injection to analyze the output quality of these instructions at a given voltage level and approximation degree. The fault injection platform is designed on the basis of a simulation backplane and is implemented by mixed-language programming in VHDL and C. The output quality predictor further improves the reliability of the method.

In addition, by analyzing the energy efficiency characteristics and fault tolerance characteristics of different applications at given voltage levels and approximation degrees, the influence of the above optimization method on these applications in an NTC system was evaluated. It was found that the optimization model can perceive the characteristics of an application and adjust the allocation ratio of energy and IPS for each program to obtain the best multi-dimensional trade-off for the whole system, which verifies the effectiveness of the above optimization method.

Embodiment 3:

The present invention also provides a processor core optimization system based on near-threshold computing. Fig. 5 is a structural schematic diagram of the processor core optimization system based on near-threshold computing of Embodiment 3 of the present invention.

Referring to Fig. 5, the processor core optimization system based on near-threshold computing of this embodiment includes:

a data acquisition module 501, configured to obtain multiple voltage-approximation degree data groups; each voltage-approximation degree data group includes a voltage and a corresponding approximation degree value;

a prediction value obtaining module 502, configured to take the multiple voltage-approximation degree data groups as input of the processor core and obtain the performance prediction value, energy consumption prediction value and output quality prediction value corresponding to each voltage-approximation degree data group;

an objective function construction module 503, configured to construct the objective optimization function;

a solving module 504, configured to take the performance prediction values, energy consumption prediction values and output quality prediction values corresponding to all voltage-approximation degree data groups as input and solve the objective optimization function using a simulated annealing algorithm to obtain the optimal voltage-approximation degree data group;

an optimal group determining module 505, configured to determine the voltage in the optimal voltage-approximation degree data group as the optimal voltage in the near-threshold computing state and take the approximation degree value in the optimal voltage-approximation degree data group as the optimal approximation degree; the processor core operates at the optimal voltage and the optimal approximation degree.

As an optional implementation, the prediction value obtaining module 502 specifically includes:

IPS_i=Av_i+ΔIPS_i,

Energy_i=(β_iv_i)²C+(β_iv_i)²m_iD,

The output quality prediction unit specifically includes:

a classification subunit, configured to take the multiple voltage-approximation degree data groups as input of the output quality predictor and classify the instructions in the application program executed on the processor core to obtain multiple instruction classes; each instruction class includes multiple instructions with similar propagation paths;

a sampling subunit, configured to sample each instruction class using the approximate computing method to obtain multiple sample instructions;

a fault injection subunit, configured to inject a fault into each sample instruction using a fault injection method to obtain sample fault instructions;

an error calculation subunit, configured to calculate error values from each sample instruction and the corresponding sample fault instruction; the error values include the maximum absolute difference between the sample instruction output and the corresponding sample fault instruction output, the maximum relative error between the sample instruction output and the corresponding sample fault instruction output, and the matrix error between the sample instruction output and the corresponding sample fault instruction output.

As an optional implementation, the objective function construction module 503 specifically includes:

a performance objective construction unit, configured to construct a function that takes the performance parameter as the target and the energy consumption parameter and the output quality parameter as constraints; this function is the objective optimization function.

The solving module 504 specifically includes:

The processor core optimization system based on near-threshold computing of this embodiment selects the voltage level and approximation degree automatically and ensures that the processor core runs with optimal energy consumption, performance and output quality, which realizes multi-dimensional optimization and improves the reliability of the processor core optimization method.

Since the system disclosed in Embodiment 3 corresponds to the method disclosed in Embodiment 1 or 2, its description is relatively brief; for relevant details, refer to the description of the method.

Specific examples are used herein to explain the principle and implementation of the present invention, and the description of the above embodiments is intended only to help in understanding the method of the present invention and its core idea. At the same time, for those of ordinary skill in the art, changes may be made to the specific implementation and scope of application in accordance with the idea of the present invention. In summary, the content of this specification should not be construed as limiting the present invention.

Claims

1. A processor core optimization method based on near-threshold computing, characterized by comprising:

obtaining multiple voltage-approximation degree data groups; each voltage-approximation degree data group comprising a voltage and a corresponding approximation degree value;

taking the multiple voltage-approximation degree data groups as input of a processor core, and obtaining the performance prediction value, energy consumption prediction value and output quality prediction value corresponding to each voltage-approximation degree data group;

constructing an objective optimization function;

taking the performance prediction values, energy consumption prediction values and output quality prediction values corresponding to all voltage-approximation degree data groups as input, and solving the objective optimization function using a simulated annealing algorithm to obtain an optimal voltage-approximation degree data group;

determining the voltage in the optimal voltage-approximation degree data group as the optimal voltage in the near-threshold computing state, and taking the approximation degree value in the optimal voltage-approximation degree data group as the optimal approximation degree; the processor core operating at the optimal voltage and the optimal approximation degree.

2. The processor core optimization method based on near-threshold computing according to claim 1, characterized in that taking the multiple voltage-approximation degree data groups as input of the processor core and obtaining the performance prediction value, energy consumption prediction value and output quality prediction value corresponding to each voltage-approximation degree data group specifically comprises:

taking the multiple voltage-approximation degree data groups as input of a performance predictor, and obtaining, using an approximate computing method, the performance prediction value corresponding to each voltage-approximation degree data group:

IPS_i=Av_i+ΔIPS_i,

where v_i denotes the voltage in the i-th voltage-approximation degree data group, A is a constant that depends on the configuration of the processor core and on the application program executed on the processor core, and ΔIPS_i denotes the effect of the approximate computing method on performance;

taking the multiple voltage-approximation degree data groups as input of an energy consumption predictor, and obtaining, using the approximate computing method, the energy consumption prediction value corresponding to each voltage-approximation degree data group:

Energy_i=(β_iv_i)²C+(β_iv_i)²m_iD,

where β_i is a constant between 0 and 1 corresponding to the i-th voltage-approximation degree data group and depends on the user's requirement for voltage, C is a constant that depends on the configuration of the processor core, m_i denotes the approximation degree value in the i-th voltage-approximation degree data group, and D depends on the effect of the approximate computing method on energy consumption;

taking the multiple voltage-approximation degree data groups as input of an output quality predictor, and obtaining, using the approximate computing method and a fault injection method, the output quality prediction value corresponding to each voltage-approximation degree data group.

3. The processor core optimization method based on near-threshold computing according to claim 2, characterized in that taking the multiple voltage-approximation degree data groups as input of the output quality predictor and obtaining, using the approximate computing method and the fault injection method, the output quality prediction value corresponding to each voltage-approximation degree data group specifically comprises:

taking the multiple voltage-approximation degree data groups as input of the output quality predictor, and classifying the instructions in the application program executed on the processor core to obtain multiple instruction classes; each instruction class comprising multiple instructions with similar propagation paths;

calculating error values from each sample instruction and the corresponding sample fault instruction; the error values comprising the maximum absolute difference between the sample instruction output and the corresponding sample fault instruction output, the maximum relative error between the sample instruction output and the corresponding sample fault instruction output, and the matrix error between the sample instruction output and the corresponding sample fault instruction output;

4. The processor core optimization method based on near-threshold computing according to claim 3, characterized in that injecting a fault into each sample instruction using the fault injection method to obtain sample fault instructions specifically comprises:

constructing a fault injection platform; the fault injection platform integrating debugging control software, fault injection software, a hardware simulator and a simulation backplane;

5. The processor core optimization method based on near-threshold computing according to claim 1, characterized in that constructing the objective optimization function specifically comprises:

constructing a function that takes the performance parameter as the target and the energy consumption parameter and the output quality parameter as constraints; the function being the objective optimization function.

6. The processor core optimization method based on near-threshold computing according to claim 5, characterized in that taking the performance prediction values, energy consumption prediction values and output quality prediction values corresponding to all voltage-approximation degree data groups as input and solving the objective optimization function using the simulated annealing algorithm to obtain the optimal voltage-approximation degree data group specifically comprises:

taking the performance prediction values, energy consumption prediction values and output quality prediction values corresponding to all voltage-approximation degree data groups as input, and obtaining, using the simulated annealing algorithm, a voltage-approximation degree data group that satisfies the optimization condition; the optimization condition being either that the performance prediction value is the largest while the energy consumption prediction value is less than a preset energy consumption value and the output quality prediction value is less than a preset output quality value, or that the performance prediction value becomes smaller with a preset frequency as the annealing temperature decreases;

7. A processor core optimization system based on near-threshold computing, comprising:

a data acquisition module, configured to obtain multiple voltage-approximation degree data groups, each voltage-approximation degree data group including a voltage and a corresponding approximation degree value;

a prediction value acquisition module, configured to take the multiple voltage-approximation degree data groups as input to the processor core and obtain the performance prediction value, energy consumption prediction value, and output quality prediction value corresponding to each voltage-approximation degree data group;

a solving module, configured to take the performance prediction values, energy consumption prediction values, and output quality prediction values corresponding to all of the voltage-approximation degree data groups as input and solve the objective optimization function using a simulated annealing algorithm to obtain an optimal voltage-approximation degree data group;

an optimal group determination module, configured to determine the voltage in the optimal voltage-approximation degree data group as the optimal voltage in the near-threshold computing state and the approximation degree value in the optimal voltage-approximation degree data group as the optimal approximation degree, the processor core operating at the optimal voltage and the optimal approximation degree.

8. The processor core optimization system based on near-threshold computing according to claim 7, wherein the prediction value acquisition module specifically comprises:

a performance prediction unit, configured to take the multiple voltage-approximation degree data groups as input to a performance predictor and obtain, using an approximate computing method, the performance prediction value corresponding to each voltage-approximation degree data group:

IPS_i=Av_i+ΔIPS_i,

an energy consumption prediction unit, configured to take the multiple voltage-approximation degree data groups as input to an energy consumption predictor and obtain, using an approximate computing method, the energy consumption prediction value corresponding to each voltage-approximation degree data group:

Energy_i=(β_iv_i)²C+(β_iv_i)²m_iD,

an output quality prediction unit, configured to take the multiple voltage-approximation degree data groups as input to an output quality predictor and obtain, using an approximate computing method and a fault injection method, the output quality prediction value corresponding to each voltage-approximation degree data group.
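The two closed-form predictors recited in this claim can be evaluated directly, as in the sketch below. This section does not define the symbols, so the code treats `A`, `ΔIPS_i`, `β_i`, `C`, `m_i`, and `D` as plain numeric inputs and assumes that `Av_i` denotes the product of `A` and `v_i`. The function and parameter names are hypothetical.

```python
def predict_ips(a_coeff, v, delta_ips):
    """IPS_i = A * v_i + delta_IPS_i (A * v_i read as a product; assumption)."""
    return a_coeff * v + delta_ips


def predict_energy(beta, v, c, m, d):
    """Energy_i = (beta_i * v_i)^2 * C + (beta_i * v_i)^2 * m_i * D."""
    scaled_sq = (beta * v) ** 2  # shared (beta_i * v_i)^2 factor
    return scaled_sq * c + scaled_sq * m * d


# Example with arbitrary illustrative numbers (not taken from the patent).
ips = predict_ips(a_coeff=2.0, v=0.5, delta_ips=0.25)
energy = predict_energy(beta=1.0, v=0.5, c=4.0, m=2.0, d=3.0)
```

Both energy terms scale with the square of the scaled voltage `β_i v_i`, so factoring it out makes the quadratic dependence on voltage explicit.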

9. The processor core optimization system based on near-threshold computing according to claim 8, wherein the output quality prediction unit specifically comprises:

a classification subunit, configured to take the multiple voltage-approximation degree data groups as input to the output quality predictor and classify the instructions of the application program executed on the processor core to obtain multiple instruction classes, each instruction class including multiple instructions with similar propagation paths;

a sampling subunit, configured to sample each instruction class using an approximate computing method to obtain multiple sampled instructions;

a fault injection subunit, configured to inject a fault into each sampled instruction using a fault injection method to obtain sampled faulty instructions;

an error calculation subunit, configured to calculate error values from each sampled instruction and the corresponding sampled faulty instruction, the error values including the maximum poly-heap error value between the sampled instruction output and the corresponding sampled faulty instruction output, the maximum relative error between the sampled instruction output and the corresponding sampled faulty instruction output, and the matrix error between the sampled instruction output and the corresponding sampled faulty instruction output;

an output quality prediction subunit, configured to obtain, from the error values, the output quality prediction value corresponding to each voltage-approximation degree data group.
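The comparison between fault-free and faulty outputs can be sketched as below. Several choices here are assumptions made only for illustration: a single-bit flip stands in for the fault model, the matrix error is taken to be the Frobenius norm of the difference, and all function names are hypothetical. The first error metric recited above is not modeled because this section does not define it.

```python
import math


def flip_bit(value, bit):
    """Inject a single-bit fault into an integer value (assumed fault model)."""
    return value ^ (1 << bit)


def max_relative_error(golden, faulty, eps=1e-12):
    """Largest element-wise relative error between two output sequences.

    eps guards against division by zero when a fault-free output is 0.
    """
    return max(abs(g - f) / max(abs(g), eps) for g, f in zip(golden, faulty))


def matrix_error(golden, faulty):
    """Frobenius norm of (golden - faulty) for two equally sized matrices.

    Using the Frobenius norm as the matrix error is an assumption.
    """
    return math.sqrt(sum((g - f) ** 2
                         for g_row, f_row in zip(golden, faulty)
                         for g, f in zip(g_row, f_row)))
```

For example, `max_relative_error([1.0, 2.0, 4.0], [1.0, 2.5, 3.0])` evaluates to 0.25, since the second and third elements each deviate by a quarter of the fault-free value.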

10. The processor core optimization system based on near-threshold computing according to claim 7, wherein the objective function construction module specifically comprises:

a performance objective construction unit, configured to construct a function that takes the performance parameter as the objective and the energy consumption parameter and the output quality parameter as constraints, the function being the objective optimization function;

and the solving module specifically comprises:

an optimization unit, configured to take the performance prediction values, energy consumption prediction values, and output quality prediction values corresponding to all of the voltage-approximation degree data groups as input and obtain, using the simulated annealing algorithm, a voltage-approximation degree data group that satisfies an optimization condition, wherein the optimization condition is that the performance prediction value is at its maximum, the energy consumption prediction value is less than a preset energy consumption value, and the output quality prediction value is less than a preset output quality value, or that the performance prediction value decreases at a preset frequency as the annealing temperature decreases;

a determination unit, configured to determine the voltage-approximation degree data group that satisfies the optimization condition as the optimal voltage-approximation degree data group.