CN108519906A - Superscalar out-of-order processor steady-state instruction throughput modeling method - Google Patents

Superscalar out-of-order processor steady-state instruction throughput modeling method

Info

Publication number
CN108519906A
CN108519906A (application CN201810229640.8A)
Authority
CN
China
Prior art keywords
instruction
neural network
throughput
micro-architecture
steady state
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810229640.8A
Other languages
Chinese (zh)
Other versions
CN108519906B (en
Inventor
凌明
季柯丞
张凌峰
李宽
时龙兴
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast University
Original Assignee
Southeast University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southeast University filed Critical Southeast University
Priority to CN201810229640.8A priority Critical patent/CN108519906B/en
Publication of CN108519906A publication Critical patent/CN108519906A/en
Application granted granted Critical
Publication of CN108519906B publication Critical patent/CN108519906B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/455Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533Hypervisors; Virtual machine monitors
    • G06F9/45554Instruction set architectures of guest OS and hypervisor or native processor differ, e.g. Bochs or VirtualPC on PowerPC MacOS

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a method for modeling the steady-state instruction throughput of a superscalar out-of-order processor. For each profiling interval, the micro-architecture-independent parameters correlated with the steady-state average throughput are collected; these parameters include at least the dependency-chain delay distribution. A clustering algorithm classifies the intervals and selects the training set for a neural network. The micro-architecture-independent parameters of the selected training set serve as the network's input, and the per-thread steady-state instruction throughput of the same intervals, obtained by cycle-accurate simulation, serves as the network's output. The inputs and outputs are fitted and, by adjusting the network's iteration count, topology, transfer functions, and target training accuracy, a steady-state instruction-throughput neural-network model of the given hardware is trained. Because the dependency-chain delay distribution is a micro-architecture-independent feature of the instruction stream, the model can quickly and accurately predict the steady-state instruction throughput of a superscalar out-of-order processor with a given micro-architecture.

Description

Superscalar out-of-order processor steady-state instruction throughput modeling method
Technical field
The invention belongs to the field of computer architecture and modeling techniques, and more particularly relates to a method, based on an artificial neural network, for modeling the steady-state instruction throughput of a superscalar out-of-order processor pipeline.
Background art
Architecture evaluation and design-space exploration based on hardware behavior modeling can provide guidance for chip design and shorten the design-iteration cycle. For a particular processor running a designated program, the instruction throughput of the out-of-order pipeline in its steady state, i.e., when no miss events (cache misses, branch mispredictions, etc.) occur, characterizes the processor's performance limit; to some extent it also reflects whether the application's design is well matched to the hardware. Accurate prediction of the instruction throughput in the steady state of an out-of-order processor is therefore the basis of overall performance analysis and modeling for such processors.
The average instruction throughput of an out-of-order processor in steady state is the average number of instructions issued per clock cycle in the absence of miss events. Early estimates of the steady-state instruction throughput were quite simple: the width of the front-end issue stage was taken directly as the steady-state average throughput. This method assumes that, when no miss events occur, the processor handles a number of instructions equal to the front-end issue width in every clock cycle. It ignores instruction dependencies, the number and kinds of functional units, instruction latencies, and the distribution of serializing instructions; it is an idealized assumption with large error.
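The gap between the naive issue-width model and dependency-limited execution can be illustrated with a toy bound. This is an illustration only, not the patent's model; the function name and numbers are invented: a serial dependency chain forces back-to-back execution, so the achieved IPC can fall well below the issue width.

```python
def steady_state_ipc_bound(n_instr, issue_width, chain_len, latency=1):
    """Toy lower bound on cycles: the interval can finish no earlier than
    either the serial dependency chain or the issue-width limit allows."""
    chain_cycles = chain_len * latency          # chain executes serially
    width_cycles = -(-n_instr // issue_width)   # ceil(n / width)
    cycles = max(chain_cycles, width_cycles)
    return n_instr / cycles

# 1000 instructions on a 4-wide machine: a 500-instruction serial chain
# halves the ideal IPC of 4.
print(steady_state_ipc_bound(1000, 4, chain_len=500))  # -> 2.0
print(steady_state_ipc_bound(1000, 4, chain_len=10))   # -> 4.0
```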
More recent research has observed an exponential relationship between the steady-state instruction throughput and the instruction-window size of an out-of-order processor, with the coefficients fitted from experimental measurements. This method has two shortcomings. First, the steady-state instruction throughput it yields is a constant: it reflects only the average over a long time scale and lacks dynamics. Second, the value is independent of the specific software workload, so it cannot reflect the characteristics of different programs and incurs large errors.
The steady-state average throughput does not depend on each influencing factor in a simple, separable way: coupling effects between the factors also influence it, which makes a mechanistic analysis difficult. Meanwhile, full-function cycle-accurate simulation is prohibitively slow. Hence the present invention.
Summary of the invention
In view of the above technical problems, the object of the present invention is to provide a method for modeling the steady-state instruction throughput of a superscalar out-of-order processor. Exploiting the fact that the dependency-chain delay distribution is a micro-architecture-independent feature, the method quickly and accurately predicts the instruction throughput in the steady state of a superscalar out-of-order processor with a given micro-architecture, with high accuracy and high speed.
The technical solution of the invention is:
A superscalar out-of-order processor steady-state instruction throughput modeling method, comprising the following steps:
S01: for each profiling interval, collect the micro-architecture-independent parameters correlated with the steady-state average throughput, the micro-architecture-independent parameters including at least the dependency-chain delay distribution;
S02: classify the intervals with a clustering algorithm and select the training set of the neural network;
S03: use the micro-architecture-independent parameters of the selected training set as the input of the neural network and the per-thread steady-state instruction throughput of the corresponding training set, obtained by cycle-accurate simulation, as the output of the neural network; fit the inputs and outputs of the neural network and, by adjusting its iteration count, network topology, transfer functions, and target training accuracy, train the steady-state instruction-throughput neural-network model of the given hardware.
Preferably, obtaining the dependency-chain delay distribution in step S01 comprises:
S11: by defining a dependency-chain structure, determine, when each instruction enters the instruction window, the dependencies between it and the other instructions in the window, and record the resulting chain lengths;
S12: by defining an instruction-type structure, record the type of each instruction; the execution time of an instruction can then be obtained from its type;
S13: from these statistics, obtain the dependency-chain delay distribution.
Preferably, the micro-architecture-independent parameters further include the dynamic instruction-mix ratio, the total running time of the target thread, and the total number of instructions executed.
Preferably, before step S01 the method further comprises choosing a suitable fixed time-slice length and cutting the program's execution stream (trace) at every time slice, so that the whole target-program execution stream is divided into segments; a corresponding data set is collected for each program segment, and each segment serves as one profiling interval.
Preferably, before step S02 the method further comprises preprocessing the micro-architecture-independent parameters of each segment to form the segment's micro-architecture-independent parameter vector, then applying a dimensionality-reduction algorithm to the vectors for dimensionality reduction and denoising, forming the segment's micro-architecture-independent data set.
Preferably, the method further comprises:
dividing the data set containing the feature vectors of all profiling intervals into a number of coarse classes;
for each coarse class, splitting it into a proportional number of groups with the k-means clustering algorithm;
choosing, in each group, the point nearest to the group's centroid as a feature vector.
Compared with existing methods for predicting the steady-state average throughput, the advantages of the invention are:
The proposed dependency-chain delay distribution covers well the micro-architecture-independent parameters that influence the steady-state average throughput, including the dynamic instruction-mix ratio and the dependency-chain delay distribution, so a more accurate steady-state throughput model can be established.
In addition, the invention predicts the steady-state average instruction throughput with a neural network, which fully accounts for the coupling between the micro-architecture-independent parameters; once trained, the model predicts the value of the steady-state average instruction throughput quickly and accurately.
Description of the drawings
The invention is further described below with reference to the accompanying drawings and embodiments:
Fig. 1 is the flow chart of the superscalar out-of-order processor steady-state instruction throughput modeling method of the invention;
Fig. 2 is the detailed flow chart of training the artificial neural-network model;
Fig. 3 is the flow chart of the dependency-chain delay distribution statistics;
Fig. 4 is the neural-network topology diagram;
Fig. 5 is the block diagram of the inputs and target outputs for neural-network model training and testing.
Detailed description of the embodiments
The above scheme is further described below with reference to a specific embodiment. It should be understood that the embodiment illustrates the invention and does not limit its scope. The implementation conditions used in the embodiment may be further adjusted according to specific circumstances; conditions not specified are the usual conditions of routine experiments.
Embodiment:
As shown in Figs. 1 and 2, the superscalar out-of-order processor steady-state instruction throughput modeling method of the invention comprises the following steps:
(1) Choose a reasonable fixed program-execution time-slice length. During instruction-set simulation, cut the program's execution stream at the chosen slice length, dividing the whole target program into segments, and collect a corresponding data set for each segment. In this embodiment we use thread-switch boundaries as the segmentation criterion: the interval from the moment the target thread is scheduled until the operating system switches it out serves as one profiling interval.
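The simpler fixed-length slicing variant described in the claims can be sketched in a few lines; the function name and the toy trace are illustrative, not part of the patent:

```python
def split_trace(trace, slice_len):
    """Cut a program execution stream into fixed-length profiling
    intervals; the last segment may be shorter than slice_len."""
    return [trace[i:i + slice_len] for i in range(0, len(trace), slice_len)]

# A 10-instruction toy trace cut into slices of 4.
segments = split_trace(list(range(10)), slice_len=4)
print(segments)  # -> [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

The embodiment's thread-switch segmentation would instead cut wherever the simulated OS deschedules the target thread, yielding variable-length intervals.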
(2) Using the instruction-set simulator, collect for each profiling interval the micro-architecture-independent parameters correlated with the steady-state average throughput, chiefly including the dependency-chain delay distribution and the total instruction count of the target program.
By defining a dependency-chain structure, determine, when each instruction enters the instruction window, its dependencies on the other instructions in the window, and record the chain lengths; by defining an instruction-type structure, record each instruction's type, from which its execution time can be obtained. Combining these two structures yields the dependency-chain delay distribution. At the same time, monitor the processor's issue stage and count the number of instructions issued and the clock cycles spent within a period of time or an instruction-stream segment.
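The throughput target measured at the issue stage is simply instructions issued divided by cycles spent. A minimal sketch; the per-cycle log format is an assumption made for illustration:

```python
def steady_state_ipc(issue_log):
    """issue_log: one entry per clock cycle of a miss-free interval,
    giving the number of instructions issued that cycle. The steady-state
    average throughput is total instructions over total cycles."""
    return sum(issue_log) / len(issue_log)

# Six cycles of a 4-wide machine: 21 instructions in 6 cycles.
print(steady_state_ipc([4, 3, 4, 4, 2, 4]))  # -> 3.5
```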
For each profiling interval so delimited, the relevant micro-architecture-independent parameters can be obtained; they may include the dynamic instruction-mix ratio (the numbers of floating-point, fixed-point, SIMD, and load/store instructions, etc.), the dependency-chain delay distribution within each instruction window during the interval, the total running time of the target thread, and the total number of instructions executed.
Fig. 3 is the detailed flow chart of the dependency-chain delay distribution statistics. When an instruction enters the window, its own type is detected (multi-cycle or single-cycle) along with its dependencies on the instructions already in the window; the dependency-chain length is computed and recorded. After a profiling interval finishes, the counts of dependency-chain delays of each length are tallied to obtain the interval's dependency-chain delay histogram.
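The Fig. 3 statistics might be sketched as follows. The latency table, trace format, and window size here are all assumptions for illustration, not the patent's data structures: each instruction's chain delay is the longest producer chain still in the window plus its own latency.

```python
from collections import Counter

# Per-type latencies in cycles (illustrative values; the real table comes
# from the simulated micro-architecture).
LATENCY = {"alu": 1, "mul": 3, "load": 4}

def chain_delay_histogram(trace, window_size=128):
    """trace: list of (instr_type, [source_ids]) tuples, one per dynamic
    instruction, where source_ids index earlier instructions in the trace.
    Returns a Counter mapping chain delay -> occurrence count."""
    delay = {}        # instruction id -> delay of its dependency chain
    hist = Counter()
    for i, (itype, sources) in enumerate(trace):
        window_start = max(0, i - window_size)
        # Only producers still inside the instruction window extend a chain.
        pred = [delay[s] for s in sources if s >= window_start]
        delay[i] = (max(pred) if pred else 0) + LATENCY[itype]
        hist[delay[i]] += 1
    return hist

# load -> alu -> mul forms one chain; the last alu is independent.
trace = [("load", []), ("alu", [0]), ("mul", [1]), ("alu", [])]
print(chain_delay_histogram(trace))
```

Running this on the four-instruction toy trace records chain delays 4, 5, 8 (the load/alu/mul chain growing) and 1 (the independent alu).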
(3) In view of the artificial neural network's requirements on its input data, first preprocess the relevant micro-architecture-independent parameters of each profiling interval's code segment to form the segment's parameter vector. Then apply principal component analysis (keeping the principal components that explain at least 95% of the variance of the original data, thereby reducing its volume) to each parameter vector for dimensionality reduction and denoising, forming the segment's MicaData set (micro-architecture-independent data set). Other dimensionality-reduction algorithms, such as LDA, may of course be used.
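A sketch of the 95%-variance PCA step, assuming NumPy is available; the function name and toy data are illustrative:

```python
import numpy as np

def pca_reduce(X, var_keep=0.95):
    """Project the rows of X onto the fewest principal components whose
    cumulative explained variance reaches var_keep (95% in the patent)."""
    Xc = X - X.mean(axis=0)
    # SVD of the centered data; squared singular values give the
    # per-component variances.
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    var = S**2 / np.sum(S**2)
    k = int(np.searchsorted(np.cumsum(var), var_keep)) + 1
    return Xc @ Vt[:k].T   # reduced feature vectors, shape (n_samples, k)

rng = np.random.default_rng(0)
# Toy data: 3 informative dimensions mixed into 10, plus faint noise.
X = rng.normal(size=(200, 3)) @ rng.normal(size=(3, 10)) \
    + 0.01 * rng.normal(size=(200, 10))
Z = pca_reduce(X)
print(Z.shape)   # at most 3 components survive the 95% cutoff
```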
This embodiment uses a BP neural network. As shown in Fig. 4, the number of hidden-layer nodes follows the empirical formula
h = sqrt(m + n) + a
where h is the number of hidden-layer nodes, m the number of output-layer nodes, n the number of input-layer nodes, and a a constant. A three-layer network structure is used: the input is the dependency-chain delay distribution, with 150 input neurons in total; the middle hidden layer has 16 neurons; the output is the steady-state throughput value, with 1 output neuron. Training uses the LM (Levenberg-Marquardt) algorithm. The logsig transfer function is used between the input and hidden layers and the purelin transfer function between the hidden and output layers; the weights between the nodes of adjacent layers are adjusted with trainscg (scaled conjugate gradient).
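A quick check of the hidden-layer sizing rule h = sqrt(m + n) + a (the formula is reconstructed from the variable definitions in the text): with n = 150 inputs and m = 1 output, taking the constant a = 4 reproduces the 16 hidden neurons of the embodiment. The function name and the choice a = 4 are assumptions.

```python
import math

def hidden_nodes(n_in, n_out, a):
    """Empirical hidden-layer size: h = sqrt(m + n) + a, rounded to the
    nearest integer; a is a small tunable constant."""
    return round(math.sqrt(n_in + n_out) + a)

# sqrt(150 + 1) ~= 12.29, so a = 4 gives the embodiment's 16 neurons.
print(hidden_nodes(150, 1, 4))  # -> 16
```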
(4) Extract feature vectors from the profiling-interval segments of the target thread. First, this case uses a SOM (self-organizing feature map) to divide the data set containing the feature vectors of all profiling intervals into 200 coarse classes (the number of classes can be adjusted according to the quality of the classification). Then, for each coarse class (suppose it contains N feature vectors), the k-means clustering algorithm splits it into N*30% groups (the ratio can be adjusted according to experimental results). Finally, the point nearest to each group's centroid is chosen as one distinctive feature vector.
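The group-and-pick-nearest selection can be sketched with plain k-means. This is a simplification: the SOM coarse-classification stage is omitted, and the helper names are invented for illustration.

```python
import random

def _d2(p, q):
    # squared Euclidean distance between two equal-length tuples
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans_representatives(points, k, iters=50, seed=0):
    """Split `points` into k groups with Lloyd's k-means and return, for
    each non-empty group, the member nearest its centroid."""
    rng = random.Random(seed)
    cents = rng.sample(points, k)
    groups = []
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[min(range(k), key=lambda j: _d2(p, cents[j]))].append(p)
        for j, g in enumerate(groups):
            if g:   # keep the old centroid if a group empties out
                cents[j] = tuple(sum(c) / len(g) for c in zip(*g))
    return [min(g, key=lambda p: _d2(p, cents[j]))
            for j, g in enumerate(groups) if g]

pts = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.2), (9.0, 0.1)]
reps = kmeans_representatives(pts, k=3)
print(sorted(reps))
```

Each representative is an actual data point, not a synthetic centroid, which matches the patent's intent of keeping real profiling intervals in the training set.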
(5) All the feature points chosen by the clustering algorithm serve as the input of the BP neural network, and its output is the steady-state average throughput of the target thread obtained in step (2). The inputs and outputs of the BP network are fitted; by adjusting the network's iteration count, topology, transfer functions, and target training accuracy, a BP-neural-network model of the pipeline's steady-state instruction throughput under the current hardware architecture is trained.
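A minimal BP regressor in the spirit of the embodiment's logsig/purelin network, trained here with plain stochastic gradient descent rather than LM or trainscg (a deliberate simplification); the class name and the toy fitting target are illustrative:

```python
import math
import random

def logsig(x):
    # MATLAB-style logsig: 1 / (1 + exp(-x))
    return 1.0 / (1.0 + math.exp(-x))

class TinyBP:
    """One hidden layer of logsig units, purelin (identity) output."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        self.h = [logsig(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        return sum(w * h for w, h in zip(self.w2, self.h)) + self.b2

    def train_step(self, x, t, lr=0.05):
        err = self.forward(x) - t
        for j, h in enumerate(self.h):
            g = err * self.w2[j] * h * (1.0 - h)  # backprop through logsig
            self.w2[j] -= lr * err * h            # purelin output layer
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * g * xi
            self.b1[j] -= lr * g
        self.b2 -= lr * err
        return err * err

# Toy regression target: learn y = mean(x) on random 4-dim inputs.
rng = random.Random(1)
data = [[rng.random() for _ in range(4)] for _ in range(200)]
data = [(x, sum(x) / 4.0) for x in data]

net = TinyBP(n_in=4, n_hidden=8)
losses = []
for epoch in range(200):
    losses.append(sum(net.train_step(x, t) for x, t in data) / len(data))
print(round(losses[0], 4), round(losses[-1], 4))  # MSE should shrink
```

The real model would use 150 inputs (the delay-distribution histogram) and 16 hidden neurons, with the measured steady-state throughput as the target.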
Classifying the dimensionality-reduced data with a clustering algorithm selects representative data as the neural network's training set. The purpose of choosing the training set by clustering is to shrink it as much as possible while retaining the main information of the original data.
(6) The model obtained in step (5) can be used to predict the pipeline steady-state instruction throughput of other software on the given hardware architecture. Run the target program on the instruction-set simulator with software instrumentation added, collect the data relevant to the dependency-chain delay distribution, preprocess it, and feed it into the model obtained in step (5); the instruction throughput of the analysed thread in the steady state of the out-of-order processor under the current hardware architecture can then be predicted quickly and accurately.
Fig. 5 is the block diagram of the inputs and target outputs during neural-network model training and application. From full-function cycle-accurate simulation we obtain both the parameter inputs for training (the micro-architecture-independent parameters) and the target outputs (the steady-state instruction throughput), and thus train a high-accuracy model. At prediction time (when applying the model), the application's relevant parameters need only be produced by an instruction-level simulator, which is much faster than a full-function cycle-accurate simulator, or by another trace generator; importing these parameters into the model then predicts the steady-state average throughput rapidly. In the figure, solid lines show the training flow and dashed lines the prediction flow.
The foregoing embodiment merely illustrates the technical concept and features of the invention; its purpose is to enable those skilled in the art to understand and implement the invention, not to limit its scope. Any equivalent transformation or modification made according to the spirit of the invention shall fall within the protection scope of the invention.

Claims (6)

1. A superscalar out-of-order processor steady-state instruction throughput modeling method, characterized by comprising the following steps:
S01: for each profiling interval, collect the micro-architecture-independent parameters correlated with the steady-state average throughput, the micro-architecture-independent parameters including at least the dependency-chain delay distribution;
S02: classify the intervals with a clustering algorithm and select the training set of the neural network;
S03: use the micro-architecture-independent parameters of the selected training set as the input of the neural network and the per-thread steady-state instruction throughput of the corresponding training set, obtained by cycle-accurate simulation, as the output of the neural network; fit the inputs and outputs of the neural network and, by adjusting its iteration count, network topology, transfer functions, and target training accuracy, train the steady-state instruction-throughput neural-network model of the given hardware.
2. The superscalar out-of-order processor steady-state instruction throughput modeling method according to claim 1, characterized in that obtaining the dependency-chain delay distribution in step S01 comprises:
S11: by defining a dependency-chain structure, determine, when each instruction enters the instruction window, the dependencies between it and the other instructions in the window, and record the resulting chain lengths;
S12: by defining an instruction-type structure, record the type of each instruction; the execution time of an instruction can then be obtained from its type;
S13: from these statistics, obtain the dependency-chain delay distribution.
3. The superscalar out-of-order processor steady-state instruction throughput modeling method according to claim 1, characterized in that the micro-architecture-independent parameters further include the dynamic instruction-mix ratio, the total running time of the target thread, and the total number of instructions executed.
4. The superscalar out-of-order processor steady-state instruction throughput modeling method according to claim 1, characterized by further comprising, before step S01, choosing a suitable fixed time-slice length and cutting the program's execution stream (trace) at every time slice, so that the whole target-program execution stream is divided into segments; a corresponding data set is collected for each program segment, and each segment serves as one profiling interval.
5. The superscalar out-of-order processor steady-state instruction throughput modeling method according to claim 1, characterized by further comprising, before step S02, preprocessing the micro-architecture-independent parameters of each segment to form the segment's micro-architecture-independent parameter vector, then applying a dimensionality-reduction algorithm to the vectors for dimensionality reduction and denoising, forming the segment's micro-architecture-independent data set.
6. The superscalar out-of-order processor steady-state instruction throughput modeling method according to claim 5, characterized by further comprising:
dividing the data set containing the feature vectors of all profiling intervals into a number of coarse classes;
for each coarse class, splitting it into a proportional number of groups with the k-means clustering algorithm;
choosing, in each group, the point nearest to the group's centroid as a feature vector.
CN201810229640.8A 2018-03-20 2018-03-20 Superscalar out-of-order processor steady state instruction throughput rate modeling method Active CN108519906B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810229640.8A CN108519906B (en) 2018-03-20 2018-03-20 Superscalar out-of-order processor steady state instruction throughput rate modeling method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810229640.8A CN108519906B (en) 2018-03-20 2018-03-20 Superscalar out-of-order processor steady state instruction throughput rate modeling method

Publications (2)

Publication Number Publication Date
CN108519906A true CN108519906A (en) 2018-09-11
CN108519906B CN108519906B (en) 2022-03-22

Family

ID=63434021

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810229640.8A Active CN108519906B (en) 2018-03-20 2018-03-20 Superscalar out-of-order processor steady state instruction throughput rate modeling method

Country Status (1)

Country Link
CN (1) CN108519906B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102652304A (en) * 2009-12-22 2012-08-29 国际商业机器公司 Predicting and avoiding operand-store-compare hazards in out-of-order microprocessors
CN103577159A (en) * 2012-08-07 2014-02-12 想象力科技有限公司 Multi-stage register renaming using dependency removal
CN105630458A (en) * 2015-12-29 2016-06-01 东南大学—无锡集成电路技术研究所 Prediction method of out-of-order processor steady-state average throughput rate based on artificial neural network

Also Published As

Publication number Publication date
CN108519906B (en) 2022-03-22

Similar Documents

Publication Publication Date Title
CN101399672B (en) Intrusion detection method for fusion of multiple neutral networks
CN107103332A (en) A kind of Method Using Relevance Vector Machine sorting technique towards large-scale dataset
CN106780121A (en) A kind of multiplexing electric abnormality recognition methods based on power load pattern analysis
CN110149237A (en) A kind of Hadoop platform calculate node load predicting method
CN102955902B (en) Method and system for evaluating reliability of radar simulation equipment
CN110047291A (en) A kind of Short-time Traffic Flow Forecasting Methods considering diffusion process
CN105630458B (en) The Forecasting Methodology of average throughput under a kind of out-of order processor stable state based on artificial neural network
Li et al. A hybrid model for river water level forecasting: cases of Xiangjiang River and Yuanjiang River, China
CN110569876A (en) Non-invasive load identification method and device and computing equipment
CN109359665A (en) A kind of family's electric load recognition methods and device based on support vector machines
CN115510042A (en) Power system load data filling method and device based on generation countermeasure network
CN110110915A (en) A kind of integrated prediction technique of the load based on CNN-SVR model
CN107909141A (en) A kind of data analysing method and device based on grey wolf optimization algorithm
CN102324007A (en) Method for detecting abnormality based on data mining
CN114720764A (en) Harmonic analysis method and system based on real-time monitoring data of electric meter
CN110941902A (en) Lightning stroke fault early warning method and system for power transmission line
CN112308341A (en) Power data processing method and device
CN115277354A (en) Fault detection method for command control network management system
CN112100910A (en) Power consumption model training method, power consumption testing method and device for processor
Tarsa et al. Workload prediction for adaptive power scaling using deep learning
CN109117352B (en) Server performance prediction method and device
CN113420506A (en) Method for establishing prediction model of tunneling speed, prediction method and device
CN109242142A (en) A kind of spatio-temporal segmentation parameter optimization method towards infrastructure networks
CN108519906A (en) Superscale out-of order processor stable state instructs throughput modeling method
CN110377525B (en) Parallel program performance prediction system based on runtime characteristics and machine learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant